1
Chen Z, Ji C, Xu W, Gao J, Huang J, Xu H, Qian G, Huang J. UniAMP: enhancing AMP prediction using deep neural networks with inferred information of peptides. BMC Bioinformatics 2025; 26:10. [PMID: 39799358 PMCID: PMC11725221 DOI: 10.1186/s12859-025-06033-3] [Received: 08/21/2024] [Accepted: 01/02/2025] [Indexed: 01/15/2025] Open
Abstract
Antimicrobial peptides (AMPs) have been widely recognized as a promising solution to combat antimicrobial resistance of microorganisms due to the increasing abuse of antibiotics in medicine and agriculture around the globe. In this study, we propose UniAMP, a systematic prediction framework for discovering AMPs. We observe that feature vectors used in various existing studies constructed from peptide information, such as sequence, composition, and structure, can be augmented and even replaced by information inferred by deep learning models. Specifically, we use a feature vector with 2924 values inferred by two deep learning models, UniRep and ProtT5, to demonstrate that such inferred information of peptides suffices for the task, with the help of our proposed deep neural network model composed of fully connected layers and transformer encoders for predicting the antibacterial activity of peptides. Evaluation results demonstrate superior performance of our proposed model on both balanced benchmark datasets and imbalanced test datasets compared with existing studies. Subsequently, we analyze the relations among peptide sequences, manually extracted features, and information automatically inferred by deep learning models, leading to the observation that the inferred information is more comprehensive and non-redundant for the task of predicting AMPs. Moreover, this approach alleviates the impact of the scarcity of positive data and demonstrates great potential in future research and applications.
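As a rough illustration of the feature construction described in this abstract, the sketch below concatenates two per-peptide embedding vectors into a single 2924-value input. The 1900 + 1024 split is an assumption based on commonly reported UniRep and ProtT5 output sizes; the abstract states only the total. The `fake_embedding` helper is a hypothetical stand-in for the pretrained models, and the downstream fully connected and transformer layers are not shown.

```python
import random

UNIREP_DIM = 1900   # assumed UniRep output size (not stated in the abstract)
PROTT5_DIM = 1024   # assumed ProtT5 output size (not stated in the abstract)


def fake_embedding(sequence: str, dim: int, seed: int) -> list[float]:
    """Hypothetical stand-in for a pretrained model: a deterministic pseudo-random vector."""
    rng = random.Random(f"{seed}:{sequence}")
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def build_feature_vector(sequence: str) -> list[float]:
    """Concatenate the two inferred representations into one model input."""
    unirep = fake_embedding(sequence, UNIREP_DIM, seed=0)
    prott5 = fake_embedding(sequence, PROTT5_DIM, seed=1)
    return unirep + prott5


features = build_feature_vector("GIGKFLHSAKKFGKAFVGEIMNS")  # example peptide string
print(len(features))  # 2924
```

Concatenation keeps the two representations side by side so that a downstream network can weight them independently; other fusion schemes are possible and are not ruled out by the abstract.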
Affiliation(s)
- Zixin Chen
  - College of Artificial Intelligence, Nanjing Agricultural University, Weigang No.1, Nanjing, 210095, Jiangsu, China
- Chengming Ji
  - College of Artificial Intelligence, Nanjing Agricultural University, Weigang No.1, Nanjing, 210095, Jiangsu, China
- Wenwen Xu
  - College of Artificial Intelligence, Nanjing Agricultural University, Weigang No.1, Nanjing, 210095, Jiangsu, China
- Jianfeng Gao
  - StarHelix Inc, Jiangmiao Road, Nanjing, 210000, Jiangsu, China
- Ji Huang
  - College of Agriculture, Nanjing Agricultural University, Weigang No.1, Nanjing, 210095, Jiangsu, China
- Huanliang Xu
  - College of Artificial Intelligence, Nanjing Agricultural University, Weigang No.1, Nanjing, 210095, Jiangsu, China
- Guoliang Qian
  - College of Plant Protection, Nanjing Agricultural University, Weigang No.1, Nanjing, 210095, Jiangsu, China.
- Junxian Huang
  - College of Artificial Intelligence, Nanjing Agricultural University, Weigang No.1, Nanjing, 210095, Jiangsu, China.
2
Cao R, Li Q, Wei P, Ding Y, Bin Y, Zheng C. IL-6-Inducing Peptide Prediction Based on 3D Structure and Graph Neural Network. Biomolecules 2025; 15:99. [PMID: 39858493 PMCID: PMC11764147 DOI: 10.3390/biom15010099] [Received: 11/20/2024] [Revised: 12/27/2024] [Accepted: 01/07/2025] [Indexed: 01/27/2025] Open
Abstract
Interleukin-6 (IL-6) is a potent glycoprotein that plays a crucial role in regulating innate and adaptive immunity, as well as metabolism. The expression and release of IL-6 are closely correlated with the severity of various diseases. IL-6-inducing peptides are critical for the development of immunotherapy and diagnostic biomarkers for some diseases. Most existing methods for predicting IL-6-inducing peptides use traditional machine learning methods, whose feature selection is based on prior knowledge. In addition, none of these methods take into account the three-dimensional (3D) structure of peptides, which is essential for their functional properties. In this study, we propose a novel IL-6-inducing peptide prediction method called DGIL-6, which integrates 3D structural information with graph neural networks. DGIL-6 represents a peptide sequence as a graph, where each amino acid is treated as a node, and the adjacency matrix, representing the relationships between nodes, is derived from the predicted residue contact graph of the peptide sequence. In addition to commonly used amino acid representations, such as one-hot encoding and position encoding, the pre-trained model ESM-1b is employed to extract amino acid features as node features. To consider node weights and information updates simultaneously, a dual-channel method combining a Graph Attention Network (GAT) and a Graph Convolutional Network (GCN) is adopted. Finally, the extracted features from both channels are merged for the classification of IL-6-inducing peptides. A series of experiments including cross-validation, independent testing, ablation studies, and visualizations demonstrate the effectiveness of the DGIL-6 method.
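The graph construction described in this abstract can be sketched in a few lines. The example below thresholds a toy contact map into an adjacency matrix and applies one GCN-style propagation step. The 0.5 threshold, the toy contact values, and the one-hot node features are assumptions for illustration; learned weights, the ESM-1b features, and the GAT channel are all omitted.

```python
import math


def adjacency_from_contacts(contact_prob, threshold=0.5):
    """Threshold a symmetric residue-residue contact map into a 0/1 adjacency matrix."""
    n = len(contact_prob)
    return [[1.0 if i != j and contact_prob[i][j] >= threshold else 0.0
             for j in range(n)] for i in range(n)]


def gcn_propagate(adj, features):
    """One GCN-style step, D^-1/2 (A + I) D^-1/2 X, with no learned weights."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    out = []
    for i in range(n):
        row = [0.0] * len(features[0])
        for j in range(n):
            if a_hat[i][j]:
                w = a_hat[i][j] / math.sqrt(deg[i] * deg[j])
                for k, x in enumerate(features[j]):
                    row[k] += w * x
        out.append(row)
    return out


# Toy 4-residue peptide: hypothetical contact probabilities and one-hot node features.
contacts = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.8, 0.3],
    [0.2, 0.8, 1.0, 0.7],
    [0.1, 0.3, 0.7, 1.0],
]
node_features = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

adj = adjacency_from_contacts(contacts)
updated = gcn_propagate(adj, node_features)
print(updated[0])  # residue 0 now mixes its own features with those of residue 1
```

A GAT channel would replace the fixed degree-based weights `w` with attention coefficients learned per edge, which is how the two channels in a dual-channel design can capture different aspects of the same graph.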
Affiliation(s)
- Ruifen Cao
  - Information Materials and Intelligent Sensing Laboratory of Anhui Province, School of Computer Science and Technology, Anhui University, Hefei 230601, China
- Qiangsheng Li
  - Information Materials and Intelligent Sensing Laboratory of Anhui Province, School of Computer Science and Technology, Anhui University, Hefei 230601, China
- Pijing Wei
  - Institutes of Physical Science and Information Technology, Anhui University, Hefei 230601, China
- Yun Ding
  - School of Artificial Intelligence, Anhui University, Hefei 230601, China
- Yannan Bin
  - Institutes of Physical Science and Information Technology, Anhui University, Hefei 230601, China
- Chunhou Zheng
  - School of Artificial Intelligence, Anhui University, Hefei 230601, China
3
Yao L, Guan J, Xie P, Chung CR, Zhao Z, Dong D, Guo Y, Zhang W, Deng J, Pang Y, Liu Y, Peng Y, Horng JT, Chiang YC, Lee TY. dbAMP 3.0: updated resource of antimicrobial activity and structural annotation of peptides in the post-pandemic era. Nucleic Acids Res 2025; 53:D364-D376. [PMID: 39540425 PMCID: PMC11701527 DOI: 10.1093/nar/gkae1019] [Received: 09/09/2024] [Revised: 10/12/2024] [Accepted: 11/06/2024] [Indexed: 11/16/2024] Open
Abstract
Antimicrobial resistance is one of the most urgent global health threats, especially in the post-pandemic era. Antimicrobial peptides (AMPs) offer a promising alternative to traditional antibiotics, driving growing interest in recent years. dbAMP is a comprehensive database offering extensive annotations on AMPs, including sequence information, functional activity data, physicochemical properties and structural annotations. In this update, dbAMP has curated data from over 5200 publications, encompassing 33,065 AMPs and 2453 antimicrobial proteins from 3534 organisms. Additionally, dbAMP utilizes ESMFold to determine the three-dimensional structures of AMPs, providing over 30,000 structural annotations that facilitate structure-based functional insights for clinical drug development. Furthermore, dbAMP employs molecular docking techniques, providing over 100 docked complexes that contribute useful insights into the potential mechanisms of AMPs. The toxicity and stability of AMPs are critical factors in assessing their potential as clinical drugs. The updated dbAMP introduced an efficient tool for evaluating the hemolytic toxicity and half-life of AMPs, alongside an AMP optimization platform for designing AMPs with high antimicrobial activity, reduced toxicity and increased stability. The updated dbAMP is freely accessible at https://awi.cuhk.edu.cn/dbAMP/. Overall, dbAMP represents a comprehensive and essential resource for AMP analysis and design, poised to advance antimicrobial strategies in the post-pandemic era.
Affiliation(s)
- Lantian Yao
  - Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172, Shenzhen, China
  - School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172, Shenzhen, China
- Jiahui Guan
  - Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172, Shenzhen, China
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172, Shenzhen, China
- Peilin Xie
  - Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172, Shenzhen, China
  - School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172, Shenzhen, China
- Chia-Ru Chung
  - Department of Computer Science and Information Engineering, National Central University, 320317, Taoyuan, Taiwan
- Zhihao Zhao
  - Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172, Shenzhen, China
- Danhong Dong
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172, Shenzhen, China
- Yilin Guo
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172, Shenzhen, China
- Wenyang Zhang
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172, Shenzhen, China
- Junyang Deng
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172, Shenzhen, China
- Yuxuan Pang
  - Division of Health Medical Intelligence, Human Genome Center, The Institute of Medical Science, The University of Tokyo, 108-8639, Tokyo, Japan
- Yulan Liu
  - Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172, Shenzhen, China
- Yunlu Peng
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172, Shenzhen, China
- Jorng-Tzong Horng
  - Department of Computer Science and Information Engineering, National Central University, 320317, Taoyuan, Taiwan
- Ying-Chih Chiang
  - Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172, Shenzhen, China
  - School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172, Shenzhen, China
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172, Shenzhen, China
- Tzong-Yi Lee
  - Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, 300093, Hsinchu, Taiwan
  - Center for Intelligent Drug Systems and Smart Bio-devices (IDS2B), National Yang Ming Chiao Tung University, 300093, Hsinchu, Taiwan
4
Pandey P, Srivastava A. sAMP-VGG16: Force-field assisted image-based deep neural network prediction model for short antimicrobial peptides. Proteins 2025; 93:372-383. [PMID: 38520179 DOI: 10.1002/prot.26681] [Received: 12/08/2023] [Revised: 02/15/2024] [Accepted: 02/28/2024] [Indexed: 03/25/2024]
Abstract
During the last three decades, antimicrobial peptides (AMPs) have emerged as a promising therapeutic alternative to antibiotics. The approaches for designing AMPs span from experimental trial-and-error methods to synthetic hybrid peptide libraries. To overcome the exceedingly expensive and time-consuming process of designing effective AMPs, many computational and machine-learning tools for AMP prediction have been recently developed. In general, to encode the peptide sequences, featurization relies on approaches based on (a) amino acid (AA) composition, (b) physicochemical properties, (c) sequence similarity, and (d) structural properties. In this work, we present an image-based deep neural network model to predict AMPs, where we are using feature encoding based on Drude polarizable force-field atom types, which can capture the peptide properties more efficiently compared to conventional feature vectors. The proposed prediction model identifies short AMPs (≤30 AA) with promising accuracy and efficiency and can be used as a next-generation screening method for predicting new AMPs. The source code is publicly available at the Figshare server sAMP-VGG16.
Affiliation(s)
- Poonam Pandey
  - Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Karnataka, India
- Anand Srivastava
  - Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Karnataka, India
5
Henson BAB, Li F, Álvarez-Huerta JA, Wedamulla PG, Palacios AV, Scott MRM, Lim DTE, Scott WMH, Villanueva MTL, Ye E, Straus SK. Novel active Trp- and Arg-rich antimicrobial peptides with high solubility and low red blood cell toxicity designed using machine learning tools. Int J Antimicrob Agents 2025; 65:107399. [PMID: 39645171 DOI: 10.1016/j.ijantimicag.2024.107399] [Received: 09/11/2024] [Revised: 11/07/2024] [Accepted: 11/29/2024] [Indexed: 12/09/2024]
Abstract
BACKGROUND: Given the rising number of multidrug-resistant (MDR) bacteria, there is a need to design synthetic antimicrobial peptides (AMPs) that are highly active, non-hemolytic, and highly soluble. Machine learning tools allow the straightforward in silico identification of non-hemolytic antimicrobial peptides.
METHODS: Here, we utilized a number of these tools to rank the best peptides from two libraries comprising: 1) a total of 8192 peptides with sequence bhxxbhbGAL, where b is the basic amino acid R or K, h is a hydrophobic amino acid, i.e. G, A, L, F, I, V, Y, or W, and x is Q, S, A, or V; and 2) a total of 512 peptides with sequence RWhxbhRGWL, where b and h are as for the first library and x is Q, S, A, or G. The top 100 sequences from each library, as well as 10 peptides predicted to be active but hemolytic (for a total of 220 peptides), were SPOT synthesized and their IC50 values were determined against S. aureus USA 300 (MRSA).
RESULTS: Of these, 6 AMPs with low IC50 values were characterized further in terms of: MICs against MRSA, E. faecalis, K. pneumoniae, E. coli and P. aeruginosa; RBC lysis; secondary structure in mammalian and bacterial model membranes; and activity against cancer cell lines HepG2, CHO, and PC-3.
CONCLUSION: Overall, the approach yielded a large family of active antimicrobial peptides with high solubility and low red blood cell toxicity. It also provides a framework for future designs and improved machine learning tools.
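The library sizes quoted in the abstract follow directly from the templates: 2^3 × 8^2 × 4^2 = 8192 for the first and 8^2 × 4 × 2 = 512 for the second. A minimal sketch that enumerates both libraries and checks the counts is given below; the `expand` helper and variable names are illustrative, not taken from the paper.

```python
from itertools import product

BASIC = "RK"              # b
HYDROPHOBIC = "GALFIVYW"  # h
X_LIB1 = "QSAV"           # x in library 1
X_LIB2 = "QSAG"           # x in library 2


def expand(template: str, alphabet: dict[str, str]) -> list[str]:
    """Expand lowercase placeholders in a template; uppercase letters are fixed residues."""
    choices = [alphabet.get(ch, ch) for ch in template]
    return ["".join(p) for p in product(*choices)]


library1 = expand("bhxxbhbGAL", {"b": BASIC, "h": HYDROPHOBIC, "x": X_LIB1})
library2 = expand("RWhxbhRGWL", {"b": BASIC, "h": HYDROPHOBIC, "x": X_LIB2})

print(len(library1), len(library2))  # 8192 512

# The stated total of 220 synthesized peptides is consistent with 100 top-ranked
# plus 10 hemolytic controls per library (an inference from the abstract's numbers).
synthesized = 2 * (100 + 10)
```

Enumerating the full library is cheap at this scale, so every candidate can be scored by the ranking tools rather than sampled.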
Affiliation(s)
- Bridget A B Henson
  - Department of Chemistry, University of British Columbia, Vancouver, British Columbia, Canada
- Fucong Li
  - Department of Chemistry, University of British Columbia, Vancouver, British Columbia, Canada
- Poornima G Wedamulla
  - Department of Chemistry, University of British Columbia, Vancouver, British Columbia, Canada
- Arianna Valdes Palacios
  - Department of Chemistry, University of British Columbia, Vancouver, British Columbia, Canada
- Max R M Scott
  - Department of Chemistry, University of British Columbia, Vancouver, British Columbia, Canada
- David Thiam En Lim
  - Department of Chemistry, University of British Columbia, Vancouver, British Columbia, Canada
- W M Hayden Scott
  - Department of Chemistry, University of British Columbia, Vancouver, British Columbia, Canada
- Monica T L Villanueva
  - Department of Chemistry, University of British Columbia, Vancouver, British Columbia, Canada
- Emily Ye
  - Department of Chemistry, University of British Columbia, Vancouver, British Columbia, Canada
- Suzana K Straus
  - Department of Chemistry, University of British Columbia, Vancouver, British Columbia, Canada.
6
Brizuela CA, Liu G, Stokes JM, de la Fuente‐Nunez C. AI Methods for Antimicrobial Peptides: Progress and Challenges. Microb Biotechnol 2025; 18:e70072. [PMID: 39754551 PMCID: PMC11702388 DOI: 10.1111/1751-7915.70072] [Received: 07/19/2024] [Revised: 11/18/2024] [Accepted: 12/16/2024] [Indexed: 01/06/2025] Open
Abstract
Antimicrobial peptides (AMPs) are promising candidates to combat multidrug-resistant pathogens. However, the high cost of extensive wet-lab screening has made AI methods for identifying and designing AMPs increasingly important, with machine learning (ML) techniques playing a crucial role. AI approaches have recently revolutionised this field by accelerating the discovery of new peptides with anti-infective activity, particularly in preclinical mouse models. Initially, classical ML approaches dominated the field, but recently there has been a shift towards deep learning (DL) models. Despite significant contributions, existing reviews have not thoroughly explored the potential of large language models (LLMs), graph neural networks (GNNs) and structure-guided AMP discovery and design. This review aims to fill that gap by providing a comprehensive overview of the latest advancements, challenges and opportunities in using AI methods, with a particular emphasis on LLMs, GNNs and structure-guided design. We discuss the limitations of current approaches and highlight the most relevant topics to address in the coming years for AMP discovery and design.
Affiliation(s)
- Gary Liu
  - Department of Biochemistry and Biomedical Sciences, Michael G. DeGroote Institute for Infectious Disease Research, David Braley Centre for Antibiotic Discovery, McMaster University, Hamilton, Ontario, Canada
- Jonathan M. Stokes
  - Department of Biochemistry and Biomedical Sciences, Michael G. DeGroote Institute for Infectious Disease Research, David Braley Centre for Antibiotic Discovery, McMaster University, Hamilton, Ontario, Canada
- Cesar de la Fuente‐Nunez
  - Machine Biology Group, Department of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
  - Department of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, Pennsylvania, USA
  - Department of Chemistry, School of Arts and Sciences, University of Pennsylvania, Philadelphia, Pennsylvania, USA
  - Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, Pennsylvania, USA
7
Chen S, Qi H, Zhu X, Liu T, Fan Y, Su Q, Gong Q, Jia C, Liu T. Screening and identification of antimicrobial peptides from the gut microbiome of cockroach Blattella germanica. Microbiome 2024; 12:272. [PMID: 39709489 DOI: 10.1186/s40168-024-01985-9] [Received: 08/17/2024] [Accepted: 11/21/2024] [Indexed: 12/23/2024]
Abstract
BACKGROUND: The overuse of antibiotics has led to lethal multi-antibiotic-resistant microorganisms around the globe, with restricted availability of novel antibiotics. Compared to conventional antibiotics, evolutionarily originated antimicrobial peptides (AMPs) are promising alternatives to address these issues. The gut microbiome of Blattella germanica represents a previously untapped resource of naturally evolving AMPs for developing antimicrobial agents.
RESULTS: Using the in-house designed tool "AMPidentifier," AMP candidates were mined from the gut microbiome of B. germanica, and their activities were validated both in vitro and in vivo. Among filtered candidates, AMP1, derived from the symbiotic microorganism Blattabacterium cuenoti, demonstrated broad-spectrum antibacterial activity, low cytotoxicity towards mammalian cells, and a lack of hemolytic effects. Mechanistic studies revealed that AMP1 rapidly permeates the bacterial cell and accumulates intracellularly, resulting in a gradual and mild depolarization of the cell membrane during the initial incubation period, suggesting minimal direct impact on membrane integrity. Furthermore, observations from fluorescence microscopy and scanning electron microscopy indicated abnormalities in bacterial binary fission and compromised cell structure. These findings led to the hypothesis that AMP1 may inhibit bacterial cell wall synthesis. In addition, AMP1 showed potent antibacterial and wound healing effects in mice, with performance comparable to that of vancomycin.
CONCLUSIONS: This study exemplifies an interdisciplinary approach to screening safe and effective AMPs from natural biological tissues, and our identified AMP1 holds promising potential for clinical application.
Affiliation(s)
- Sizhe Chen
  - MOE Key Laboratory of Bio-Intelligent Manufacturing, School of Bioengineering, Dalian University of Technology, Dalian, Liaoning, 116024, China
  - Microbiota I-Center (MagIC), Hong Kong SAR, China
  - The Department of Medicine & Therapeutics, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, NT, China
- Huitang Qi
  - MOE Key Laboratory of Bio-Intelligent Manufacturing, School of Bioengineering, Dalian University of Technology, Dalian, Liaoning, 116024, China
- Xingzhuo Zhu
  - Department of Thoracic Surgery, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710061, China
- Tianxiang Liu
  - School of Science, Dalian Maritime University, Dalian, 116026, China
- Yuting Fan
  - MOE Key Laboratory of Bio-Intelligent Manufacturing, School of Bioengineering, Dalian University of Technology, Dalian, Liaoning, 116024, China
- Qi Su
  - Microbiota I-Center (MagIC), Hong Kong SAR, China
  - The Department of Medicine & Therapeutics, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, NT, China
- Qiuyu Gong
  - Department of Thoracic Surgery, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710061, China.
- Cangzhi Jia
  - School of Science, Dalian Maritime University, Dalian, 116026, China.
- Tian Liu
  - MOE Key Laboratory of Bio-Intelligent Manufacturing, School of Bioengineering, Dalian University of Technology, Dalian, Liaoning, 116024, China.
8
Bello-Madruga R, Torrent Burgas M. The limits of prediction: Why intrinsically disordered regions challenge our understanding of antimicrobial peptides. Comput Struct Biotechnol J 2024; 23:972-981. [PMID: 38404711 PMCID: PMC10884422 DOI: 10.1016/j.csbj.2024.02.008] [Received: 11/28/2023] [Revised: 02/10/2024] [Accepted: 02/10/2024] [Indexed: 02/27/2024] Open
Abstract
Antimicrobial peptides (AMPs) are molecules found in most organisms, playing a vital role in innate immune defense against pathogens. Their mechanism of action involves the disruption of bacterial cell membranes, causing leakage of cellular contents and ultimately leading to cell death. While AMPs typically lack a defined structure in solution, they often assume a defined conformation when interacting with bacterial membranes. Given this structural flexibility, we investigated whether intrinsically disordered regions (IDRs) with AMP-like properties could exhibit antimicrobial activity. We tested 14 peptides from different IDRs predicted to have antimicrobial activity and found that nearly all of them did not display the anticipated effects. These peptides failed to adopt a defined secondary structure and had compromised membrane interactions, resulting in a lack of antimicrobial activity. We hypothesize that evolutionary constraints may prevent IDRs from folding, even in membrane-like environments, limiting their antimicrobial potential. Moreover, our research reveals that current antimicrobial predictors fail to accurately capture the structural features of peptides when dealing with intrinsically unstructured sequences. Hence, the results presented here may have far-reaching implications for designing and improving antimicrobial strategies and therapies against infectious diseases.
Affiliation(s)
- Roberto Bello-Madruga
  - The Systems Biology of Infection Lab, Department of Biochemistry and Molecular Biology, Biosciences Faculty, Universitat Autònoma de Barcelona, 08193 Cerdanyola del Vallès, Spain
- Marc Torrent Burgas
  - The Systems Biology of Infection Lab, Department of Biochemistry and Molecular Biology, Biosciences Faculty, Universitat Autònoma de Barcelona, 08193 Cerdanyola del Vallès, Spain
9
Zhong G, Liu H, Deng L. Ensemble Machine Learning and Predicted Properties Promote Antimicrobial Peptide Identification. Interdiscip Sci 2024; 16:951-965. [PMID: 38972032 DOI: 10.1007/s12539-024-00640-z] [Received: 01/22/2024] [Revised: 06/04/2024] [Accepted: 06/07/2024] [Indexed: 07/08/2024]
Abstract
The emergence of antibiotic-resistant microbes raises a pressing demand for novel alternative treatments. One promising alternative is the antimicrobial peptides (AMPs), a class of innate immunity mediators within the therapeutic peptide realm. AMPs offer salient advantages such as high specificity, cost-effective synthesis, and reduced toxicity. Although some computational methodologies have been proposed to identify potential AMPs with the rapid development of artificial intelligence techniques, there is still ample room to improve their performance. This study proposes a predictive framework that ensembles deep learning and statistical learning methods to screen peptides with antimicrobial activity. We integrate multiple LightGBM classifiers and convolutional neural networks, which leverage various sequential, structural and physicochemical properties predicted from the residue sequences by diverse machine learning paradigms. Comparative experiments show that our method outperforms other state-of-the-art approaches on an independent test dataset in terms of representative capability measures. We also analyse the discrimination quality under different kinds of attribute information, which reveals that combining multiple features can improve prediction. In addition, a case study is carried out to illustrate the favorable identification effect. We establish a web application at http://amp.denglab.org to provide convenient usage of our proposal and make the predictive framework, source code, and datasets publicly accessible at https://github.com/researchprotein/amp.
Affiliation(s)
- Guolun Zhong
  - School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Hui Liu
  - College of Computer and Information Engineering, Nanjing Tech University, Nanjing, 211816, China.
- Lei Deng
  - School of Computer Science and Engineering, Central South University, Changsha, 410083, China.
10
Liu X, Luo J, Wang X, Zhang Y, Chen J. Directed evolution of antimicrobial peptides using multi-objective zeroth-order optimization. Brief Bioinform 2024; 26:bbae715. [PMID: 39800873 PMCID: PMC11725395 DOI: 10.1093/bib/bbae715] [Received: 08/01/2024] [Revised: 12/08/2024] [Accepted: 12/27/2024] [Indexed: 01/16/2025] Open
Abstract
Antimicrobial peptides (AMPs) have emerged as a promising type of therapeutic compound that exhibits broad-spectrum antimicrobial activity with high specificity and good tolerability. Natural AMPs usually need further rational design to improve antimicrobial activity and decrease toxicity to human cells. Although several algorithms have been developed to optimize AMPs with desired properties, they explore the variations of AMPs in a discrete amino acid sequence space and usually suffer from low efficiency, a lack of diversity, and local optima. In this work, we propose a novel directed evolution method, named PepZOO, for optimizing multi-properties of AMPs in a continuous representation space guided by multi-objective zeroth-order optimization. PepZOO projects AMPs from a discrete amino acid sequence space into a continuous latent representation space by a variational autoencoder. Subsequently, the latent embeddings of prototype AMPs are taken as starting points and iteratively updated according to the guidance of multi-objective zeroth-order optimization. Experimental results demonstrate that PepZOO outperforms state-of-the-art methods on improving the multi-properties in terms of antimicrobial function, activity, toxicity, and binding affinity to the targets. Molecular docking and molecular dynamics simulations are further employed to validate the effectiveness of our method. Moreover, PepZOO can reveal important motifs which are required to maintain a particular property during the evolution by aligning the evolutionary sequences. PepZOO provides a novel research paradigm that optimizes AMPs by exploring property change instead of exploring sequence mutations, accelerating the discovery of potential therapeutic peptides.
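To make the idea of zeroth-order optimization in a latent space concrete, the sketch below estimates a gradient from function evaluations alone (random two-point differences) and uses it to update a toy latent vector. It is not the PepZOO implementation: the weighted-sum scalarization, the quadratic `activity` and `low_toxicity` functions, the step size, and all other parameters are assumptions, and the variational autoencoder that would map peptides to and from the latent space is omitted.

```python
import random


def scalarize(z, objectives, weights):
    """Weighted sum of several objective scores (higher is better)."""
    return sum(w * f(z) for f, w in zip(objectives, weights))


def zo_gradient(score, z, rng, num_dirs=20, mu=0.05):
    """Estimate the gradient from function values only, via random two-point differences."""
    grad = [0.0] * len(z)
    for _ in range(num_dirs):
        u = [rng.gauss(0.0, 1.0) for _ in z]
        plus = score([zi + mu * ui for zi, ui in zip(z, u)])
        minus = score([zi - mu * ui for zi, ui in zip(z, u)])
        coeff = (plus - minus) / (2.0 * mu * num_dirs)
        grad = [g + coeff * ui for g, ui in zip(grad, u)]
    return grad


# Hypothetical stand-ins for black-box property predictors on a latent vector.
def activity(z):
    return -sum((zi - 1.0) ** 2 for zi in z)


def low_toxicity(z):
    return -sum((zi + 0.5) ** 2 for zi in z)


def score(z):
    return scalarize(z, [activity, low_toxicity], [0.7, 0.3])


rng = random.Random(0)
z = [0.0, 0.0, 0.0, 0.0]   # toy latent embedding of a prototype peptide
start = score(z)
for _ in range(100):
    g = zo_gradient(score, z, rng)
    z = [zi + 0.05 * gi for zi, gi in zip(z, g)]   # gradient ascent step
end = score(z)
print(round(start, 3), round(end, 3))
```

Because only `score` evaluations are needed, the property predictors can be treated as black boxes, which is the practical appeal of zeroth-order methods when the predictors are not differentiable.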
Affiliation(s)
- Xianliang Liu
  - School of Computer Science and Technology, Harbin Institute of Technology, HIT Campus, Shenzhen University Town, Nanshan District, Shenzhen 518055, Guangdong, China
- Jiawei Luo
  - School of Computer Science and Technology, Harbin Institute of Technology, HIT Campus, Shenzhen University Town, Nanshan District, Shenzhen 518055, Guangdong, China
- Xinyan Wang
  - Core Research Facility, Southern University of Science and Technology, No. 1088 Xueyuan Road, Nanshan District, Shenzhen 518055, Guangdong, China
- Yang Zhang
  - School of Science, Harbin Institute of Technology, HIT Campus, Shenzhen University Town, Nanshan District, Shenzhen 518055, Guangdong, China
- Junjie Chen
  - School of Computer Science and Technology, Harbin Institute of Technology, HIT Campus, Shenzhen University Town, Nanshan District, Shenzhen 518055, Guangdong, China
11
Ahmed Z, Shahzadi K, Jin Y, Li R, Momanyi BM, Zulfiqar H, Ning L, Lin H. Identification of RNA‐dependent liquid‐liquid phase separation proteins using an artificial intelligence strategy. Proteomics 2024; 24:e2400044. [PMID: 38824664 DOI: 10.1002/pmic.202400044] [Received: 01/26/2024] [Revised: 05/03/2024] [Accepted: 05/21/2024] [Indexed: 06/04/2024]
Abstract
RNA-dependent liquid-liquid phase separation (LLPS) proteins play critical roles in cellular processes such as stress granule formation, DNA repair, RNA metabolism, germ cell development, and protein translation regulation. The abnormal behavior of these proteins is associated with various diseases, particularly neurodegenerative disorders like amyotrophic lateral sclerosis and frontotemporal dementia, making their identification crucial. However, conventional biochemistry-based methods for identifying these proteins are time-consuming and costly. Addressing this challenge, our study developed a robust computational model for their identification. We constructed a comprehensive dataset containing 137 RNA-dependent and 606 non-RNA-dependent LLPS protein sequences, which were then encoded using amino acid composition, composition of K-spaced amino acid pairs, Geary autocorrelation, and conjoined triad methods. Through a combination of correlation analysis, mutual information scoring, and incremental feature selection, we identified an optimal feature subset. This subset was used to train a random forest model, which achieved an accuracy of 90% when tested against an independent dataset. This study demonstrates the potential of computational methods as efficient alternatives for the identification of RNA-dependent LLPS proteins. To enhance the accessibility of the model, a user-centric web server has been established and can be accessed via the link: http://rpp.lin-group.cn.
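Two of the sequence encodings named in this abstract are simple enough to sketch directly. The example below computes amino acid composition (AAC) and the composition of k-spaced amino acid pairs (CKSAAP) for a toy sequence. The choice of gaps k = 0 and k = 1 and the example sequence are assumptions, and the Geary autocorrelation and conjoint triad encodings are not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aac(seq: str) -> list[float]:
    """Amino acid composition: fraction of each of the 20 standard residues."""
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def cksaap(seq: str, k: int) -> list[float]:
    """Frequency of each residue pair separated by exactly k intervening residues."""
    counts = dict.fromkeys(PAIRS, 0)
    windows = len(seq) - k - 1
    for i in range(windows):
        pair = seq[i] + seq[i + k + 1]
        if pair in counts:
            counts[pair] += 1
    return [counts[p] / windows for p in PAIRS]


seq = "MKRRGGSYGGRK"   # toy sequence, not from the paper's dataset
features = aac(seq) + cksaap(seq, 0) + cksaap(seq, 1)
print(len(features))   # 20 + 400 + 400 = 820
```

Encodings like these produce hundreds of mostly sparse features per gap value, which is why a feature selection stage of the kind the abstract describes is typically applied before training a classifier.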
Affiliation(s)
- Zahoor Ahmed
- Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou, China
| | - Kiran Shahzadi
- Department of Biotechnology, Women University of Azad Jammu and Kashmir Bagh, Bagh, Azad Kashmir, Pakistan
| | - Yanting Jin
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
| | - Rui Li
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
| | - Biffon Manyura Momanyi
- School of Computer Science and Engineering, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Hasan Zulfiqar
- Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou, China
| | - Lin Ning
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- School of Healthcare Technology, Chengdu Neusoft University, Chengdu, China
| | - Hao Lin
- Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou, China
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
|
12
|
Slavokhotova AA, Shelenkov AA, Rogozhin EA. Computational Prediction and Structural Analysis of α-Hairpinins, a Ubiquitous Family of Antimicrobial Peptides, Using the Cysmotif Searcher Pipeline. Antibiotics (Basel) 2024; 13:1019. [PMID: 39596714 PMCID: PMC11591084 DOI: 10.3390/antibiotics13111019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2024] [Revised: 10/25/2024] [Accepted: 10/26/2024] [Indexed: 11/29/2024] Open
Abstract
BACKGROUND α-Hairpinins are a family of antimicrobial peptides, promising antimicrobial agents, which includes only 12 currently revealed members with proven activity, although their real number is supposed to be much higher. α-Hairpinins are short peptides containing four cysteine residues arranged in a specific Cys-motif. These antimicrobial peptides (AMPs) have a characteristic helix-loop-helix structure with two disulfide bonds. Isolation of α-hairpinins by biochemical methods is cost- and labor-consuming, thus requiring reliable preliminary in silico prediction. METHODS In this study, we developed a special algorithm for the prediction of putative α-hairpinins on the basis of characteristic motifs with four (4C) and six (6C) cysteines deduced from translated plant transcriptome sequences. We integrated this algorithm into the Cysmotif searcher pipeline and then analyzed all transcriptomes available from the One Thousand Plant Transcriptomes project. RESULTS We predicted more than 2000 putative α-hairpinins belonging to various plant sources including algae, mosses, ferns, and true flowering plants. These data make α-hairpinins one of the ubiquitous antimicrobial peptides, being widespread among various plants. The largest numbers of α-hairpinins were revealed in the Papaveraceae family and in Papaver somniferum in particular. CONCLUSIONS By analyzing the primary structure of α-hairpinins, we concluded that more predicted peptides with the 6C motif are likely to have potent antimicrobial activity in comparison to the ones possessing 4C motifs. In addition, we found 30 α-hairpinin precursors containing from two to eight Cys-rich modules. A striking similarity between some α-hairpinin modules belonging to diverse plants was revealed. These data allowed us to assume that the evolution of α-hairpinin precursors possibly involved changing the number of Cys-rich modules, leading to some missing middle and C-terminal modules, in particular.
Affiliation(s)
- Anna A. Slavokhotova
- Central Research Institute of Epidemiology, Novogireevskaya Str., 3a, 111123 Moscow, Russia
- Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry RAS, Miklukho-Maklaya Str., 16/10, 117437 Moscow, Russia
| | - Andrey A. Shelenkov
- Central Research Institute of Epidemiology, Novogireevskaya Str., 3a, 111123 Moscow, Russia
| | - Eugene A. Rogozhin
- Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry RAS, Miklukho-Maklaya Str., 16/10, 117437 Moscow, Russia
- All-Russian Institute for Plant Protection, Podbelskogo Str., 196608 Saint-Petersburg-Pushkin, Russia
|
13
|
Gao W, Zhao J, Gui J, Wang Z, Chen J, Yue Z. Comprehensive Assessment of BERT-Based Methods for Predicting Antimicrobial Peptides. J Chem Inf Model 2024; 64:7772-7785. [PMID: 39316765 DOI: 10.1021/acs.jcim.4c00507] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/26/2024]
Abstract
In recent years, the prediction of antimicrobial peptides (AMPs) has gained prominence due to their high antibacterial activity and reduced susceptibility to drug resistance, making them potential antibiotic substitutes. To advance the field of AMP recognition, an increasing number of natural language processing methods are being applied. These methods exhibit diversity in terms of pretraining models, pretraining data sets, word vector embeddings, feature encoding methods, and downstream classification models. Here, we provide a comprehensive survey of current BERT-based methods for AMP prediction. An independent benchmark test data set is constructed to evaluate the predictive capabilities of the surveyed tools. Furthermore, we compared the predictive performance of these computational methods based on six different AMP public databases. LM_pred (BFD) outperformed all other surveyed tools due to abundant pretraining data set and the unique vector embedding approach. To avoid the impact of varying training data sets used by different methods on prediction performance, we performed the 5-fold cross-validation experiments using the same data set, involving retraining. Additionally, to explore the applicability and generalization ability of the models, we constructed a short peptide data set and an external data set to test the retrained models. Although these prediction methods based on BERT can achieve good prediction performance, there is still room for improvement in recognition accuracy. With the continuous enhancement of protein language model, we proposed an AMP prediction method based on the ESM-2 pretrained model called iAMP-bert. Experimental results demonstrate that iAMP-bert outperforms other approaches. iAMP-bert is freely accessible to the public at http://iamp.aielab.cc/.
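The 5-fold cross-validation protocol mentioned in this abstract, in which every method is retrained on the same data set, can be sketched as an index splitter. The helper `k_fold_indices` is a hypothetical name, and the split is a plain shuffled partition without class stratification, which is an assumption rather than the surveyed setup.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k folds.

    Indices are shuffled once with a fixed seed so that every method under
    comparison sees exactly the same partition.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k near-equal disjoint folds
    for i in range(k):
        test = folds[i]
        train = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield train, test
```

Reusing the same `seed` across models keeps the comparison fair, since differences in score then come from the model rather than from the partition.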
Affiliation(s)
- Wanling Gao
- School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, Anhui 230036, China
| | - Jun Zhao
- School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, Anhui 230036, China
| | - Jianfeng Gui
- School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, Anhui 230036, China
| | - Zehan Wang
- School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, Anhui 230036, China
| | - Jie Chen
- National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University, Shenzhen, Guangdong 518060, China
| | - Zhenyu Yue
- School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, Anhui 230036, China
|
14
|
Panchalingam S, Kasivelu G. A computational approach to identifying peptide inhibitors against White Spot Syndrome Virus: Targeting the virus envelope protein. Microb Pathog 2024; 195:106849. [PMID: 39147215 DOI: 10.1016/j.micpath.2024.106849] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2024] [Revised: 08/12/2024] [Accepted: 08/12/2024] [Indexed: 08/17/2024]
Abstract
The white spot syndrome virus (WSSV), a rapidly replicating and highly lethal pathogen that targets Penaeid shrimp, has emerged as one of the most widespread viruses globally due to its high virulence. With effective chemotherapeutics still unavailable, the pursuit of novel and viable strategies against WSSV remains a crucial focus in the field of shrimp farming. The envelope proteins of WSSV are essential for virus entry, serving as excellent targets for the development of antiviral therapeutics. Novel strategies in the design of inhibitory peptides, especially those targeting envelope protein (VP28) located on the surface of the virus particle, play a critical role as a significant virulence factor during the early stages of inherent WSSV infection in shrimp. In this direction, the current computational study focused on identifying self-inhibitory peptides from the hydrophobic membrane regions of the VP28 protein, employing peptide docking and molecular dynamics simulation (MDS) approaches. Such inhibitory peptides could be useful building blocks for the rational engineering of inhibitory therapeutics since they imitate the mechanism of binding to homologous partners used by their origin domain to interact with other molecules. The N-terminal sequence of VP28 has been reported as the potential site for membrane interactions during the virus entry. Moreover, drug delivery systems mediated by chitosan and gold nanoparticles are being developed to enhance the therapeutic efficacy of anti-viral peptides. These systems can increase the solubility, stability, and selectivity of peptides, possessing better qualities than conventional delivery methods. This computational study on self-inhibitory peptides could be a valuable resource for further in vitro and in vivo studies on anti-viral therapeutics in the aquaculture industry.
Affiliation(s)
- Santhiya Panchalingam
- Centre for Ocean Research, Sathyabama Institute of Science and Technology, Chennai, 600 119, India
| | - Govindaraju Kasivelu
- Centre for Ocean Research, Sathyabama Institute of Science and Technology, Chennai, 600 119, India.
|
15
|
Wei L. Advanced deep learning approaches enable high-throughput biological and biomedicine data analysis. Methods 2024; 230:116-118. [PMID: 39154807 DOI: 10.1016/j.ymeth.2024.08.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/20/2024] Open
Affiliation(s)
- Leyi Wei
- Centre for Artificial Intelligence Driven Drug Discovery, Faculty of Applied Science, Macao Polytechnic University, Macao; School of Informatics, Xiamen University, Xiamen, China.
|
16
|
Qiao B, Wang S, Hou M, Chen H, Zhou Z, Xie X, Pang S, Yang C, Yang F, Zou Q, Sun S. Identifying nucleotide-binding leucine-rich repeat receptor and pathogen effector pairing using transfer-learning and bilinear attention network. Bioinformatics 2024; 40:btae581. [PMID: 39331576 DOI: 10.1093/bioinformatics/btae581] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2024] [Revised: 08/24/2024] [Accepted: 09/25/2024] [Indexed: 09/29/2024]
Abstract
MOTIVATION Nucleotide-binding leucine-rich repeat (NLR) family is a class of immune receptors capable of detecting and defending against pathogen invasion. They have been widely used in crop breeding. Notably, the correspondence between NLRs and effectors (CNE) determines the applicability and effectiveness of NLRs. Unfortunately, CNE data is very scarce. In fact, we've found a substantial 91 291 NLRs confirmed via wet experiments and bioinformatics methods but only 387 CNEs are recognized, which greatly restricts the potential application of NLRs. RESULTS We propose a deep learning algorithm called ProNEP to identify NLR-effector pairs in a high-throughput manner. Specifically, we conceptualized the CNE prediction task as a protein-protein interaction (PPI) prediction task. Then, ProNEP predicts the interaction between NLRs and effectors by combining the transfer learning with a bilinear attention network. ProNEP achieves superior performance against state-of-the-art models designed for PPI predictions. Based on ProNEP, we conduct extensive identification of potential CNEs for 91 291 NLRs. With the rapid accumulation of genomic data, we expect that this tool will be widely used to predict CNEs in new species, advancing biology, immunology, and breeding. AVAILABILITY AND IMPLEMENTATION The ProNEP is available at http://nerrd.cn/#/prediction. The project code is available at https://github.com/QiaoYJYJ/ProNEP.
Affiliation(s)
- Baixue Qiao
- Key Laboratory of Saline-Alkali Vegetation Ecology Restoration, Ministry of Education (Northeast Forestry University), Harbin 150001, China
- State Key Laboratory of Tree Genetics and Breeding, Northeast Forestry University, Harbin 150001, China
| | - Shuda Wang
- Key Laboratory of Saline-Alkali Vegetation Ecology Restoration, Ministry of Education (Northeast Forestry University), Harbin 150001, China
- State Key Laboratory of Tree Genetics and Breeding, Northeast Forestry University, Harbin 150001, China
| | - Mingjun Hou
- Key Laboratory of Saline-Alkali Vegetation Ecology Restoration, Ministry of Education (Northeast Forestry University), Harbin 150001, China
| | - Haodi Chen
- Key Laboratory of Saline-Alkali Vegetation Ecology Restoration, Ministry of Education (Northeast Forestry University), Harbin 150001, China
| | - Zhengwenyang Zhou
- Key Laboratory of Saline-Alkali Vegetation Ecology Restoration, Ministry of Education (Northeast Forestry University), Harbin 150001, China
| | - Xueying Xie
- Key Laboratory of Saline-Alkali Vegetation Ecology Restoration, Ministry of Education (Northeast Forestry University), Harbin 150001, China
| | - Shaozi Pang
- Key Laboratory of Saline-Alkali Vegetation Ecology Restoration, Ministry of Education (Northeast Forestry University), Harbin 150001, China
| | - Chunxue Yang
- College of Landscape Architecture, Northeast Forestry University, Harbin 150001, China
| | - Fenglong Yang
- Department of Bioinformatics, Fujian Key Laboratory of Medical Bioinformatics, School of Medical Technology and Engineering, Fujian Medical University, Fuzhou 350122, China
- Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou 350122, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 611731, China
| | - Shanwen Sun
- Key Laboratory of Saline-Alkali Vegetation Ecology Restoration, Ministry of Education (Northeast Forestry University), Harbin 150001, China
- State Key Laboratory of Tree Genetics and Breeding, Northeast Forestry University, Harbin 150001, China
|
17
|
Momanyi BM, Temesgen SA, Wang T, Gao H, Gao R, Tang H, Tang L. iGATTLDA: Integrative graph attention and transformer-based model for predicting lncRNA-Disease associations. IET Syst Biol 2024; 18:172-182. [PMID: 39308027 PMCID: PMC11490194 DOI: 10.1049/syb2.12098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2024] [Revised: 08/19/2024] [Accepted: 09/10/2024] [Indexed: 10/20/2024] Open
Abstract
Long non-coding RNAs (lncRNAs) have emerged as significant contributors to the regulation of various biological processes, and their dysregulation has been linked to a variety of human disorders. Accurate prediction of potential correlations between lncRNAs and diseases is crucial for advancing disease diagnostics and treatment procedures. The authors introduced a novel computational method, iGATTLDA, for the prediction of lncRNA-disease associations. The model utilised lncRNA and disease similarity matrices, with known associations represented in an adjacency matrix. A heterogeneous network was constructed, dissecting lncRNAs and diseases as nodes and their associations as edges. The Graph Attention Network (GAT) is employed to process initial features and corresponding adjacency information. GAT identified significant neighbouring nodes in the network, capturing intricate relationships between lncRNAs and diseases, and generating new feature representations. Subsequently, the transformer captures global dependencies and interactions across the entire sequence of features produced by the GAT. Consequently, iGATTLDA successfully captures complex relationships and interactions that conventional approaches may overlook. In evaluating iGATTLDA, it attained an area under the receiver operating characteristic (ROC) curve (AUC) of 0.95 and an area under the precision recall curve (AUPRC) of 0.96 with a two-layer multilayer perceptron (MLP) classifier. These results were notably higher compared to the majority of previously proposed models, further substantiating the model's efficiency in predicting potential lncRNA-disease associations by incorporating both local and global interactions. The implementation details can be obtained from https://github.com/momanyibiffon/iGATTLDA.
Affiliation(s)
- Biffon Manyura Momanyi
- School of Computer Science and Engineering, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Sebu Aboma Temesgen
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Tian‐Yu Wang
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Hui Gao
- School of Computer Science and Engineering, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Ru Gao
- The People’s Hospital of Wenjiang, Chengdu, China
| | - Hua Tang
- School of Basic Medical Sciences, Southwest Medical University, Luzhou, China
- Medical Engineering & Medical Informatics Integration and Transformational Medicine Key Laboratory of Luzhou City, Luzhou, China
- Central Nervous System Drug Key Laboratory of Sichuan Province, Luzhou, China
| | - Li‐Xia Tang
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
|
18
|
Ahmed Z, Shahzadi K, Temesgen SA, Ahmad B, Chen X, Ning L, Zulfiqar H, Lin H, Jin YT. A protein pre-trained model-based approach for the identification of the liquid-liquid phase separation (LLPS) proteins. Int J Biol Macromol 2024; 277:134146. [PMID: 39067723 DOI: 10.1016/j.ijbiomac.2024.134146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2024] [Revised: 07/06/2024] [Accepted: 07/23/2024] [Indexed: 07/30/2024]
Abstract
Liquid-liquid phase separation (LLPS) regulates many biological processes including RNA metabolism, chromatin rearrangement, and signal transduction. Aberrant LLPS potentially leads to serious diseases. Therefore, the identification of the LLPS proteins is crucial. Traditionally, biochemistry-based methods for identifying LLPS proteins are costly, time-consuming, and laborious. In contrast, artificial intelligence-based approaches are fast and cost-effective and can be a better alternative to biochemistry-based methods. Previous research methods employed word2vec in conjunction with machine learning or deep learning algorithms. Although word2vec captures word semantics and relationships, it might not be effective in capturing features relevant to protein classification, like physicochemical properties, evolutionary relationships, or structural features. Additionally, other studies often focused on a limited set of features for model training, including planar π contact frequency, pi-pi, and β-pairing propensities. To overcome such shortcomings, this study first constructed a reliable dataset containing 1206 protein sequences, including 603 LLPS and 603 non-LLPS protein sequences. Then a computational model was proposed to efficiently identify the LLPS proteins by perceiving semantic information of protein sequences directly, using an ESM2-36 pre-trained model based on transformer architecture in conjunction with a convolutional neural network. The model achieved an accuracy of 85.68% and 89.67% on training data and test data, respectively, surpassing the accuracy of previous studies. The performance demonstrates the potential of our computational methods as efficient alternatives for identifying LLPS proteins.
Affiliation(s)
- Zahoor Ahmed
- Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou, China.
| | - Kiran Shahzadi
- Department of Biotechnology, Women University of Azad Jammu and Kashmir, Bagh, Azad Kashmir, Pakistan.
| | - Sebu Aboma Temesgen
- School of Life Science and Technology, University of Electronic Science and Technology of China, 611731 Chengdu, China.
| | - Basharat Ahmad
- School of Life Science and Technology, University of Electronic Science and Technology of China, 611731 Chengdu, China.
| | - Xiang Chen
- Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou, China.
| | - Lin Ning
- School of Life Science and Technology, University of Electronic Science and Technology of China, 611731 Chengdu, China; School of Healthcare Technology, Chengdu Neusoft University, Chengdu, China.
| | - Hasan Zulfiqar
- Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou, China.
| | - Hao Lin
- Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou, China.
| | - Yan-Ting Jin
- School of Life Science and Technology, University of Electronic Science and Technology of China, 611731 Chengdu, China.
|
19
|
Zhang W, Ding Y, Wei L, Guo X, Ni F. Therapeutic peptides identification via kernel risk sensitive loss-based k-nearest neighbor model and multi-Laplacian regularization. Brief Bioinform 2024; 25:bbae534. [PMID: 39438076 PMCID: PMC11495874 DOI: 10.1093/bib/bbae534] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2024] [Revised: 08/30/2024] [Accepted: 10/08/2024] [Indexed: 10/25/2024] Open
Abstract
Therapeutic peptides are therapeutic agents synthesized from natural amino acids, which can be used as carriers for precisely transporting drugs and can activate the immune system for preventing and treating various diseases. However, screening therapeutic peptides using biochemical assays is expensive, time-consuming, and limited by experimental conditions and biological samples, and there may be ethical considerations in the clinical stage. In contrast, screening therapeutic peptides using machine learning and computational methods is efficient, automated, and can accurately predict potential therapeutic peptides. In this study, a k-nearest neighbor model based on multi-Laplacian and kernel risk sensitive loss was proposed, which introduces a kernel risk loss function derived from the K-local hyperplane distance nearest neighbor model as well as combining the Laplacian regularization method to predict therapeutic peptides. The findings indicated that the suggested approach achieved satisfactory results and could effectively predict therapeutic peptide sequences.
Affiliation(s)
- Wenyu Zhang
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, No. 2006 Xiyuan Avenue, High tech Zone, Chengdu 610054, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, No.1 Chengdian Road, Kecheng District, Quzhou 324000, China
| | - Yijie Ding
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, No.1 Chengdian Road, Kecheng District, Quzhou 324000, China
| | - Leyi Wei
- Macao Polytechnic University, Gomes Street, Macau Peninsula, Macau 999078, China
| | - Xiaoyi Guo
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, No.1 Chengdian Road, Kecheng District, Quzhou 324000, China
| | - Fengming Ni
- Department of Gastroenterology, The First Hospital of Jilin University, No. 71 Xinmin Street, Chaoyang District, Changchun 130021, China
|
20
|
Zhou Y, Zhou S, Bi Y, Zou Q, Jia C. A two-task predictor for discovering phase separation proteins and their undergoing mechanism. Brief Bioinform 2024; 25:bbae528. [PMID: 39434494 PMCID: PMC11492799 DOI: 10.1093/bib/bbae528] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2024] [Revised: 09/12/2024] [Accepted: 10/17/2024] [Indexed: 10/23/2024] Open
Abstract
Liquid-liquid phase separation (LLPS) is one of the mechanisms mediating the compartmentalization of macromolecules (proteins and nucleic acids) in cells, forming biomolecular condensates or membraneless organelles. Consequently, the systematic identification of potential LLPS proteins is crucial for understanding the phase separation process and its biological mechanisms. A two-task predictor, Opt_PredLLPS, was developed to discover potential phase separation proteins and further evaluate their mechanism. The first task model of Opt_PredLLPS combines a convolutional neural network (CNN) and bidirectional long short-term memory (BiLSTM) through a fully connected layer, where the CNN utilizes evolutionary information features as input, and BiLSTM utilizes multimodal features as input. If a protein is predicted to be an LLPS protein, it is input into the second task model to predict whether this protein needs to interact with its partners to undergo LLPS. The second task model employs the XGBoost classification algorithm and 37 physicochemical properties following a three-step feature selection. The effectiveness of the model was validated on multiple benchmark datasets, and in silico saturation mutagenesis was used to identify regions that play a key role in phase separation. These findings may assist future research on the LLPS mechanism and the discovery of potential phase separation proteins.
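The cascade described in this abstract, where the second task runs only for proteins the first task labels as LLPS proteins, can be sketched as follows. The function `two_task_predict`, the two scoring callables, and the 0.5 threshold are hypothetical placeholders; they stand in for the CNN/BiLSTM and XGBoost models, whose actual interfaces are not given here.

```python
from typing import Callable, Optional, Tuple


def two_task_predict(
    sequence: str,
    llps_score: Callable[[str], float],
    partner_score: Callable[[str], float],
    threshold: float = 0.5,
) -> Tuple[bool, Optional[bool]]:
    """Run task 1, then task 2 only when task 1 is positive.

    Returns (is_llps, needs_partner). needs_partner is None when the
    protein is not predicted to undergo LLPS, because task 2 is skipped.
    """
    if llps_score(sequence) < threshold:
        return False, None
    return True, partner_score(sequence) >= threshold
```

Returning `None` for the second value keeps "not evaluated" distinct from "predicted not to need a partner", which matters when downstream code tallies the task 2 results.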
Affiliation(s)
- Yetong Zhou
- School of Science, Dalian Maritime University, 1 Linghai Road, Dalian, 116026, China
| | - Shengming Zhou
- College of Computer and Control Engineering, Northeast Forestry University, No. 26 Hexing Road, Xiangfang District, Harbin, 150040, China
- College of Life Science, Northeast Forestry University, No. 26 Hexing Road, Xiangfang District, Harbin, 150040, China
| | - Yue Bi
- Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, No. 2006, Xiyuan Ave, West Hi-Tech Zone, Chengdu, 611731, China
| | - Cangzhi Jia
- School of Science, Dalian Maritime University, 1 Linghai Road, Dalian, 116026, China
|
21
|
Liu S, Shi T, Yu J, Li R, Lin H, Deng K. Research on Bitter Peptides in the Field of Bioinformatics: A Comprehensive Review. Int J Mol Sci 2024; 25:9844. [PMID: 39337334 PMCID: PMC11432553 DOI: 10.3390/ijms25189844] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2024] [Revised: 09/06/2024] [Accepted: 09/09/2024] [Indexed: 09/30/2024] Open
Abstract
Bitter peptides are small molecular peptides produced by the hydrolysis of proteins under acidic, alkaline, or enzymatic conditions. These peptides can enhance food flavor and offer various health benefits, with attributes such as antihypertensive, antidiabetic, antioxidant, antibacterial, and immune-regulating properties. They show significant potential in the development of functional foods and the prevention and treatment of diseases. This review introduces the diverse sources of bitter peptides and discusses the mechanisms of bitterness generation and their physiological functions in the taste system. Additionally, it emphasizes the application of bioinformatics in bitter peptide research, including the establishment and improvement of bitter peptide databases, the use of quantitative structure-activity relationship (QSAR) models to predict bitterness thresholds, and the latest advancements in classification prediction models built using machine learning and deep learning algorithms for bitter peptide identification. Future research directions include enhancing databases, diversifying models, and applying generative models to advance bitter peptide research towards deepening and discovering more practical applications.
Affiliation(s)
- Hao Lin
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China; (S.L.); (T.S.); (J.Y.); (R.L.)
| | - Kejun Deng
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China; (S.L.); (T.S.); (J.Y.); (R.L.)
|
22
|
Rathore AS, Choudhury S, Arora A, Tijare P, Raghava GPS. ToxinPred 3.0: An improved method for predicting the toxicity of peptides. Comput Biol Med 2024; 179:108926. [PMID: 39038391 DOI: 10.1016/j.compbiomed.2024.108926] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Revised: 05/17/2024] [Accepted: 07/17/2024] [Indexed: 07/24/2024]
Abstract
Toxicity emerges as a prominent challenge in the design of therapeutic peptides, causing the failure of numerous peptides during clinical trials. In 2013, our group developed ToxinPred, a computational method that has been extensively adopted by the scientific community for predicting peptide toxicity. In this paper, we propose a refined variant of ToxinPred that showcases improved reliability and accuracy in predicting peptide toxicity. Initially, we utilized a similarity/alignment-based approach employing BLAST to predict toxic peptides, which yielded satisfactory accuracy; however, the method suffered from inadequate coverage. Subsequently, we employed a motif-based approach using MERCI software to uncover specific patterns or motifs that are exclusively observed in toxic peptides. The search for these motifs in peptides allowed us to predict toxic peptides with a high level of specificity with poor sensitivity. To overcome the coverage limitations, we developed alignment-free methods using machine/deep learning techniques to balance sensitivity and specificity of prediction. Deep learning model (ANN - LSTM with fixed sequence length) developed using one-hot encoding achieved a maximum AUROC of 0.93 with MCC of 0.71 on an independent dataset. Machine learning model (extra tree) developed using compositional features of peptides achieved a maximum AUROC of 0.95 with MCC of 0.78. We also developed large language models and achieved maximum AUC of 0.93 using ESM2-t33. Finally, we developed hybrid or ensemble methods combining two or more methods to enhance performance. Our specific hybrid method, which combines a motif-based approach with a machine learning-based model, achieved a maximum AUROC of 0.98 with MCC 0.81 on an independent dataset. In this study, all models were trained and tested on 80 % of data using five-fold cross-validation and evaluated on the remaining 20 % of data called independent dataset. 
The evaluation of all methods on an independent dataset revealed that the method proposed in this study exhibited better performance than existing methods. To cater to the needs of the scientific community, we have developed a standalone software, pip package and web-based server ToxinPred3 (https://github.com/raghavagps/toxinpred3 and https://webs.iiitd.edu.in/raghava/toxinpred3/).
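The evaluation protocol described in this abstract (an 80/20 split, five-fold cross-validation on the 80% portion, and MCC as a reported metric) can be illustrated with a minimal Python sketch. This is not the ToxinPred3 code: the function names, the shuffling seed, and the toy sample count of 100 are assumptions made only for illustration.

```python
import math
import random


def split_holdout(n_samples, holdout_fraction=0.2, seed=0):
    """Shuffle sample indices and hold out a fraction as the independent set."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_holdout = int(round(n_samples * holdout_fraction))
    return indices[n_holdout:], indices[:n_holdout]


def k_folds(indices, k=5):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical dataset of 100 peptides: 80 for cross-validation, 20 held out.
train_pool, independent = split_holdout(100)
folds = list(k_folds(train_pool, k=5))
```

In this sketch the independent set is never touched during cross-validation, which is the property that makes the final evaluation on the remaining 20% meaningful.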
Affiliation(s)
- Anand Singh Rathore
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India.
- Shubham Choudhury
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India.
- Akanksha Arora
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India.
- Purva Tijare
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India.
- Gajendra P S Raghava
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India.
23
|
Hao Y, Liu X, Fu H, Shao X, Cai W. PGAT-ABPp: harnessing protein language models and graph attention networks for antibacterial peptide identification with remarkable accuracy. Bioinformatics 2024; 40:btae497. [PMID: 39120878 PMCID: PMC11338452 DOI: 10.1093/bioinformatics/btae497] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2024] [Revised: 07/24/2024] [Accepted: 08/08/2024] [Indexed: 08/10/2024] Open
Abstract
MOTIVATION The emergence of drug-resistant pathogens represents a formidable challenge to global health. Using computational methods to identify the antibacterial peptides (ABPs), an alternative antimicrobial agent, has demonstrated advantages in further drug design studies. Most of the current approaches, however, rely on handcrafted features and underutilize structural information, which may affect prediction performance. RESULTS To present an ultra-accurate model for ABP identification, we propose a novel deep learning approach, PGAT-ABPp. PGAT-ABPp leverages structures predicted by AlphaFold2 and a pretrained protein language model, ProtT5-XL-U50 (ProtT5), to construct graphs. Then the graph attention network (GAT) is adopted to learn global discriminative features from the graphs. PGAT-ABPp outperforms the other fourteen state-of-the-art models in terms of accuracy, F1-score and Matthews Correlation Coefficient on the independent test dataset. The results show that ProtT5 has significant advantages in the identification of ABPs and the introduction of spatial information further improves the prediction performance of the model. The interpretability analysis of key residues in known active ABPs further underscores the superiority of PGAT-ABPp. AVAILABILITY AND IMPLEMENTATION The datasets and source codes for the PGAT-ABPp model are available at https://github.com/moonseter/PGAT-ABPp/.
Affiliation(s)
- Yuelei Hao
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Xuyang Liu
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Haohao Fu
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Xueguang Shao
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Wensheng Cai
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
24
|
Geng G, Wang L, Xu Y, Wang T, Ma W, Duan H, Zhang J, Mao A. MGDDI: A multi-scale graph neural networks for drug-drug interaction prediction. Methods 2024; 228:22-29. [PMID: 38754712 DOI: 10.1016/j.ymeth.2024.05.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2024] [Revised: 05/09/2024] [Accepted: 05/12/2024] [Indexed: 05/18/2024] Open
Abstract
Drug-drug interaction (DDI) prediction is crucial for identifying interactions within drug combinations, especially adverse effects due to physicochemical incompatibility. While current methods have made strides in predicting adverse drug interactions, limitations persist. Most methods rely on handcrafted features, restricting their applicability. They predominantly extract information from individual drugs, neglecting the importance of interaction details between drug pairs. To address these issues, we propose MGDDI, a graph neural network-based model for predicting potential adverse drug interactions. Notably, we use a multiscale graph neural network (MGNN) to learn drug molecule representations, addressing substructure size variations and preventing gradient issues. For capturing interaction details between drug pairs, we integrate a substructure interaction learning module based on attention mechanisms. Our experimental results demonstrate MGDDI's superiority in predicting adverse drug interactions, offering a solution to current methodological limitations.
Affiliation(s)
- Guannan Geng
- Department of Endocrinology, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- Lizhuang Wang
- Beidahuang Industry Group General Hospital, Harbin, China
- Yanwei Xu
- Beidahuang Group Neuropsychiatric Hospital, Jiamusi, China; Department of Stomatology and Dental Hygiene, The Fourth Affiliated Hospital, Harbin Medical University, Harbin, China
- Tianshuo Wang
- School of Software, Shandong University, Jinan, China
- Wei Ma
- Department of Stomatology and Dental Hygiene, The Fourth Affiliated Hospital, Harbin Medical University, Harbin, China
- Hongliang Duan
- Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China
- Jiahui Zhang
- Department of Stomatology and Dental Hygiene, The Fourth Affiliated Hospital, Harbin Medical University, Harbin, China.
- Anqiong Mao
- The Affiliated Traditional Chinese Medicine Hospital, Southwest Medical University, Department of Anesthesiology, Luzhou, China.
25
|
Feng Z, Huang W, Li H, Zhu H, Kang Y, Li Z. DGCPPISP: a PPI site prediction model based on dynamic graph convolutional network and two-stage transfer learning. BMC Bioinformatics 2024; 25:252. [PMID: 39085781 PMCID: PMC11293074 DOI: 10.1186/s12859-024-05864-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2024] [Accepted: 07/10/2024] [Indexed: 08/02/2024] Open
Abstract
BACKGROUND Proteins play a pivotal role in the diverse array of biological processes, making the precise prediction of protein-protein interaction (PPI) sites critical to numerous disciplines including biology, medicine and pharmacy. While deep learning methods have progressively been implemented for the prediction of PPI sites within proteins, the task of enhancing their predictive performance remains an arduous challenge. RESULTS In this paper, we propose a novel PPI site prediction model (DGCPPISP) based on a dynamic graph convolutional neural network and a two-stage transfer learning strategy. Initially, we implement the transfer learning from dual perspectives, namely feature input and model training that serve to supply efficacious prior knowledge for our model. Subsequently, we construct a network designed for the second stage of training, which is built on the foundation of dynamic graph convolution. CONCLUSIONS To evaluate its effectiveness, the performance of the DGCPPISP model is scrutinized using two benchmark datasets. The ensuing results demonstrate that DGCPPISP outshines competing methods in terms of performance. Specifically, DGCPPISP surpasses the second-best method, EGRET, by margins of 5.9%, 10.1%, and 13.3% for F1-measure, AUPRC, and MCC metrics respectively on Dset_186_72_PDB164. Similarly, on Dset_331, it eclipses the performance of the runner-up method, HN-PPISP, by 14.5%, 19.8%, and 29.9% respectively.
Affiliation(s)
- Zijian Feng
- Zhejiang Province Key Laboratory of Smart Management and Application of Modern Agricultural Resources, School of Information Engineering, Huzhou University, Huzhou, 313000, Zhejiang, China
- College of Science, Zhejiang Sci-Tech University, Hangzhou, 310018, Zhejiang, China
- Weihong Huang
- Zhejiang Province Key Laboratory of Smart Management and Application of Modern Agricultural Resources, School of Information Engineering, Huzhou University, Huzhou, 313000, Zhejiang, China
- College of Science, Zhejiang Sci-Tech University, Hangzhou, 310018, Zhejiang, China
- Haohao Li
- College of Science, Zhejiang Sci-Tech University, Hangzhou, 310018, Zhejiang, China
- Hancan Zhu
- School of Mathematics, Physics and Information, Shaoxing University, Shaoxing, 312000, Zhejiang, China
- Yanlei Kang
- Zhejiang Province Key Laboratory of Smart Management and Application of Modern Agricultural Resources, School of Information Engineering, Huzhou University, Huzhou, 313000, Zhejiang, China
- Zhong Li
- Zhejiang Province Key Laboratory of Smart Management and Application of Modern Agricultural Resources, School of Information Engineering, Huzhou University, Huzhou, 313000, Zhejiang, China.
- College of Science, Zhejiang Sci-Tech University, Hangzhou, 310018, Zhejiang, China.
26
|
Ramos-Llorens M, Bello-Madruga R, Valle J, Andreu D, Torrent M. PyAMPA: a high-throughput prediction and optimization tool for antimicrobial peptides. mSystems 2024; 9:e0135823. [PMID: 38934543 PMCID: PMC11264690 DOI: 10.1128/msystems.01358-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2023] [Accepted: 05/27/2024] [Indexed: 06/28/2024] Open
Abstract
The alarming rise of antibiotic-resistant bacterial infections is driving efforts to develop alternatives to conventional antibiotics. In this context, antimicrobial peptides (AMPs) have emerged as promising candidates for their ability to target a broad range of microorganisms. However, the development of AMPs with optimal potency, selectivity, and/or stability profiles remains a challenge. To address it, computational tools for predicting AMP properties and designing novel peptides have gained increasing attention. PyAMPA is a novel platform for AMP discovery. It consists of five modules, namely AMPScreen, AMPValidate, AMPSolve, AMPMutate, and AMPOptimize, that allow high-throughput proteome inspection, candidate screening, and optimization through point-mutation and genetic algorithms. The platform also offers additional tools for predicting and evaluating AMP properties, including antimicrobial and cytotoxic activity, and peptide half-life. By providing innovative and accessible inroads into AMP motifs in proteomes, PyAMPA will enable advances in AMP development and potential translation into clinically useful molecules. PyAMPA is available at: https://github.com/SysBioUAB/PyAMPA. IMPORTANCE This paper introduces PyAMPA, a new bioinformatics platform designed for the discovery and optimization of antimicrobial peptides (AMPs). It addresses the urgent need for new antimicrobials due to the rise of antibiotic-resistant infections. PyAMPA, with its five predictive modules (AMPScreen, AMPValidate, AMPSolve, AMPMutate, and AMPOptimize), enables high-throughput screening of proteomes to identify potential AMP motifs and optimize them for clinical use. Its unique approach, combining prediction, design, and optimization tools, makes PyAMPA a robust solution for developing new AMP-based therapies, offering a significant advance in combatting antibiotic resistance.
Affiliation(s)
- Marc Ramos-Llorens
- Systems Biology of Infection Lab, Department of Biochemistry and Molecular Biology, Biosciences Faculty, Universitat Autònoma de Barcelona, Barcelona, Spain
- Roberto Bello-Madruga
- Systems Biology of Infection Lab, Department of Biochemistry and Molecular Biology, Biosciences Faculty, Universitat Autònoma de Barcelona, Barcelona, Spain
- Javier Valle
- Barcelona Biomedical Research Park, Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona, Spain
- David Andreu
- Barcelona Biomedical Research Park, Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona, Spain
- Marc Torrent
- Systems Biology of Infection Lab, Department of Biochemistry and Molecular Biology, Biosciences Faculty, Universitat Autònoma de Barcelona, Barcelona, Spain
27
|
Zhao F, Qiu J, Xiang D, Jiao P, Cao Y, Xu Q, Qiao D, Xu H, Cao Y. deepAMPNet: a novel antimicrobial peptide predictor employing AlphaFold2 predicted structures and a bi-directional long short-term memory protein language model. PeerJ 2024; 12:e17729. [PMID: 39040937 PMCID: PMC11262304 DOI: 10.7717/peerj.17729] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2024] [Accepted: 06/20/2024] [Indexed: 07/24/2024] Open
Abstract
Background Global public health is seriously threatened by the escalating issue of antimicrobial resistance (AMR). Antimicrobial peptides (AMPs), pivotal components of the innate immune system, have emerged as a potent solution to AMR due to their therapeutic potential. Employing computational methodologies for the prompt recognition of these antimicrobial peptides unlocks fresh perspectives, thereby potentially revolutionizing antimicrobial drug development. Methods In this study, we have developed a model named deepAMPNet. This model, which leverages graph neural networks, excels at the swift identification of AMPs. It employs structures of antimicrobial peptides predicted by AlphaFold2, encodes residue-level features through a bi-directional long short-term memory (Bi-LSTM) protein language model, and constructs adjacency matrices based on amino acid contact maps. Results In a comparative study with other state-of-the-art AMP predictors on two external independent test datasets, deepAMPNet outperformed them in accuracy. Furthermore, in terms of commonly accepted evaluation metrics such as AUC, MCC, sensitivity, and specificity, deepAMPNet achieved the highest or highly comparable performance against other predictors. Conclusion deepAMPNet, which interweaves both structural and sequence information of AMPs, stands as a high-performance identification model that can advance the development and design of antimicrobial peptide pharmaceuticals. The data and code utilized in this study can be accessed at https://github.com/Iseeu233/deepAMPNet.
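To make the idea of building an adjacency matrix from a contact map concrete, here is a minimal Python sketch. It is not the deepAMPNet implementation: the 8 Å distance cutoff, the use of one coordinate per residue, and the toy coordinates are assumptions chosen only for illustration.

```python
import math


def contact_adjacency(coords, cutoff=8.0):
    """Return a symmetric 0/1 adjacency matrix.

    Residues i and j are connected when the distance between their
    coordinates is at most `cutoff` (an assumed threshold, in angstroms).
    Self-connections are left at 0.
    """
    n = len(coords)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                adj[i][j] = adj[j][i] = 1
    return adj


# Hypothetical per-residue coordinates for a four-residue peptide.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
adjacency = contact_adjacency(toy_coords)
```

In a graph neural network setting, a matrix like `adjacency` would define the edges, while the residue-level embeddings would serve as node features.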
Affiliation(s)
- Fei Zhao
- Microbiology and Metabolic Engineering Laboratory of Sichuan Province, College of Life Science, Sichuan University, Chengdu, Sichuan, China
- Junhui Qiu
- Microbiology and Metabolic Engineering Laboratory of Sichuan Province, College of Life Science, Sichuan University, Chengdu, Sichuan, China
- Dongyou Xiang
- Microbiology and Metabolic Engineering Laboratory of Sichuan Province, College of Life Science, Sichuan University, Chengdu, Sichuan, China
- Pengrui Jiao
- Microbiology and Metabolic Engineering Laboratory of Sichuan Province, College of Life Science, Sichuan University, Chengdu, Sichuan, China
- Yu Cao
- Microbiology and Metabolic Engineering Laboratory of Sichuan Province, College of Life Science, Sichuan University, Chengdu, Sichuan, China
- Qingrui Xu
- Microbiology and Metabolic Engineering Laboratory of Sichuan Province, College of Life Science, Sichuan University, Chengdu, Sichuan, China
- Dairong Qiao
- Microbiology and Metabolic Engineering Laboratory of Sichuan Province, College of Life Science, Sichuan University, Chengdu, Sichuan, China
- Hui Xu
- Microbiology and Metabolic Engineering Laboratory of Sichuan Province, College of Life Science, Sichuan University, Chengdu, Sichuan, China
- Yi Cao
- Microbiology and Metabolic Engineering Laboratory of Sichuan Province, College of Life Science, Sichuan University, Chengdu, Sichuan, China
28
|
Roque-Borda CA, Primo LMDG, Franzyk H, Hansen PR, Pavan FR. Recent advances in the development of antimicrobial peptides against ESKAPE pathogens. Heliyon 2024; 10:e31958. [PMID: 38868046 PMCID: PMC11167364 DOI: 10.1016/j.heliyon.2024.e31958] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Revised: 05/23/2024] [Accepted: 05/24/2024] [Indexed: 06/14/2024] Open
Abstract
Multi-drug resistant ESKAPE pathogens (Enterococcus faecium, Staphylococcus aureus, Klebsiella pneumoniae, Acinetobacter baumannii, Pseudomonas aeruginosa, and Enterobacter species) are a global health threat. The severity of the problem lies in its impact on mortality, therapeutic limitations, the threat to public health, and the costs associated with managing infections caused by these resistant strains. Effectively addressing this challenge requires innovative approaches to research, the development of new antimicrobials, and more responsible antibiotic use practices globally. Antimicrobial peptides (AMPs) are a part of the innate immune system of all higher organisms. They are short, cationic and amphipathic molecules with broad-spectrum activity. AMPs interact with the negatively charged bacterial membrane. In recent years, AMPs have attracted considerable interest as potential antibiotics. However, AMPs have low bioavailability and short half-lives, which may be circumvented by chemical modification. This review presents recent in vitro and in silico strategies for the modification of AMPs to improve their stability and application in preclinical experiments.
Affiliation(s)
- Cesar Augusto Roque-Borda
- São Paulo State University (UNESP), Tuberculosis Research Laboratory, School of Pharmaceutical Sciences, Araraquara, Brazil
- Universidad Católica de Santa María, Vicerrectorado de Investigación, Arequipa, Peru
- Henrik Franzyk
- University of Copenhagen, Faculty of Health and Medical Sciences, Department of Drug Design and Pharmacology, Denmark
- Paul Robert Hansen
- University of Copenhagen, Faculty of Health and Medical Sciences, Department of Drug Design and Pharmacology, Denmark
- Fernando Rogério Pavan
- São Paulo State University (UNESP), Tuberculosis Research Laboratory, School of Pharmaceutical Sciences, Araraquara, Brazil
29
|
Shaon MSH, Karim T, Sultan MF, Ali MM, Ahmed K, Hasan MZ, Moustafa A, Bui FM, Al-Zahrani FA. AMP-RNNpro: a two-stage approach for identification of antimicrobials using probabilistic features. Sci Rep 2024; 14:12892. [PMID: 38839785 PMCID: PMC11153637 DOI: 10.1038/s41598-024-63461-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2024] [Accepted: 05/29/2024] [Indexed: 06/07/2024] Open
Abstract
Antimicrobials are molecules that prevent the formation of microorganisms such as bacteria, viruses, fungi, and parasites. The necessity to detect antimicrobial peptides (AMPs) using machine learning and deep learning arises from the need to accelerate the discovery of AMPs efficiently and to contribute to developing effective antimicrobial therapies, especially in the face of increasing antibiotic resistance. This study introduced AMP-RNNpro, an innovative model based on a Recurrent Neural Network (RNN) for detecting AMPs, which was designed with eight feature encoding methods selected according to four criteria (amino acid compositional, grouped amino acid compositional, autocorrelation, and pseudo-amino acid compositional) to represent the protein sequences for efficient identification of AMPs. In our framework, two-stage predictions have been conducted. Initially, this study analyzed 33 models on these feature extractions. Then, we selected the best six models from these models using rigorous performance metrics. In the second stage, probabilistic features have been generated from the selected six models in each feature encoding and they are aggregated to be fed into our final meta-model called AMP-RNNpro. This study also introduced 20 features with SHAP, which are crucial in the drug development fields, where we discover that AAC, ASDC, and CKSAAGP features are highly impactful for detection and drug discovery. Our proposed framework, AMP-RNNpro, excels in the identification of novel AMPs with 97.15% accuracy, 96.48% sensitivity, and 97.87% specificity. We built a user-friendly website for demonstrating the accurate prediction of AMPs based on the proposed approach, which can be accessed at http://13.126.159.30/.
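The two-stage idea in this abstract (encode a sequence, score it with several first-stage models, then pass the stacked probabilities to a meta-model) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses amino acid composition (AAC) as the single encoding, and `lysine_score` and `cationic_score` are hypothetical stand-ins for trained first-stage models, not anything from AMP-RNNpro.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac(sequence):
    """Amino acid composition: the fraction of each standard residue."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    return [sequence.count(aa) / len(sequence) for aa in AMINO_ACIDS]


def probabilistic_features(sequence, base_models):
    """Stack one score per first-stage model into a meta-model input vector."""
    features = aac(sequence)
    return [model(features) for model in base_models]


# Hypothetical first-stage "models": simple functions of the AAC vector.
def lysine_score(features):
    return features[AMINO_ACIDS.index("K")]


def cationic_score(features):
    return features[AMINO_ACIDS.index("K")] + features[AMINO_ACIDS.index("R")]


meta_input = probabilistic_features("KKLR", [lysine_score, cationic_score])
```

In a real pipeline the stand-in functions would be replaced by fitted classifiers that return class probabilities, and `meta_input` would be the feature vector for the second-stage model.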
Affiliation(s)
- Md Shazzad Hossain Shaon
- Department of Computer Science and Engineering, Daffodil International University, Daffodil Smart City, Birulia, Dhaka, 1216, Bangladesh
- Tasmin Karim
- Department of Computer Science and Engineering, Daffodil International University, Daffodil Smart City, Birulia, Dhaka, 1216, Bangladesh
- Md Fahim Sultan
- Department of Computer Science and Engineering, Daffodil International University, Daffodil Smart City, Birulia, Dhaka, 1216, Bangladesh
- Md Mamun Ali
- Health Informatics Research Lab, Department of Computer Science and Engineering, Daffodil International University, Daffodil Smart City, Birulia, Dhaka, 1216, Bangladesh
- Division of Biomedical Engineering, University of Saskatchewan, 57 Campus Drive, Saskatoon, SK, S7N 5A9, Canada
- Department of Software Engineering, Daffodil International University, Daffodil Smart City (DSC), Birulia, Savar, Dhaka, 1216, Bangladesh
- Kawsar Ahmed
- Health Informatics Research Lab, Department of Computer Science and Engineering, Daffodil International University, Daffodil Smart City, Birulia, Dhaka, 1216, Bangladesh.
- Department of Electrical and Computer Engineering, University of Saskatchewan, 57 Campus Drive, Saskatoon, SK, S7N 5A9, Canada.
- Group of Bio-photomatiχ, Information and Communication Technology, Mawlana Bhashani Science and Technology University, Santosh, Tangail, 1902, Bangladesh.
- Md Zahid Hasan
- Department of Computer Science and Engineering, Daffodil International University, Daffodil Smart City, Birulia, Dhaka, 1216, Bangladesh
- Health Informatics Research Lab, Department of Computer Science and Engineering, Daffodil International University, Daffodil Smart City, Birulia, Dhaka, 1216, Bangladesh
- Ahmed Moustafa
- Department of Human Anatomy and Physiology, The Faculty of Health Sciences, University of Johannesburg, Johannesburg, South Africa
- School of Psychology, Centre for Data Analytics, Bond University, Gold Coast, QLD, Australia
- Francis M Bui
- Department of Electrical and Computer Engineering, University of Saskatchewan, 57 Campus Drive, Saskatoon, SK, S7N 5A9, Canada
30
|
Aslan A, Ari Yuka S. Therapeutic peptides for coronary artery diseases: in silico methods and current perspectives. Amino Acids 2024; 56:37. [PMID: 38822212 PMCID: PMC11143054 DOI: 10.1007/s00726-024-03397-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2024] [Accepted: 05/06/2024] [Indexed: 06/02/2024]
Abstract
Many drug formulations containing small active molecules are used for the treatment of coronary artery disease, which affects a significant part of the world's population. However, the inadequate profile of these molecules in terms of therapeutic efficacy has led to the therapeutic use of protein and peptide-based biomolecules with superior properties, such as target-specific affinity and low immunogenicity, in critical diseases. Protein‒protein interactions, as a consequence of advances in molecular techniques with strategies involving the combined use of in silico methods, have enabled the design of therapeutic peptides to reach an advanced dimension. In particular, with the advantages provided by protein/peptide structural modeling, molecular docking for the study of their interactions, molecular dynamics simulations for their interactions under physiological conditions and machine learning techniques that can work in combination with all these, significant progress has been made in approaches to developing therapeutic peptides that can modulate the development and progression of coronary artery diseases. In this scope, this review discusses in silico methods for the development of peptide therapeutics for the treatment of coronary artery disease and strategies for identifying the molecular mechanisms that can be modulated by these designs and provides a comprehensive perspective for future studies.
Affiliation(s)
- Ayca Aslan
- Department of Bioengineering, Faculty of Chemical and Metallurgical Engineering, Yildiz Technical University, Esenler, Istanbul, Turkey
- Health Biotechnology Joint Research and Application Center of Excellence, Esenler, Istanbul, Turkey
- Selcen Ari Yuka
- Department of Bioengineering, Faculty of Chemical and Metallurgical Engineering, Yildiz Technical University, Esenler, Istanbul, Turkey.
- Health Biotechnology Joint Research and Application Center of Excellence, Esenler, Istanbul, Turkey.
31
|
Sun SL, Zhou BW, Liu SZ, Xiu YH, Bilal A, Long HX. Prediction of miRNAs and diseases association based on sparse autoencoder and MLP. Front Genet 2024; 15:1369811. [PMID: 38873111 PMCID: PMC11169787 DOI: 10.3389/fgene.2024.1369811] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2024] [Accepted: 05/07/2024] [Indexed: 06/15/2024] Open
Abstract
Introduction: MicroRNAs (miRNAs) are small, non-coding RNA molecules that have multiple important regulatory roles within cells. As research on miRNAs deepens, more and more studies show that the abnormal expression of miRNAs is closely related to various diseases. The relationship between miRNAs and diseases is crucial for discovering the pathogenesis of diseases and exploring new treatment methods. Methods: Therefore, we propose a new sparse autoencoder and MLP method (SPALP) to predict the association between miRNAs and diseases. In this study, we adopt advanced deep learning technologies, including a sparse autoencoder and a multi-layer perceptron (MLP), to improve the accuracy of predicting miRNA-disease associations. Firstly, the SPALP model uses a sparse autoencoder to perform feature learning and extract the initial features of miRNAs and diseases separately, obtaining the latent features of miRNAs and diseases. Then, the latent features are combined with miRNA functional similarity data and disease semantic similarity data to construct comprehensive miRNA-disease datasets. Subsequently, the MLP model predicts the unknown associations between miRNAs and diseases. Result: To verify the performance of our model, we set up several comparative experiments. The experimental results show that, compared with traditional methods and other deep learning prediction methods, our method has significantly improved the accuracy of predicting miRNA-disease associations, with 94.61% accuracy and an AUC value of 0.9859. Finally, we conducted a case study of the SPALP model. We predicted the top 30 miRNAs that might be related to five diseases of the elderly (Lupus Erythematosus, Acute Myeloid Leukemia, Cardiovascular, Stroke, and Diabetes Mellitus) and validated that 27, 29, 29, 30, and 30 of the top 30, respectively, are indeed associated.
Discussion: The SPALP approach introduced in this study is adept at forecasting the links between miRNAs and diseases, addressing the complexities of analyzing extensive bioinformatics datasets and enriching the understanding of how miRNAs contribute to disease progression.
Affiliation(s)
- Si-Lin Sun
- Department of Information Science Technology, Hainan Normal University, Haikou, Hainan, China
- Bing-Wei Zhou
- Department of Information Science Technology, Hainan Normal University, Haikou, Hainan, China
- Sheng-Zheng Liu
- Department of Information Science Technology, Hainan Normal University, Haikou, Hainan, China
- Yu-Han Xiu
- Department of Information Science Technology, Hainan Normal University, Haikou, Hainan, China
- Anas Bilal
- Department of Information Science Technology, Hainan Normal University, Haikou, Hainan, China
- Key Laboratory of Data Science and Smart Education, Ministry of Education, Hainan Normal University, Haikou, China
- Hai-Xia Long
- Department of Information Science Technology, Hainan Normal University, Haikou, Hainan, China
- Key Laboratory of Data Science and Smart Education, Ministry of Education, Hainan Normal University, Haikou, China
32
|
Li Y, Wei X, Yang Q, Xiong A, Li X, Zou Q, Cui F, Zhang Z. msBERT-Promoter: a multi-scale ensemble predictor based on BERT pre-trained model for the two-stage prediction of DNA promoters and their strengths. BMC Biol 2024; 22:126. [PMID: 38816885 PMCID: PMC11555825 DOI: 10.1186/s12915-024-01923-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2024] [Accepted: 05/21/2024] [Indexed: 06/01/2024] Open
Abstract
BACKGROUND A promoter is a specific sequence in DNA that has transcriptional regulatory functions, playing a role in initiating gene expression. Identifying promoters and their strengths can provide valuable information related to human diseases. In recent years, computational methods have gained prominence as an effective means for identifying promoters, offering a more efficient alternative to labor-intensive biological approaches. RESULTS In this study, a two-stage integrated predictor called "msBERT-Promoter" is proposed for identifying promoters and predicting their strengths. The model incorporates multi-scale sequence information through a tokenization strategy and fine-tunes the DNABERT model. Soft voting is then used to fuse the multi-scale information, effectively addressing the issue of insufficient DNA sequence information extraction in traditional models. To the best of our knowledge, this is the first time an integrated approach has been used in the DNABERT model for promoter identification and strength prediction. Our model achieves accuracy rates of 96.2% for promoter identification and 79.8% for promoter strength prediction, significantly outperforming existing methods. Furthermore, through attention mechanism analysis, we demonstrate that our model can effectively combine local and global sequence information, enhancing its interpretability. CONCLUSIONS msBERT-Promoter provides an effective tool that successfully captures sequence-related attributes of DNA promoters and can accurately identify promoters and predict their strengths. This work paves a new path for the application of artificial intelligence in traditional biology.
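The combination of multi-scale tokenization and soft voting mentioned in this abstract can be illustrated with a short Python sketch. It assumes that the scales correspond to overlapping k-mers of different lengths, and the per-model probabilities below are made-up numbers; neither detail is taken from the msBERT-Promoter code.

```python
def kmer_tokens(sequence, k):
    """Split a DNA sequence into overlapping k-mers (stride 1)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def soft_vote(probabilities):
    """Average class probabilities across models.

    `probabilities` holds one list of per-class probabilities per model.
    Returns the averaged probabilities and the index of the winning class.
    """
    n_models = len(probabilities)
    n_classes = len(probabilities[0])
    averaged = [sum(p[c] for p in probabilities) / n_models for c in range(n_classes)]
    label = max(range(n_classes), key=averaged.__getitem__)
    return averaged, label


# One token view per scale (k = 3 and k = 4 here, purely as an example).
views = {k: kmer_tokens("ATGCA", k) for k in (3, 4)}

# Hypothetical [non-promoter, promoter] probabilities from two scale-specific models.
averaged, label = soft_vote([[0.25, 0.75], [0.5, 0.5]])
```

Soft voting averages probabilities instead of counting hard labels, so a confident model at one scale can outweigh an uncertain model at another.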
Affiliation(s)
- Yazi Li
- School of Mathematics and Statistics, Hainan University, Haikou, 570228, China
| | - Xiaoman Wei
- School of Computer Science and Technology, Hainan University, Haikou, 570228, China
| | - Qinglin Yang
- School of Computer Science and Technology, Hainan University, Haikou, 570228, China
| | - An Xiong
- School of Computer Science and Technology, Hainan University, Haikou, 570228, China
| | - Xingfeng Li
- School of Computer Science and Technology, Hainan University, Haikou, 570228, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324000, China
| | - Feifei Cui
- School of Computer Science and Technology, Hainan University, Haikou, 570228, China.
| | - Zilong Zhang
- School of Computer Science and Technology, Hainan University, Haikou, 570228, China.
| |
|
33
|
Cheng N, Wang L, Liu Y, Song B, Ding C. HANSynergy: Heterogeneous Graph Attention Network for Drug Synergy Prediction. J Chem Inf Model 2024; 64:4334-4347. [PMID: 38709204 PMCID: PMC11135324 DOI: 10.1021/acs.jcim.4c00003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2024] [Revised: 04/23/2024] [Accepted: 04/24/2024] [Indexed: 05/07/2024]
Abstract
Drug synergy therapy is a promising strategy for cancer treatment. However, the extensive variety of available drugs and the time-intensive process of determining effective drug combinations through clinical trials pose significant challenges. A reliable method for the rapid and precise selection of drug synergies is therefore required. In response, various computational strategies have been developed for predicting drug synergies, yet the exploitation of heterogeneous biological network features remains underexplored. In this study, we construct a heterogeneous graph that encompasses diverse biological entities and interactions, utilizing rich data sets from sources such as DrugCombDB, PubChem, UniProt, and the Cancer Cell Line Encyclopedia (CCLE). We initialize node feature representations and introduce a novel virtual node to enhance drug representation. Experimental validation of our proposed method, the heterogeneous graph attention network for drug-drug synergy prediction (HANSynergy), demonstrates that the heterogeneous graph attention network can extract key node features, efficiently harness the diversity of information, and further enhance network functionality through the incorporation of a multihead attention mechanism. In the comparative experiment, the highest accuracy (Acc) and area under the curve (AUC) are 0.877 and 0.947, respectively, on the DrugCombDB_early data set, demonstrating the superiority of HANSynergy over the competing methods. Moreover, protein-protein interactions are important in understanding the mechanism of action of drugs. The heterogeneous attention mechanism facilitates protein-protein interaction analysis. By analyzing the changes in attention weights before and after heterogeneous network training, we investigated proteins that may be associated with drug combinations. Additionally, case studies align our findings with existing research, underscoring the potential of HANSynergy in drug synergy prediction. 
This advancement not only contributes to the burgeoning field of drug synergy prediction but also holds the potential to provide valuable insights and uncover new drug synergies for combating cancer.
Affiliation(s)
- Ning Cheng
- School of Informatics, Hunan University of Chinese Medicine, Changsha, Hunan 410208, China
| | - Li Wang
- Degree Programs in Systems and Information Engineering, University of Tsukuba, Tsukuba, Ibaraki 305-8577, Japan
| | - Yiping Liu
- College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China
| | - Bosheng Song
- College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China
| | - Changsong Ding
- School of Informatics, Hunan University of Chinese Medicine, Changsha, Hunan 410208, China
- Big Data Analysis Laboratory of Traditional Chinese Medicine, Hunan University of Chinese Medicine, Changsha, Hunan 410208, China
| |
|
34
|
Cordoves-Delgado G, García-Jacas CR. Predicting Antimicrobial Peptides Using ESMFold-Predicted Structures and ESM-2-Based Amino Acid Features with Graph Deep Learning. J Chem Inf Model 2024; 64:4310-4321. [PMID: 38739853 DOI: 10.1021/acs.jcim.3c02061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2024]
Abstract
Currently, antimicrobial resistance constitutes a serious threat to human health. Drugs based on antimicrobial peptides (AMPs) constitute one of the alternatives to address it. Shallow and deep learning (DL)-based models have mainly been built from amino acid sequences to predict AMPs. Recent advances in tertiary (3D) structure prediction have opened new opportunities in this field. In this sense, models based on graphs derived from predicted peptide structures have recently been proposed. However, these models are not in correspondence with state-of-the-art approaches to codify evolutionary information, and, in addition, they are memory- and time-consuming because they depend on multiple sequence alignment. Herein, we present a framework to create alignment-free models based on graph representations generated from ESMFold-predicted peptide structures, whose nodes are characterized with amino acid-level evolutionary information derived from the Evolutionary Scale Modeling (ESM-2) models. A graph attention network (GAT) was implemented to assess the usefulness of the framework in AMP classification. To this end, a set comprising 67,058 peptides was used. It was demonstrated that the proposed methodology made it possible to build GAT models with generalization abilities consistently better than those of 20 state-of-the-art non-DL-based and DL-based models. The best GAT models were developed using evolutionary information derived from the 36- and 33-layer ESM-2 models. Similarity studies showed that the best-built GAT models codified different chemical spaces, and thus they were fused to significantly improve the classification. In general, the results suggest that esm-AxP-GDL is a promising tool to develop good, structure-dependent, and alignment-free models that can be successfully applied in the screening of large data sets. This framework should be useful not only to classify AMPs but also for modeling other peptide and protein activities.
Affiliation(s)
- Greneter Cordoves-Delgado
- Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), 22860 Ensenada, Baja California, México
| | - César R García-Jacas
- Cátedras CONAHCYT - Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), 22860 Ensenada, Baja California, México
| |
|
35
|
Chen N, Yu J, Zhe L, Wang F, Li X, Wong KC. TP-LMMSG: a peptide prediction graph neural network incorporating flexible amino acid property representation. Brief Bioinform 2024; 25:bbae308. [PMID: 38920345 PMCID: PMC11200197 DOI: 10.1093/bib/bbae308] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2024] [Revised: 05/28/2024] [Accepted: 06/10/2024] [Indexed: 06/27/2024] Open
Abstract
Bioactive peptide therapeutics has been a long-standing research topic. Notably, antimicrobial peptides (AMPs) have been extensively studied for their therapeutic potential. Meanwhile, the demand for annotating other therapeutic peptides, such as antiviral peptides (AVPs) and anticancer peptides (ACPs), has also increased in recent years. However, we conceive that the structure of peptide chains and the intrinsic information between the amino acids are not fully investigated among the existing protocols. Therefore, we develop a new graph deep learning model, namely TP-LMMSG, which offers lightweight and easy-to-deploy advantages while improving the annotation performance in a generalizable manner. The results indicate that our model can accurately predict the properties of different peptides. The model surpasses the other state-of-the-art models on AMP, AVP and ACP prediction across multiple experimentally validated datasets. Moreover, TP-LMMSG also addresses the challenges of time-consuming pre-processing in graph neural network frameworks. With its flexibility in integrating heterogeneous peptide features, our model can have a substantial impact on the screening and discovery of therapeutic peptides. The source code is available at https://github.com/NanjunChen37/TP_LMMSG.
Affiliation(s)
- Nanjun Chen
- Department of Computer Science, City University of Hong Kong, 83 Tat Chee Ave, Kowloon Tong, Kowloon, Hong Kong SAR
| | - Jixiang Yu
- Department of Computer Science, City University of Hong Kong, 83 Tat Chee Ave, Kowloon Tong, Kowloon, Hong Kong SAR
| | - Liu Zhe
- Department of Computer Science, City University of Hong Kong, 83 Tat Chee Ave, Kowloon Tong, Kowloon, Hong Kong SAR
| | - Fuzhou Wang
- Department of Computer Science, City University of Hong Kong, 83 Tat Chee Ave, Kowloon Tong, Kowloon, Hong Kong SAR
| | - Xiangtao Li
- School of Artificial Intelligence, Jilin University, Changchun, Jilin, China
| | - Ka-Chun Wong
- Department of Computer Science, City University of Hong Kong, 83 Tat Chee Ave, Kowloon Tong, Kowloon, Hong Kong SAR
- Shenzhen Research Institute, City University of Hong Kong, Shenzhen, Guangdong, China
| |
|
36
|
Zhu C, Zhang C, Shang T, Zhang C, Zhai S, Cao L, Xu Z, Su Z, Song Y, Su A, Li C, Duan H. GAPS: a geometric attention-based network for peptide binding site identification by the transfer learning approach. Brief Bioinform 2024; 25:bbae297. [PMID: 38990514 PMCID: PMC11238429 DOI: 10.1093/bib/bbae297] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2024] [Revised: 04/28/2024] [Accepted: 06/07/2024] [Indexed: 07/12/2024] Open
Abstract
Protein-peptide interactions (PPepIs) are vital to understanding cellular functions, which can facilitate the design of novel drugs. As an essential component in forming a PPepI, protein-peptide binding sites are the basis for understanding the mechanisms involved in PPepIs. Therefore, accurately identifying protein-peptide binding sites becomes a critical task. The traditional experimental methods for researching these binding sites are labor-intensive and time-consuming, and some computational tools have been developed to supplement them. However, these computational tools have limitations in generality or accuracy due to the need for ligand information, complex feature construction, or their reliance on modeling based on amino acid residues. To deal with the drawbacks of these computational algorithms, we describe a geometric attention-based network for peptide binding site identification (GAPS) in this work. The proposed model utilizes geometric feature engineering to construct atom representations and incorporates multiple attention mechanisms to update relevant biological features. In addition, a transfer learning strategy is implemented to leverage protein-protein binding site information and enhance the recognition of protein-peptide binding sites, taking into account the common structure and biological bias between proteins and peptides. Consequently, GAPS demonstrates state-of-the-art performance and excellent robustness in this task. Moreover, our model exhibits exceptional performance across several extended experiments, including predicting the apo protein-peptide, protein-cyclic peptide and AlphaFold-predicted protein-peptide binding sites. These results confirm that the GAPS model is a powerful, versatile, stable method suitable for diverse binding site predictions.
Affiliation(s)
- Cheng Zhu
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Chaowang Road, Gongshu District, Hangzhou 310014, China
| | - Chengyun Zhang
- AI Department, Shanghai Highslab Therapeutics. Inc, Zhangheng Road, Pudong New Area, Shanghai 201203, China
| | - Tianfeng Shang
- AI Department, Shanghai Highslab Therapeutics. Inc, Zhangheng Road, Pudong New Area, Shanghai 201203, China
| | - Chenhao Zhang
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Chaowang Road, Gongshu District, Hangzhou 310014, China
| | - Silong Zhai
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Chaowang Road, Gongshu District, Hangzhou 310014, China
| | - Lujing Cao
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Chaowang Road, Gongshu District, Hangzhou 310014, China
| | - Zhenyu Xu
- AI Department, Shanghai Highslab Therapeutics. Inc, Zhangheng Road, Pudong New Area, Shanghai 201203, China
| | - Zhihao Su
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Chaowang Road, Gongshu District, Hangzhou 310014, China
| | - Ying Song
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Chaowang Road, Gongshu District, Hangzhou 310014, China
| | - An Su
- College of Chemical Engineering, Zhejiang University of Technology, Chaowang Road, Gongshu District, Hangzhou 310014, China
| | - Chengxi Li
- College of Chemical and Biological Engineering, Zhejiang University, Yuhangtang Road, Xihu District, Hangzhou 310027, China
| | - Hongliang Duan
- Faculty of Applied Sciences, Macao Polytechnic University, R. de Luís Gonzaga Gomes, Macao 999078, China
| |
|
37
|
Zervou MA, Doutsi E, Pantazis Y, Tsakalides P. De Novo Antimicrobial Peptide Design with Feedback Generative Adversarial Networks. Int J Mol Sci 2024; 25:5506. [PMID: 38791544 PMCID: PMC11122239 DOI: 10.3390/ijms25105506] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2024] [Revised: 05/10/2024] [Accepted: 05/15/2024] [Indexed: 05/26/2024] Open
Abstract
Antimicrobial peptides (AMPs) are promising candidates for new antibiotics due to their broad-spectrum activity against pathogens and reduced susceptibility to resistance development. Deep-learning techniques, such as deep generative models, offer a promising avenue to expedite the discovery and optimization of AMPs. A remarkable example is the Feedback Generative Adversarial Network (FBGAN), a deep generative model that incorporates a classifier during its training phase. Our study aims to explore the impact of enhanced classifiers on the generative capabilities of FBGAN. To this end, we introduce two alternative classifiers for the FBGAN framework, both surpassing the accuracy of the original classifier. The first classifier utilizes the k-mers technique, while the second applies transfer learning from the large protein language model Evolutionary Scale Modeling 2 (ESM2). Integrating these classifiers into FBGAN not only yields notable performance enhancements compared to the original FBGAN but also enables the proposed generative models to achieve comparable or even superior performance to established methods such as AMPGAN and HydrAMP. This achievement underscores the effectiveness of leveraging advanced classifiers within the FBGAN framework, enhancing its computational robustness for AMP de novo design and making it comparable to existing literature.
Affiliation(s)
- Michaela Areti Zervou
- Department of Computer Science, University of Crete, 700 13 Heraklion, Greece
- Institute of Computer Science, Foundation for Research and Technology-Hellas, 700 13 Heraklion, Greece
| | - Effrosyni Doutsi
- Institute of Computer Science, Foundation for Research and Technology-Hellas, 700 13 Heraklion, Greece
| | - Yannis Pantazis
- Institute of Applied and Computational Mathematics, Foundation for Research and Technology-Hellas, 700 13 Heraklion, Greece
| | - Panagiotis Tsakalides
- Department of Computer Science, University of Crete, 700 13 Heraklion, Greece
- Institute of Computer Science, Foundation for Research and Technology-Hellas, 700 13 Heraklion, Greece
| |
|
38
|
Cui Y, Liu H, Ming Y, Zhang Z, Liu L, Liu R. Prediction of strand-specific and cell-type-specific G-quadruplexes based on high-resolution CUT&Tag data. Brief Funct Genomics 2024; 23:265-275. [PMID: 37357985 DOI: 10.1093/bfgp/elad024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2023] [Revised: 05/20/2023] [Accepted: 06/01/2023] [Indexed: 06/27/2023] Open
Abstract
G-quadruplex (G4), a non-classical deoxyribonucleic acid structure, is widely distributed in the genome and involved in various biological processes. In vivo high-throughput sequencing has indicated that G4s are significantly enriched at functional regions in a cell-type-specific manner. Therefore, computational prediction of G4s is needed as an alternative to time-consuming and laborious experimental methods. Recently, G4 CUT&Tag has been developed to generate higher-resolution sequencing data than ChIP-seq, which provides more accurate training samples for model construction. In this paper, we present a new dataset construction method based on G4 CUT&Tag sequencing data and an XGBoost prediction model based on gradient boosting. The results show that our model performs well within and across cell types. Furthermore, sequence analysis indicates that the formation of G4 structure is greatly affected by the flanking sequences, and the GC content of the G4 flanking sequences is higher than that of non-G4 sequences. Moreover, we also identified G4 motifs in the high-resolution dataset, among which we found several motifs for known transcription factors (TFs), such as SP2 and BPC. These TFs may directly or indirectly affect the formation of the G4 structure.
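The flanking-sequence GC content comparison mentioned in this abstract reduces to a simple computation. The sketch below is a minimal illustration, assuming half-open region coordinates and a fixed flank width; the function names and the default flank length of 50 are hypothetical choices, not values taken from the paper.

```python
def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def flank_gc(sequence, start, end, flank=50):
    """GC content of the upstream and downstream flanks of region [start, end)."""
    upstream = sequence[max(0, start - flank):start]
    downstream = sequence[end:end + flank]
    return gc_content(upstream), gc_content(downstream)


# Toy example: a GGGG core at positions 2..5 with two-base flanks.
up_gc, down_gc = flank_gc("GCGGGGATAT", 2, 6, flank=2)
```

Averaging these values over G4 regions and over matched non-G4 regions gives the kind of comparison the abstract describes.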
Affiliation(s)
- Yizhi Cui
- School of Computer Science and Engineering, Beijing Technology and Business University, Beijing, 100048, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324003, Zhejiang, China
| | - Hongzhi Liu
- School of Computer Science and Engineering, Beijing Technology and Business University, Beijing, 100048, China
| | - Yutong Ming
- School of Computer Science and Engineering, Beijing Technology and Business University, Beijing, 100048, China
| | - Zheng Zhang
- Department of Computer Science and Software Engineering, Auburn University, Auburn, 36830, Alabama, USA
| | - Li Liu
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324003, Zhejiang, China
| | - Ruijun Liu
- School of Computer Science and Engineering, Beijing Technology and Business University, Beijing, 100048, China
| |
|
39
|
Jiao S, Ye X, Sakurai T, Zou Q, Liu R. Integrated convolution and self-attention for improving peptide toxicity prediction. Bioinformatics 2024; 40:btae297. [PMID: 38696758 PMCID: PMC11654579 DOI: 10.1093/bioinformatics/btae297] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2024] [Revised: 04/02/2024] [Accepted: 04/30/2024] [Indexed: 05/04/2024] Open
Abstract
MOTIVATION Peptides are promising agents for the treatment of a variety of diseases due to their specificity and efficacy. However, the development of peptide-based drugs is often hindered by the potential toxicity of peptides, which poses a significant barrier to their clinical application. Traditional experimental methods for evaluating peptide toxicity are time-consuming and costly, making the development process inefficient. Therefore, there is an urgent need for computational tools specifically designed to predict peptide toxicity accurately and rapidly, facilitating the identification of safe peptide candidates for drug development. RESULTS We provide here a novel computational approach, CAPTP, which leverages the power of convolution and self-attention to enhance the prediction of peptide toxicity from amino acid sequences. CAPTP demonstrates outstanding performance, achieving a Matthews correlation coefficient of approximately 0.82 in both cross-validation settings and on independent test datasets. This performance surpasses that of existing state-of-the-art peptide toxicity predictors. Importantly, CAPTP maintains its robustness and generalizability even when dealing with data imbalances. Further analysis by CAPTP reveals that certain sequential patterns, particularly in the head and central regions of peptides, are crucial in determining their toxicity. This insight can significantly inform and guide the design of safer peptide drugs. AVAILABILITY AND IMPLEMENTATION The source code for CAPTP is freely available at https://github.com/jiaoshihu/CAPTP.
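The Matthews correlation coefficient reported in this abstract is computed from the four cells of a binary confusion matrix as (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below is a minimal illustration of that standard formula; the function name `mcc` and the convention of returning 0.0 for a zero denominator are assumptions made here, and the counts are toy values rather than results from the paper.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumed convention: undefined cases are reported as 0.0.
        return 0.0
    return (tp * tn - fp * fn) / denom


perfect = mcc(tp=50, tn=50, fp=0, fn=0)
inverted = mcc(tp=0, tn=0, fp=50, fn=50)
chance = mcc(tp=25, tn=25, fp=25, fn=25)
```

Because all four cells enter the formula, MCC stays informative on imbalanced datasets where accuracy alone can look high for a trivial classifier.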
Affiliation(s)
- Shihu Jiao
- Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan
| | - Xiucai Ye
- Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan
| | - Tetsuya Sakurai
- Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
| | - Ruijun Liu
- School of Software, Beihang University, Beijing 100191, China
| |
|
40
|
Wei H, Gao L, Wu S, Jiang Y, Liu B. DiSMVC: a multi-view graph collaborative learning framework for measuring disease similarity. Bioinformatics 2024; 40:btae306. [PMID: 38715444 PMCID: PMC11256965 DOI: 10.1093/bioinformatics/btae306] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2024] [Revised: 04/19/2024] [Accepted: 05/05/2024] [Indexed: 05/30/2024] Open
Abstract
MOTIVATION Exploring potential associations between diseases can help in understanding pathological mechanisms of diseases and facilitating the discovery of candidate biomarkers and drug targets, thereby promoting disease diagnosis and treatment. Some computational methods have been proposed for measuring disease similarity. However, these methods describe diseases without considering their latent multi-molecule regulation and valuable supervision signal, resulting in limited biological interpretability and efficiency to capture association patterns. RESULTS In this study, we propose a new computational method named DiSMVC. Different from existing predictors, DiSMVC designs a supervised graph collaborative framework to measure disease similarity. Multiple bio-entity associations related to genes and miRNAs are integrated via cross-view graph contrastive learning to extract informative disease representation, and then association pattern joint learning is implemented to compute disease similarity by incorporating phenotype-annotated disease associations. The experimental results show that DiSMVC can draw discriminative characteristics for disease pairs, and outperform other state-of-the-art methods. As a result, DiSMVC is a promising method for predicting disease associations with molecular interpretability. AVAILABILITY AND IMPLEMENTATION Datasets and source codes are available at https://github.com/Biohang/DiSMVC.
Affiliation(s)
- Hang Wei
- School of Computer Science and Technology, Xidian University, Xi’an, Shaanxi 710126, China
| | - Lin Gao
- School of Computer Science and Technology, Xidian University, Xi’an, Shaanxi 710126, China
| | - Shuai Wu
- School of Computer Science and Technology, Xidian University, Xi’an, Shaanxi 710126, China
| | - Yina Jiang
- Department of Basic Medicine, Shaanxi University of Chinese Medicine, Xianyang, Shaanxi 712046, China
| | - Bin Liu
- Faculty of Engineering, Shenzhen MSU-BIT University, Shenzhen, Guangdong 518172, China
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China
| |
|
41
|
Ma X, Li Z, Du Z, Xu Y, Chen Y, Zhuo L, Fu X, Liu R. Advancing cancer driver gene detection via Schur complement graph augmentation and independent subspace feature extraction. Comput Biol Med 2024; 174:108484. [PMID: 38643595 DOI: 10.1016/j.compbiomed.2024.108484] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2024] [Revised: 03/18/2024] [Accepted: 04/15/2024] [Indexed: 04/23/2024]
Abstract
Accurately identifying cancer driver genes (CDGs) is crucial for guiding cancer treatment and has recently received great attention from researchers. However, the high complexity and heterogeneity of cancer gene regulatory networks limit the prediction accuracy of existing deep learning models. To address this, we introduce a model called SCIS-CDG that utilizes Schur complement graph augmentation and independent subspace feature extraction techniques to effectively predict potential CDGs. Firstly, a random Schur complement strategy is adopted to generate two augmented views of the gene network within a graph contrastive learning framework. Rapid randomization of the random Schur complement strategy enhances the model's generalization and its ability to handle complex networks effectively. Upholding the Schur complement principle in expectation promotes the preservation of the original gene network's vital structure in the augmented views. Subsequently, we employ feature extraction technology using multiple independent subspaces, each trained with independent weights to reduce inter-subspace dependence and improve the model's expressiveness. Concurrently, we introduce a feature expansion component based on the structure of the gene network to address issues arising from the limited dimensionality of node features. Moreover, it can alleviate the challenges posed by the heterogeneity of cancer gene networks to some extent. Finally, we integrate a learnable attention weight mechanism into the graph neural network (GNN) encoder, utilizing feature expansion technology to optimize the significance of various feature levels in the prediction task. Following extensive experimental validation, the SCIS-CDG model has exhibited high efficiency in identifying known CDGs and uncovering potential unknown CDGs in external datasets. In particular, its performance is significantly improved compared to previous conventional GNN models. 
The code and data are publicly available at: https://github.com/mxqmxqmxq/SCIS-CDG.
Affiliation(s)
- Xinqian Ma
- School of Data Science and Artificial Intelligence, Wenzhou University of Technology, 325027, Wenzhou, China
| | - Zhen Li
- School of Computer Science of Information Technology, Qiannan Normal University for Nationalities, Duyun, Guizhou 558000, China; Institute of Computational Science and Technology, Guangzhou University, 510000, Guangzhou, China
| | - Zhenya Du
- Guangzhou Xinhua University, 510520, Guangzhou, China
| | - Yan Xu
- School of Data Science and Artificial Intelligence, Wenzhou University of Technology, 325027, Wenzhou, China
| | - Yifan Chen
- College of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha, Hunan, 410004, China
| | - Linlin Zhuo
- School of Data Science and Artificial Intelligence, Wenzhou University of Technology, 325027, Wenzhou, China.
| | - Xiangzheng Fu
- College of Computer Science and Electronic Engineering, Hunan University, 410012, Changsha, China
| | - Ruijun Liu
- School of Software, Beihang University, Beijing, China.
| |
|
42
|
Wan F, Wong F, Collins JJ, de la Fuente-Nunez C. Machine learning for antimicrobial peptide identification and design. Nat Rev Bioeng 2024; 2:392-407. [PMID: 39850516 PMCID: PMC11756916 DOI: 10.1038/s44222-024-00152-x] [Citation(s) in RCA: 25] [Impact Index Per Article: 25.0] [Indexed: 01/25/2025]
Abstract
Artificial intelligence (AI) and machine learning (ML) models are being deployed in many domains of society and have recently reached the field of drug discovery. Given the increasing prevalence of antimicrobial resistance, as well as the challenges intrinsic to antibiotic development, there is an urgent need to accelerate the design of new antimicrobial therapies. Antimicrobial peptides (AMPs) are therapeutic agents for treating bacterial infections, but their translation into the clinic has been slow owing to toxicity, poor stability, limited cellular penetration and high cost, among other issues. Recent advances in AI and ML have led to breakthroughs in our abilities to predict biomolecular properties and structures and to generate new molecules. The ML-based modelling of peptides may overcome some of the disadvantages associated with traditional drug discovery and aid the rapid development and translation of AMPs. Here, we provide an introduction to this emerging field and survey ML approaches that can be used to address issues currently hindering AMP development. We also outline important limitations that can be addressed for the broader adoption of AMPs in clinical practice, as well as new opportunities in data-driven peptide design.
Affiliation(s)
- Fangping Wan
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Bioengineering, University of Pennsylvania, Philadelphia, PA, USA
- Department of Chemical and Biomolecular Engineering, University of Pennsylvania, Philadelphia, PA, USA
- Department of Chemistry, School of Arts and Sciences, University of Pennsylvania, Philadelphia, PA, USA
- Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA
- These authors contributed equally: Fangping Wan, Felix Wong
| | - Felix Wong
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- These authors contributed equally: Fangping Wan, Felix Wong
| | - James J. Collins
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA, USA
- These authors jointly supervised this work: James J. Collins, Cesar de la Fuente-Nunez
| | - Cesar de la Fuente-Nunez
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Bioengineering, University of Pennsylvania, Philadelphia, PA, USA
- Department of Chemical and Biomolecular Engineering, University of Pennsylvania, Philadelphia, PA, USA
- Department of Chemistry, School of Arts and Sciences, University of Pennsylvania, Philadelphia, PA, USA
- Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA
- These authors jointly supervised this work: James J. Collins, Cesar de la Fuente-Nunez
| |
43
Li C, Zou Q, Jia C, Zheng J. AMPpred-MFA: An Interpretable Antimicrobial Peptide Predictor with a Stacking Architecture, Multiple Features, and Multihead Attention. J Chem Inf Model 2024; 64:2393-2404. [PMID: 37799091 DOI: 10.1021/acs.jcim.3c01017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/07/2023]
Abstract
Antimicrobial peptides (AMPs) are small molecular polypeptides that can be widely used in the prevention and treatment of microbial infections. Although many computational models have been proposed to help identify AMPs, a high-performance and interpretable model is still lacking. In this study, new benchmark data sets are collected and processed, and a stacking deep architecture named AMPpred-MFA is carefully designed to discover and identify AMPs. Multiple features and a multihead attention mechanism are utilized on the basis of a bidirectional long short-term memory (LSTM) network and a convolutional neural network (CNN). The effectiveness of AMPpred-MFA is verified through five independent tests conducted in batches. Experimental results show that AMPpred-MFA achieves a state-of-the-art performance. The visualization interpretability analyses and ablation experiments offer a further understanding of the model behavior and performance, validating the importance of our feature representation and stacking architecture, especially the multihead attention mechanism. Therefore, AMPpred-MFA can be considered a reliable and efficient approach to understanding and predicting AMPs.
Affiliation(s)
- Changjiang Li
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
- Cangzhi Jia
  - School of Science, Dalian Maritime University, Dalian 116026, China
- Jia Zheng
  - School of Science, Dalian Maritime University, Dalian 116026, China
44
Zhang ZY, Zhang Z, Ye X, Sakurai T, Lin H. A BERT-based model for the prediction of lncRNA subcellular localization in Homo sapiens. Int J Biol Macromol 2024; 265:130659. [PMID: 38462114 DOI: 10.1016/j.ijbiomac.2024.130659] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2024] [Revised: 02/19/2024] [Accepted: 03/04/2024] [Indexed: 03/12/2024]
Abstract
Understanding the subcellular localization of lncRNAs is crucial for comprehending their regulatory activities. The conventional detection of lncRNA subcellular location usually uses in situ detection techniques, which are resource intensive. Some machine learning-based algorithms have been proposed for lncRNA subcellular location prediction in mammals. However, due to the low level of conservation of lncRNA sequences, the performance of cross-species models remains unsatisfactory. In this study, we curated a novel dataset containing subcellular location information of lncRNAs in Homo sapiens. Subsequently, based on the BERT pre-trained language algorithm, we developed a model for lncRNA subcellular location prediction. Our model achieved a micro-average area under the receiver operating characteristic curve (AUROC) of 0.791 on the training set and an AUROC of 0.700 on the testing nucleus set. Additionally, we conducted cross-species validation and motif discovery to further investigate underlying patterns. In summary, our study provides valuable guidance and computational analysis tools for exploring the mechanisms of lncRNA subcellular localization and the dynamic spatial changes of RNA in abnormal physiological states.
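For readers unfamiliar with the micro-average AUROC reported in this abstract, the following is a minimal sketch of how such a metric is commonly computed. It is not the authors' code: the function names `binary_auroc` and `micro_average_auroc`, the indicator-matrix input layout, and the toy data are assumptions made for illustration. Micro-averaging pools every (sample, class) pair into one binary problem and computes a single AUROC over the pooled pairs.

```python
from typing import Sequence


def binary_auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC from the Mann-Whitney U statistic, using average ranks for ties."""
    n = len(labels)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores starting at sorted position i.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def micro_average_auroc(
    y_true: Sequence[Sequence[int]], y_score: Sequence[Sequence[float]]
) -> float:
    """Pool all (sample, class) pairs, then compute one binary AUROC.

    y_true is a 0/1 indicator matrix (rows: samples, columns: classes);
    y_score holds the matching predicted scores.
    """
    flat_labels = [y for row in y_true for y in row]
    flat_scores = [s for row in y_score for s in row]
    return binary_auroc(flat_labels, flat_scores)


# Hypothetical toy data: 2 samples, 2 classes.
y_true = [[1, 0], [0, 1]]
y_score = [[0.4, 0.6], [0.3, 0.7]]
print(micro_average_auroc(y_true, y_score))
```

Because pooling weights every (sample, class) pair equally, frequent classes dominate a micro-average; a macro-average, which averages per-class AUROCs, would weight each class equally instead.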
Affiliation(s)
- Zhao-Yue Zhang
  - Tsukuba Life Science Innovation Program, University of Tsukuba, Tsukuba 3058577, Japan
- Zheng Zhang
  - Department of Computer Science and Software Engineering, Auburn University, Auburn, AL 36849, USA
- Xiucai Ye
  - Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan
- Tetsuya Sakurai
  - Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan
- Hao Lin
  - Center for Information Biology, University of Electronic Science and Technology of China, Chengdu 611731, China
45
Chen M, Sun M, Su X, Tiwari P, Ding Y. Fuzzy kernel evidence Random Forest for identifying pseudouridine sites. Brief Bioinform 2024; 25:bbae169. [PMID: 38622357 PMCID: PMC11018548 DOI: 10.1093/bib/bbae169] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2024] [Revised: 03/27/2024] [Accepted: 03/31/2024] [Indexed: 04/17/2024] Open
Abstract
Pseudouridine is an RNA modification that is widely distributed in both prokaryotes and eukaryotes, and plays a critical role in numerous biological activities. Despite its importance, the precise identification of pseudouridine sites through experimental approaches poses significant challenges, requiring substantial time and resources. Therefore, there is a growing need for computational techniques that can reliably and quickly identify pseudouridine sites from vast amounts of RNA sequencing data. In this study, we propose fuzzy kernel evidence Random Forest (FKeERF) to identify pseudouridine sites. The resulting method, PseU-FKeERF, demonstrates high accuracy in identifying pseudouridine sites from RNA sequencing data. The PseU-FKeERF model selected four RNA feature coding schemes with relatively good performance for feature combination, and then input them into the newly proposed FKeERF method for category prediction. FKeERF not only uses fuzzy logic to expand the original feature space, but also combines kernel methods that are easy to interpret in general for category prediction. Both cross-validation tests and independent tests on benchmark datasets have shown that PseU-FKeERF has better predictive performance than several state-of-the-art methods. This new method not only improves the accuracy of pseudouridine site identification, but also provides a certain reference for disease control and related drug development in the future.
Affiliation(s)
- Mingshuai Chen
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 611731, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324003, China
- Mingai Sun
  - Beidahuang Industry Group General Hospital, Harbin 150001, China
- Xi Su
  - Foshan Women and Children Hospital, Foshan 528000, China
- Prayag Tiwari
  - School of Information Technology, Halmstad University, Sweden
- Yijie Ding
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324003, China
46
Wu X, Lin H, Bai R, Duan H. Deep learning for advancing peptide drug development: Tools and methods in structure prediction and design. Eur J Med Chem 2024; 268:116262. [PMID: 38387334 DOI: 10.1016/j.ejmech.2024.116262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2024] [Revised: 02/06/2024] [Accepted: 02/17/2024] [Indexed: 02/24/2024]
Abstract
Peptides can bind challenging disease targets with high affinity and specificity, offering enormous opportunities for addressing unmet medical needs. However, peptides' unique features, including smaller size, increased structural flexibility, and limited data availability, pose additional challenges to the design process compared to proteins. This review explores the dynamic field of peptide therapeutics, leveraging deep learning to enhance structure prediction and design. Our exploration encompasses various facets of peptide research, ranging from dataset curation and handling to model development. As deep learning technologies become more refined, we channel our efforts into peptide structure prediction and design, aligning with the fundamental principles of structure-activity relationships in drug development. To guide researchers in harnessing the potential of deep learning to advance peptide drug development, our insights comprehensively explore current challenges and future directions of peptide therapeutics.
Affiliation(s)
- Xinyi Wu
  - College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, 310014, PR China
- Huitian Lin
  - College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, 310014, PR China
- Renren Bai
  - School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, PR China
- Hongliang Duan
  - Faculty of Applied Sciences, Macao Polytechnic University, Macao, 999078, PR China
47
Ju H, Cui Y, Su Q, Juan L, Manavalan B. CODENET: A deep learning model for COVID-19 detection. Comput Biol Med 2024; 171:108229. [PMID: 38447500 DOI: 10.1016/j.compbiomed.2024.108229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Revised: 02/20/2024] [Accepted: 02/25/2024] [Indexed: 03/08/2024]
Abstract
Conventional COVID-19 testing methods have some flaws: they are expensive and time-consuming. Chest X-ray (CXR) diagnostic approaches can alleviate these flaws to some extent. However, there is no accurate and practical automatic diagnostic framework with good interpretability. The application of artificial intelligence (AI) technology to medical radiography can help to accurately detect the disease, reduce the burden on healthcare organizations, and provide good interpretability. Therefore, this study proposes CodeNet, a new convolutional neural network (CNN) based on CXR for COVID-19 diagnosis. This method uses contrastive learning to make full use of latent image data to enhance the model's ability to extract features and generalize across different data domains. On the evaluation dataset, the proposed method achieves an accuracy as high as 94.20%, outperforming several other existing methods used for comparison. Ablation studies validate the efficacy of the proposed method, while interpretability analysis shows that the method can effectively guide clinical professionals. This work demonstrates the superior detection performance of a CNN using contrastive learning techniques on CXR images, paving the way for computer vision and artificial intelligence technologies to leverage massive medical data for disease diagnosis.
Affiliation(s)
- Hong Ju
  - Heilongjiang Agricultural Engineering Vocational College, China
- Yanyan Cui
  - Beidahuang Industry Group General Hospital, Harbin, China
- Qiaosen Su
  - Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, 16419, Gyeonggi-do, Republic of Korea
- Liran Juan
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150001, China
- Balachandran Manavalan
  - Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, 16419, Gyeonggi-do, Republic of Korea
48
Zhang J, Chen Q, Liu B. iNucRes-ASSH: Identifying nucleic acid-binding residues in proteins by using self-attention-based structure-sequence hybrid neural network. Proteins 2024; 92:395-410. [PMID: 37915276 DOI: 10.1002/prot.26626] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2023] [Revised: 09/27/2023] [Accepted: 10/17/2023] [Indexed: 11/03/2023]
Abstract
Interaction between proteins and nucleic acids is crucial to many cellular activities. Accurately detecting nucleic acid-binding residues (NABRs) in proteins can help researchers better understand the interaction mechanism between proteins and nucleic acids. Structure-based methods can generally make more accurate predictions than sequence-based methods. However, the existing structure-based methods are sensitive to protein conformational changes, causing limited generalizability. More effective and robust approaches should be further explored. In this study, we propose iNucRes-ASSH to identify nucleic acid-binding residues with a self-attention-based structure-sequence hybrid neural network. It improves the generalizability and robustness of NABR prediction from two levels: residue representation and prediction model. Experimental results show that iNucRes-ASSH can predict the nucleic acid-binding residues even when the experimentally validated structures are unavailable and outperforms five competing methods on a recent benchmark dataset and a widely used test dataset.
Affiliation(s)
- Jun Zhang
  - National Engineering Laboratory for Big Data System Computing Technology, College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong, China
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, China
- Qingcai Chen
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, China
- Bin Liu
  - School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
  - Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing, China
49
Zhang HQ, Liu SH, Li R, Yu JW, Ye DX, Yuan SS, Lin H, Huang CB, Tang H. MIBPred: Ensemble Learning-Based Metal Ion-Binding Protein Classifier. ACS Omega 2024; 9:8439-8447. [PMID: 38405489 PMCID: PMC10882704 DOI: 10.1021/acsomega.3c09587] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Revised: 01/16/2024] [Accepted: 01/22/2024] [Indexed: 02/27/2024]
Abstract
In biological organisms, metal ion-binding proteins participate in numerous metabolic activities and are closely associated with various diseases. To accurately predict whether a protein binds to metal ions and the type of metal ion-binding protein, this study proposed a classifier named MIBPred. The classifier incorporated advanced Word2Vec technology from the field of natural language processing to extract semantic features of the protein sequence language and combined them with position-specific score matrix (PSSM) features. Furthermore, an ensemble learning model was employed for the metal ion-binding protein classification task. In the model, we independently trained XGBoost, LightGBM, and CatBoost algorithms and integrated the output results through an SVM voting mechanism. This innovative combination has led to a significant breakthrough in the predictive performance of our model. As a result, we achieved accuracies of 95.13% and 85.19%, respectively, in predicting metal ion-binding proteins and their types. Our research not only confirms the effectiveness of Word2Vec technology in extracting semantic information from protein sequences but also highlights the outstanding performance of the MIBPred classifier in the problem of metal ion-binding protein types. This study provides a reliable tool and method for the in-depth exploration of the structure and function of metal ion-binding proteins.
Affiliation(s)
- Hong-Qi Zhang
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Shang-Hua Liu
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Rui Li
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Jun-Wen Yu
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Dong-Xin Ye
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Shi-Shi Yuan
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hao Lin
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Cheng-Bing Huang
  - School of Computer Science and Technology, Aba Teachers University, Aba 623002, China
- Hua Tang
  - School of Basic Medical Sciences, Southwest Medical University, Luzhou 646000, China
  - Central Nervous System Drug Key Laboratory of Sichuan Province, Luzhou 646000, China
50
Niu M, Wang C, Zhang Z, Zou Q. A computational model of circRNA-associated diseases based on a graph neural network: prediction and case studies for follow-up experimental validation. BMC Biol 2024; 22:24. [PMID: 38281919 PMCID: PMC10823650 DOI: 10.1186/s12915-024-01826-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 07/20/2023] [Accepted: 01/11/2024] [Indexed: 01/30/2024] Open
Abstract
BACKGROUND: Circular RNAs (circRNAs) have been confirmed to play a vital role in the occurrence and development of diseases. Exploring the relationship between circRNAs and diseases is of far-reaching significance for studying etiopathogenesis and treating diseases. To this end, building on the graph Markov neural network (GMNN) algorithm constructed in our previous work, GMNN2CD, we further considered the multisource biological data that affect the association between circRNAs and diseases, developed an updated web server, CircDA, and used human hepatocellular carcinoma (HCC) tissue data to verify its prediction results. RESULTS: CircDA is built on a Tumarkov-based deep learning framework. The algorithm regards biomolecules as nodes and the interactions between molecules as edges, reasonably abstracts multiomics data, and models them as a heterogeneous biomolecular association network, which can reflect the complex relationships between different biomolecules. Case studies using literature data from HCC, cervical, and gastric cancers demonstrate that the CircDA predictor can identify missing associations between known circRNAs and diseases. In a quantitative real-time PCR (RT-qPCR) experiment on human HCC tissue samples, five circRNAs were found to be significantly differentially expressed, indicating that CircDA can predict diseases related to new circRNAs. CONCLUSIONS: This efficient computational prediction and case analysis with sufficient feedback allows us to identify circRNA-associated diseases and disease-associated circRNAs. Our work provides a method to predict circRNA-associated diseases and can provide guidance for the association of diseases with certain circRNAs. For ease of use, an online prediction server (http://server.malab.cn/CircDA) is provided, and the code is open-sourced (https://github.com/nmt315320/CircDA.git) for the convenience of algorithm improvement.
Affiliation(s)
- Mengting Niu
  - School of Electronic and Communication Engineering, Shenzhen Polytechnic University, Shenzhen, 518055, China
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Chunyu Wang
  - Faculty of Computing, Harbin Institute of Technology, Harbin, 150000, Heilongjiang, China
- Zhanguo Zhang
  - Hepatic Surgery Center, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, 1095 Jiefang Avenue, Wuhan, 430030, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, No. 4 Block 2 North Jianshe Road, Chengdu, 610054, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China