1. Brizuela CA, Liu G, Stokes JM, de la Fuente-Nunez C. AI Methods for Antimicrobial Peptides: Progress and Challenges. Microb Biotechnol 2025; 18:e70072. [PMID: 39754551 DOI: 10.1111/1751-7915.70072] [Received: 07/19/2024] [Revised: 11/18/2024] [Accepted: 12/16/2024] [Indexed: 01/06/2025]
Abstract
Antimicrobial peptides (AMPs) are promising candidates to combat multidrug-resistant pathogens. However, the high cost of extensive wet-lab screening has made AI methods for identifying and designing AMPs increasingly important, with machine learning (ML) techniques playing a crucial role. AI approaches have recently revolutionised this field by accelerating the discovery of new peptides with anti-infective activity, particularly in preclinical mouse models. Initially, classical ML approaches dominated the field, but recently there has been a shift towards deep learning (DL) models. Despite significant contributions, existing reviews have not thoroughly explored the potential of large language models (LLMs), graph neural networks (GNNs) and structure-guided AMP discovery and design. This review aims to fill that gap by providing a comprehensive overview of the latest advancements, challenges and opportunities in using AI methods, with a particular emphasis on LLMs, GNNs and structure-guided design. We discuss the limitations of current approaches and highlight the most relevant topics to address in the coming years for AMP discovery and design.
Affiliation(s)
- Carlos A Brizuela
  - Department of Computer Science, CICESE Research Center, Ensenada, Mexico
- Gary Liu
  - Department of Biochemistry and Biomedical Sciences, Michael G. DeGroote Institute for Infectious Disease Research, David Braley Centre for Antibiotic Discovery, McMaster University, Hamilton, Ontario, Canada
- Jonathan M Stokes
  - Department of Biochemistry and Biomedical Sciences, Michael G. DeGroote Institute for Infectious Disease Research, David Braley Centre for Antibiotic Discovery, McMaster University, Hamilton, Ontario, Canada
- Cesar de la Fuente-Nunez
  - Machine Biology Group, Department of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
  - Department of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, Pennsylvania, USA
  - Department of Chemistry, School of Arts and Sciences, University of Pennsylvania, Philadelphia, Pennsylvania, USA
  - Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, Pennsylvania, USA
2. Badrinarayanan S, Guntuboina C, Mollaei P, Barati Farimani A. Multi-Peptide: Multimodality Leveraged Language-Graph Learning of Peptide Properties. J Chem Inf Model 2024. [PMID: 39700492 DOI: 10.1021/acs.jcim.4c01443] [Indexed: 12/21/2024]
Abstract
Peptides are crucial in biological processes and therapeutic applications. Given their importance, advancing our ability to predict peptide properties is essential. In this study, we introduce Multi-Peptide, an innovative approach that combines transformer-based language models with graph neural networks (GNNs) to predict peptide properties. We integrate PeptideBERT, a transformer model specifically designed for peptide property prediction, with a GNN encoder to capture both sequence-based and structural features. By employing a contrastive loss framework, Multi-Peptide aligns embeddings from both modalities into a shared latent space, thereby enhancing the transformer model's predictive accuracy. Evaluations on hemolysis and nonfouling data sets demonstrate Multi-Peptide's robustness, achieving state-of-the-art 88.057% accuracy in hemolysis prediction. This study highlights the potential of multimodal learning in bioinformatics, paving the way for accurate and reliable predictions in peptide-based research and applications.
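The abstract says Multi-Peptide aligns sequence (transformer) and structure (GNN) embeddings in a shared latent space with a contrastive loss, but it does not give the loss form. As a minimal sketch, assuming a symmetric InfoNCE (CLIP-style) objective with cosine similarity and a temperature of 0.1 (both assumptions), the code below treats row i of each embedding list as two views of the same peptide: matched pairs are positives and every other pair in the batch is a negative.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def info_nce(seq_emb, graph_emb, temperature=0.1):
    """Symmetric InfoNCE: row i of seq_emb and row i of graph_emb are the
    sequence view and graph view of the same peptide (assumed pairing)."""
    n = len(seq_emb)
    # Pairwise similarity matrix scaled by the (assumed) temperature.
    logits = [[cosine(s, g) / temperature for g in graph_emb] for s in seq_emb]

    def cross_entropy(rows):
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_sum = m + math.log(sum(math.exp(v - m) for v in row))
            total += log_sum - row[i]  # -log softmax probability of the true pair
        return total / len(rows)

    # Transpose so the graph->sequence direction is scored too.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    return 0.5 * (cross_entropy(logits) + cross_entropy(cols))


aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
misaligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

Minimising this loss pulls each peptide's two embeddings together and pushes other peptides' embeddings apart, which is the general mechanism by which a contrastive objective can transfer structural information into the sequence encoder.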
Affiliation(s)
- Srivathsan Badrinarayanan
  - Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh 15213, Pennsylvania, United States
- Chakradhar Guntuboina
  - Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh 15213, Pennsylvania, United States
- Parisa Mollaei
  - Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh 15213, Pennsylvania, United States
- Amir Barati Farimani
  - Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh 15213, Pennsylvania, United States
  - Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh 15213, Pennsylvania, United States
  - Department of Biomedical Engineering, Carnegie Mellon University, Pittsburgh 15213, Pennsylvania, United States
  - Machine Learning Department, Carnegie Mellon University, Pittsburgh 15213, Pennsylvania, United States
3. Mollaei P, Sadasivam D, Guntuboina C, Barati Farimani A. IDP-Bert: Predicting Properties of Intrinsically Disordered Proteins Using Large Language Models. J Phys Chem B 2024; 128:12030-12037. [PMID: 39586094 DOI: 10.1021/acs.jpcb.4c02507] [Indexed: 11/27/2024]
Abstract
Intrinsically disordered proteins (IDPs) constitute a large and structureless class of proteins with significant functions. The existence of IDPs challenges the conventional notion that the biological functions of proteins rely on their three-dimensional structures. Despite lacking well-defined spatial arrangements, they exhibit diverse biological functions, influencing cellular processes and shedding light on disease mechanisms. However, it is expensive to run experiments or simulations to characterize this class of proteins. Consequently, we designed an ML model that relies solely on amino acid sequences. In this study, we introduce the IDP-Bert model, a deep-learning architecture leveraging Transformers and Protein Language Models to map sequences directly to IDP properties. Our experiments demonstrate accurate predictions of IDP properties, including Radius of Gyration, end-to-end Decorrelation Time, and Heat Capacity.
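The radius of gyration is one of the target properties listed above. For reference, it is conventionally the root-mean-square distance of a structure's points from their centroid, and for a disordered protein it is typically averaged over an ensemble of conformers (for example, simulation frames). The sketch below assumes unweighted coordinates (such as one bead per residue) and uses made-up toy frames; it shows how such a label can be computed and is not IDP-Bert's data pipeline.

```python
import math


def radius_of_gyration(coords):
    """Unweighted radius of gyration: sqrt(mean |r_i - r_centroid|^2).

    coords: list of (x, y, z) tuples, e.g. one bead per residue (assumed).
    """
    n = len(coords)
    centroid = [sum(c[k] for c in coords) / n for k in range(3)]
    sq = sum(sum((c[k] - centroid[k]) ** 2 for k in range(3)) for c in coords)
    return math.sqrt(sq / n)


def ensemble_mean_rg(frames):
    """Average Rg over conformers, since an IDP has no single structure."""
    return sum(radius_of_gyration(f) for f in frames) / len(frames)


two_beads = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
square = [(1.0, 1.0, 0.0), (-1.0, 1.0, 0.0), (-1.0, -1.0, 0.0), (1.0, -1.0, 0.0)]
```

A mass-weighted variant would multiply each squared distance by the atom mass and divide by the total mass instead of n.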
Affiliation(s)
- Parisa Mollaei
  - Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Danush Sadasivam
  - Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Chakradhar Guntuboina
  - Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Amir Barati Farimani
  - Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
  - Department of Biomedical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
  - Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
4. Feller AL, Wilke CO. Peptide-aware chemical language model successfully predicts membrane diffusion of cyclic peptides. bioRxiv: The Preprint Server for Biology 2024:2024.08.09.607221. [PMID: 39149303 PMCID: PMC11326283 DOI: 10.1101/2024.08.09.607221] [Indexed: 08/17/2024]
Abstract
Language modeling applied to biological data has significantly advanced the prediction of membrane penetration for small molecule drugs and natural peptides. However, accurately predicting membrane diffusion for peptides with pharmacologically relevant modifications remains a substantial challenge. Here, we introduce PeptideCLM, a peptide-focused chemical language model capable of encoding peptides with chemical modifications, unnatural or non-canonical amino acids, and cyclizations. We assess this model by predicting membrane diffusion of cyclic peptides, demonstrating greater predictive power than existing chemical language models. Our model is versatile and can be extended beyond membrane diffusion predictions to other target values. Its advantages include the ability to model macromolecules using chemical string notation, a largely unexplored domain, and a simple, flexible architecture that allows for adaptation to any peptide or other macromolecule dataset.
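Chemical language models read molecules as strings such as SMILES, which can express modifications, non-canonical residues and cyclizations that a 20-letter amino-acid alphabet cannot. As a minimal sketch of that input step, the tokenizer below splits a SMILES string into atom-level tokens (bracket atoms, two-letter halogens, ring-closure digits, bonds, branches) using a widely used SMILES regular expression. This is an assumption for illustration; the abstract does not describe PeptideCLM's own tokenizer.

```python
import re

# Atom-level SMILES pattern: bracket atoms first, then Br/Cl before B/C,
# aromatic atoms, bonds, branches, ring closures (%nn or single digit).
SMILES_PATTERN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles):
    """Split a SMILES string into tokens, failing if any character is skipped."""
    tokens = SMILES_PATTERN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in {smiles!r}")
    return tokens


glycylglycine = tokenize_smiles("NCC(=O)NCC(=O)O")  # linear Gly-Gly
cyclo_gly_gly = tokenize_smiles("O=C1CNC(=O)CN1")  # cyclized Gly-Gly; ring closure "1"
l_alanine = tokenize_smiles("N[C@@H](C)C(=O)O")  # stereocentre kept as one token
```

Keeping a bracketed stereocentre such as `[C@@H]` as a single token, and marking ring closures with digits, is what lets a string-based model represent chirality and cyclization directly.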
5. Tripathy A, Patne AY, Mohapatra S, Mohapatra SS. Convergence of Nanotechnology and Machine Learning: The State of the Art, Challenges, and Perspectives. Int J Mol Sci 2024; 25:12368. [PMID: 39596433 PMCID: PMC11594285 DOI: 10.3390/ijms252212368] [Received: 08/28/2024] [Revised: 11/10/2024] [Accepted: 11/13/2024] [Indexed: 11/28/2024]
Abstract
Nanotechnology and machine learning (ML) are rapidly emerging fields with numerous real-world applications in medicine, materials science, computer engineering, and data processing. ML enhances nanotechnology by facilitating the processing of datasets in nanomaterial synthesis, characterization, and optimization of nanoscale properties. Conversely, nanotechnology improves the speed and efficiency of computing power, which is crucial for ML algorithms. Although the capabilities of nanotechnology and ML are still in their infancy, a review of the research literature provides insights into the exciting frontiers of these fields and suggests that their integration can be transformative. Future research directions include developing tools for manipulating nanomaterials and ensuring ethical and unbiased data collection for ML models. This review emphasizes the importance of the coevolution of these technologies and their mutual reinforcement to advance scientific and societal goals.
Affiliation(s)
- Arnav Tripathy
  - Center for Research and Education in Nanobioengineering, Department of Internal Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, USA
- Akshata Y. Patne
  - Center for Research and Education in Nanobioengineering, Department of Internal Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, USA
  - Graduate Programs, Taneja College of Pharmacy, MDC30, 12908 USF Health Drive, Tampa, FL 33612, USA
- Subhra Mohapatra
  - Center for Research and Education in Nanobioengineering, Department of Internal Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, USA
  - Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, USA
  - Research Service, James A. Haley Veterans Hospital, Tampa, FL 33612, USA
- Shyam S. Mohapatra
  - Center for Research and Education in Nanobioengineering, Department of Internal Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, USA
  - Graduate Programs, Taneja College of Pharmacy, MDC30, 12908 USF Health Drive, Tampa, FL 33612, USA
  - Research Service, James A. Haley Veterans Hospital, Tampa, FL 33612, USA
6. Yu Q, Zhang Z, Liu G, Li W, Tang Y. ToxGIN: an In silico prediction model for peptide toxicity via graph isomorphism networks integrating peptide sequence and structure information. Brief Bioinform 2024; 25:bbae583. [PMID: 39530430 PMCID: PMC11555482 DOI: 10.1093/bib/bbae583] [Received: 08/14/2024] [Revised: 10/22/2024] [Accepted: 10/29/2024] [Indexed: 11/16/2024]
Abstract
Peptide drugs have demonstrated enormous potential in treating a variety of diseases, yet toxicity prediction remains a significant challenge in drug development. Existing models for prediction of peptide toxicity largely rely on sequence information and often neglect the three-dimensional (3D) structures of peptides. This study introduced a novel model for short peptide toxicity prediction, named ToxGIN. The model utilizes a Graph Isomorphism Network (GIN), integrating the underlying amino acid sequence composition and the 3D structures of peptides. ToxGIN comprises three primary modules: (i) Sequence processing module, converting peptide 3D structures and sequences into node and edge information; (ii) Feature extraction module, utilizing GIN to learn discriminative features from nodes and edges; (iii) Classification module, employing a fully connected classifier for toxicity prediction. ToxGIN performed well on the independent test set with F1 score = 0.83, AUROC = 0.91, and Matthews correlation coefficient = 0.68, better than existing models for prediction of peptide toxicity. These results validated the effectiveness of integrating 3D structural information with sequence data using GIN for peptide toxicity prediction. The proposed ToxGIN and data are freely accessible at https://github.com/cihebiyql/ToxGIN.
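The feature extraction module is built on GIN. For reference, the standard GIN update sums each node's neighbour features, adds (1 + ε) times the node's own feature, and passes the result through an MLP: h_v' = MLP((1 + ε)·h_v + Σ_{u∈N(v)} h_u); a graph-level readout (here a sum) then feeds the classifier. The sketch below implements that update on an adjacency list, with a single linear layer plus ReLU standing in for the MLP. The residue graph, features, weights and ε = 0 are hypothetical, and ToxGIN's actual featurisation and edge features are not reproduced.

```python
def gin_layer(features, adjacency, weight, bias, eps=0.0):
    """One GIN layer: h_v' = ReLU(W((1 + eps) * h_v + sum_{u in N(v)} h_u) + b).

    features:  list of node feature vectors (length d_in each)
    adjacency: dict mapping node index -> list of neighbour indices
    weight:    d_out x d_in matrix as a list of rows; bias: length d_out
    """
    updated = []
    for v, h_v in enumerate(features):
        # Self term scaled by (1 + eps), then sum aggregation over neighbours.
        agg = [(1.0 + eps) * x for x in h_v]
        for u in adjacency.get(v, []):
            agg = [a + x for a, x in zip(agg, features[u])]
        # Single linear layer + ReLU standing in for the MLP.
        out = [max(0.0, sum(w * a for w, a in zip(row, agg)) + b)
               for row, b in zip(weight, bias)]
        updated.append(out)
    return updated


def sum_readout(features):
    """Graph-level embedding: element-wise sum of node embeddings."""
    return [sum(col) for col in zip(*features)]


# Hypothetical 3-residue chain 0-1-2 with 2-dimensional node features.
nodes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
chain = {0: [1], 1: [0, 2], 2: [1]}
identity = [[1.0, 0.0], [0.0, 1.0]]
h1 = gin_layer(nodes, chain, identity, [0.0, 0.0])
graph_vec = sum_readout(h1)
```

Sum aggregation (rather than mean or max) is the design choice that distinguishes GIN, because it keeps track of how many neighbours carry each feature.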
Affiliation(s)
- Qiule Yu
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Zhixing Zhang
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Guixia Liu
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Weihua Li
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Yun Tang
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
7. Ock J, Mollaei P, Barati Farimani A. GradNav: Accelerated Exploration of Potential Energy Surfaces with Gradient-Based Navigation. J Chem Theory Comput 2024; 20:4088-4098. [PMID: 38728667 PMCID: PMC11137815 DOI: 10.1021/acs.jctc.4c00316] [Received: 03/11/2024] [Revised: 04/23/2024] [Accepted: 04/25/2024] [Indexed: 05/12/2024]
Abstract
Exploring the potential energy surface (PES) of molecular systems is important for comprehending their complex behaviors, particularly through the identification of various metastable states. However, the transition between these states is often hindered by substantial energy barriers, demanding prolonged molecular simulations that consume considerable computational resources. Our study introduces the gradient-based navigation (GradNav) algorithm, which accelerates the exploration of the energy surface and enables proper reconstruction of the PES. This algorithm employs a strategy of initiating short simulation runs from updated starting points derived from prior observations to effectively navigate across potential barriers and explore new regions. To evaluate GradNav's performance, we introduce two metrics: the deepest well escape frame (DWEF) and the search success initialization ratio (SSIR). Through applications on Langevin dynamics within Müller-type PESs and molecular dynamics simulations of the Fs-peptide protein, these metrics demonstrate GradNav's enhanced ability to escape deep energy wells and its reduced reliance on initial conditions, as denoted by the reduced DWEF values and increased SSIR values, respectively. Consequently, this improved exploration capability enables more precise energy estimations from simulation trajectories.
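The abstract describes short Langevin runs restarted from starting points chosen using earlier observations. The sketch below illustrates only that general pattern on a toy 1D double well V(x) = (x² − 1)², using overdamped Langevin dynamics integrated with Euler–Maruyama. The restart rule (start the next run from the highest-energy frame seen so far) is a hypothetical stand-in heuristic, not GradNav's gradient-based update, and the potential, time step and temperature are all assumptions.

```python
import math
import random


def potential(x):
    """Toy 1D double well with minima at x = -1 and x = +1, barrier at x = 0."""
    return (x * x - 1.0) ** 2


def grad_potential(x):
    """dV/dx for the toy double well."""
    return 4.0 * x * (x * x - 1.0)


def langevin_run(x0, n_steps, dt=0.01, kT=0.1, rng=None):
    """Overdamped Langevin, Euler-Maruyama: x <- x - dt*V'(x) + sqrt(2 kT dt)*N(0,1)."""
    rng = rng or random.Random(0)
    noise_scale = math.sqrt(2.0 * kT * dt)
    xs = [x0]
    for _ in range(n_steps):
        x = xs[-1]
        xs.append(x - dt * grad_potential(x) + noise_scale * rng.gauss(0.0, 1.0))
    return xs


def restarted_exploration(x0, n_segments, seg_steps, kT=0.1, seed=0):
    """Chain short runs; each restarts from the highest-energy frame visited so far.
    Hypothetical heuristic for illustration only, NOT GradNav's update rule."""
    rng = random.Random(seed)
    visited = []
    start = x0
    for _ in range(n_segments):
        visited.extend(langevin_run(start, seg_steps, kT=kT, rng=rng))
        # High-energy frames tend to lie on barriers or walls, away from the sampled well.
        start = max(visited, key=potential)
    return visited


cold = langevin_run(0.5, 1000, kT=0.0)  # no noise: pure descent into the well at x = 1
explored = restarted_exploration(-1.0, 5, 100)
```

With kT = 0 the integrator reduces to gradient descent, which is a quick sanity check; with noise, restarting near the barrier raises the chance that a short run crosses into the other well, which is the kind of deep-well escape the DWEF metric is meant to quantify.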
Affiliation(s)
- Janghoon Ock
  - Department of Chemical Engineering, Carnegie Mellon University, 5000 Forbes Street, Pittsburgh, Pennsylvania 15213, United States
- Parisa Mollaei
  - Department of Mechanical Engineering, Carnegie Mellon University, 5000 Forbes Street, Pittsburgh, Pennsylvania 15213, United States
- Amir Barati Farimani
  - Department of Mechanical Engineering, Carnegie Mellon University, 5000 Forbes Street, Pittsburgh, Pennsylvania 15213, United States
8. Jiao S, Ye X, Sakurai T, Zou Q, Liu R. Integrated convolution and self-attention for improving peptide toxicity prediction. Bioinformatics 2024; 40:btae297. [PMID: 38696758 PMCID: PMC11654579 DOI: 10.1093/bioinformatics/btae297] [Received: 02/06/2024] [Revised: 04/02/2024] [Accepted: 04/30/2024] [Indexed: 05/04/2024]
Abstract
MOTIVATION: Peptides are promising agents for the treatment of a variety of diseases due to their specificity and efficacy. However, the development of peptide-based drugs is often hindered by the potential toxicity of peptides, which poses a significant barrier to their clinical application. Traditional experimental methods for evaluating peptide toxicity are time-consuming and costly, making the development process inefficient. Therefore, there is an urgent need for computational tools specifically designed to predict peptide toxicity accurately and rapidly, facilitating the identification of safe peptide candidates for drug development.
RESULTS: We provide here a novel computational approach, CAPTP, which leverages convolution and self-attention to enhance the prediction of peptide toxicity from amino acid sequences. CAPTP demonstrates outstanding performance, achieving a Matthews correlation coefficient of approximately 0.82 both in cross-validation and on independent test datasets. This performance surpasses that of existing state-of-the-art peptide toxicity predictors. Importantly, CAPTP maintains its robustness and generalizability even when dealing with data imbalances. Further analysis by CAPTP reveals that certain sequential patterns, particularly in the head and central regions of peptides, are crucial in determining their toxicity. This insight can significantly inform and guide the design of safer peptide drugs.
AVAILABILITY AND IMPLEMENTATION: The source code for CAPTP is freely available at https://github.com/jiaoshihu/CAPTP.
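Both CAPTP and ToxGIN report the Matthews correlation coefficient (MCC), which, unlike plain accuracy, stays informative when toxic and non-toxic peptides are imbalanced. For reference, MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)) and F1 = 2TP / (2TP + FP + FN), with toxic as the positive class. The labels below are invented to show the arithmetic and do not come from either paper; returning 0 when a denominator is zero is a common convention, assumed here.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels, 1 = toxic (positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 if any marginal total is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def f1(tp, fp, fn):
    """F1 score for the positive class; 0.0 if there are no positives at all."""
    return 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 0.0


# Imbalanced toy set: 10 toxic, 90 non-toxic; a predictor that always says "non-toxic".
tp, tn, fp, fn = confusion_counts([1] * 10 + [0] * 90, [0] * 100)
accuracy = (tp + tn) / (tp + tn + fp + fn)
```

The trivial predictor above scores 90% accuracy yet an MCC and F1 of zero, which is why MCC is the headline metric for imbalanced toxicity data.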
Affiliation(s)
- Shihu Jiao
  - Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan
- Xiucai Ye
  - Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan
- Tetsuya Sakurai
  - Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
- Ruijun Liu
  - School of Software, Beihang University, Beijing 100191, China
9. Kim S, Mollaei P, Antony A, Magar R, Barati Farimani A. GPCR-BERT: Interpreting Sequential Design of G Protein-Coupled Receptors Using Protein Language Models. J Chem Inf Model 2024; 64:1134-1144. [PMID: 38340054 PMCID: PMC10900288 DOI: 10.1021/acs.jcim.3c01706] [Received: 10/22/2023] [Revised: 01/29/2024] [Accepted: 01/29/2024] [Indexed: 02/12/2024]
Abstract
With the rise of transformers and large language models (LLMs) in chemistry and biology, new avenues for the design and understanding of therapeutics have been opened up to the scientific community. Protein sequences can be modeled as language and can take advantage of recent advances in LLMs, particularly given the abundance of available protein sequence data sets. In this letter, we developed the GPCR-BERT model for understanding the sequential design of G protein-coupled receptors (GPCRs). GPCRs are the target of over one-third of Food and Drug Administration-approved pharmaceuticals. However, there is a lack of comprehensive understanding regarding the relationship among amino acid sequence, ligand selectivity, and conformational motifs (such as NPxxY, CWxP, and E/DRY). By utilizing the pretrained protein model (Prot-Bert) and fine-tuning it on tasks that predict variations in the motifs, we were able to shed light on several relationships between residues in the binding pocket and some of the conserved motifs. To achieve this, we interpreted the model's attention weights and hidden states to estimate how much each amino acid contributes to determining the identity of the masked residues. The fine-tuned models demonstrated high accuracy in predicting hidden residues within the motifs. In addition, the embeddings were analyzed over 3D structures to elucidate the higher-order interactions within the conformations of the receptors.
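The fine-tuning tasks involve recovering masked residues inside conserved motifs. As a minimal sketch of how such inputs might be built, the code below locates NPxxY, CWxP and E/DRY with regular expressions and masks one variable (x) position. It assumes that x stands for any residue, that E/DRY means E or D followed by R and Y, and that the model takes ProtBert-style space-separated residues with a `[MASK]` token; these are assumptions about the data pipeline, and the toy sequence is made up rather than a real GPCR.

```python
import re

# Motif patterns: "." = any residue (the "x" positions); [ED]RY = E or D, then R, Y.
MOTIFS = {
    "NPxxY": re.compile(r"NP..Y"),
    "CWxP": re.compile(r"CW.P"),
    "E/DRY": re.compile(r"[ED]RY"),
}


def find_motifs(sequence):
    """Return (motif_name, start_index, matched_text) for each occurrence, by position."""
    hits = []
    for name, pattern in MOTIFS.items():
        for m in pattern.finditer(sequence):
            hits.append((name, m.start(), m.group()))
    return sorted(hits, key=lambda hit: hit[1])


def mask_position(sequence, index, mask_token="[MASK]"):
    """Space-separate residues (ProtBert-style input, assumed) and mask one of them."""
    residues = list(sequence)
    residues[index] = mask_token
    return " ".join(residues)


toy = "AADRYLLCWLPAANPLVYA"  # hypothetical sequence containing all three motifs
hits = find_motifs(toy)
npxxy_start = next(start for name, start, _ in hits if name == "NPxxY")
masked = mask_position(toy, npxxy_start + 2)  # first "x" of NPxxY
```

The model is then asked to predict the residue behind `[MASK]`, and its attention over the rest of the sequence is what the interpretation step inspects.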
Affiliation(s)
- Seongwon Kim
  - Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Parisa Mollaei
  - Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Akshay Antony
  - Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Rishikesh Magar
  - Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Amir Barati Farimani
  - Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
  - Department of Biomedical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
  - Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States