1
Rathore AS, Kumar N, Choudhury S, Mehta NK, Raghava GPS. Prediction of hemolytic peptides and their hemolytic concentration. Commun Biol 2025; 8:176. [PMID: 39905233 PMCID: PMC11794569 DOI: 10.1038/s42003-025-07615-w] [Received: 08/08/2024] [Accepted: 01/28/2025] [Indexed: 02/06/2025]
Abstract
Peptide-based drugs often fail in clinical trials due to their toxicity or hemolytic activity against red blood cells (RBCs). Existing methods predict whether a peptide is hemolytic but not the concentration (HC50) required to lyse 50% of RBCs. This study develops classification and regression models to identify and quantify hemolytic activity. The models are trained on 1926 peptides with experimentally determined HC50 against mammalian RBCs. Analysis indicates that hydrophobic and positively charged residues are associated with higher hemolytic activity. Among classification models, including machine learning (ML), quantum ML, and protein language models, a hybrid model combining random forest (RF) and a motif-based approach achieves the highest area under the receiver operating characteristic curve (AUROC), 0.921. Regression models achieve a Pearson correlation coefficient (R) of 0.739 and a coefficient of determination (R²) of 0.543. These models outperform existing methods and are implemented in HemoPI2, a web-based platform and standalone software for designing peptides with desired HC50 values (http://webs.iiitd.edu.in/raghava/hemopi2/).
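The two regression metrics quoted in this abstract measure different things: Pearson R captures linear association, while R² (coefficient of determination) also penalizes systematic offset. The sketch below is an illustration only, not the HemoPI2 code; the function names and the toy values are assumptions made for the example.

```python
import math


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy values (hypothetical, not from the paper): predictions that are
# perfectly correlated with the observations but shifted by a constant.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]

print(pearson_r(observed, predicted))
print(r_squared(observed, predicted))
```

In this toy case the correlation is perfect while R² is lower, because the constant offset contributes to the residual sum of squares.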
Affiliation(s)
- Anand Singh Rathore
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India
- Nishant Kumar
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India
- Shubham Choudhury
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India
- Naman Kumar Mehta
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India
- Gajendra P S Raghava
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India.
2
Feller AL, Wilke CO. Peptide-Aware Chemical Language Model Successfully Predicts Membrane Diffusion of Cyclic Peptides. J Chem Inf Model 2025; 65:571-579. [PMID: 39772542 DOI: 10.1021/acs.jcim.4c01441] [Indexed: 01/11/2025]
Abstract
Language modeling applied to biological data has significantly advanced the prediction of membrane penetration for small-molecule drugs and natural peptides. However, accurately predicting membrane diffusion for peptides with pharmacologically relevant modifications remains a substantial challenge. Here, we introduce PeptideCLM, a peptide-focused chemical language model capable of encoding peptides with chemical modifications, unnatural or noncanonical amino acids, and cyclizations. We assess this model by predicting membrane diffusion of cyclic peptides, demonstrating greater predictive power than existing chemical language models. Our model is versatile and can be extended beyond membrane diffusion predictions to other target values. Its advantages include the ability to model macromolecules using chemical string notation, a largely unexplored domain, and a simple, flexible architecture that allows for adaptation to any peptide or other macromolecule data set.
Affiliation(s)
- Aaron L Feller
- Interdisciplinary Life Sciences, The University of Texas at Austin, Austin, Texas 78712, United States
- Claus O Wilke
- Interdisciplinary Life Sciences, The University of Texas at Austin, Austin, Texas 78712, United States
- Department of Integrative Biology, The University of Texas at Austin, Austin, Texas 78712, United States
3
Guan C, Fernandes FC, Franco OL, de la Fuente-Nunez C. Leveraging large language models for peptide antibiotic design. Cell Reports Physical Science 2025; 6:102359. [PMID: 39949833 PMCID: PMC11823563 DOI: 10.1016/j.xcrp.2024.102359] [Indexed: 02/16/2025]
Abstract
Large language models (LLMs) have significantly impacted various domains of our society, including recent applications in complex fields such as biology and chemistry. These models, built on sophisticated neural network architectures and trained on extensive datasets, are powerful tools for designing, optimizing, and generating molecules. This review explores the role of LLMs in discovering and designing antibiotics, focusing on peptide molecules. We highlight advancements in drug design and outline the challenges of applying LLMs in these areas.
Affiliation(s)
- Changge Guan
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA
- Department of Chemistry, School of Arts and Sciences, University of Pennsylvania, Philadelphia, PA, USA
- Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA
- These authors contributed equally
- Fabiano C. Fernandes
- Centro de Análises Proteômicas e Bioquímicas, Pós-Graduação em Ciências Genômicas e Biotecnologia, Universidade Católica de Brasília, Brasília, Brazil
- Departamento de Ciência da Computação, Instituto Federal de Brasília, Campus Taguatinga, Brasília, Brazil
- These authors contributed equally
- Octavio L. Franco
- Centro de Análises Proteômicas e Bioquímicas, Pós-Graduação em Ciências Genômicas e Biotecnologia, Universidade Católica de Brasília, Brasília, Brazil
- S-Inova Biotech, Programa de Pós-Graduação em Biotecnologia, Universidade Católica Dom Bosco, Campo Grande, Brazil
- Cesar de la Fuente-Nunez
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA
- Department of Chemistry, School of Arts and Sciences, University of Pennsylvania, Philadelphia, PA, USA
- Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA
4
Imam IA, Bailey S, Wang D, Zeng S, Xu D, Shao Q. Integrating Protein Language Model and Molecular Dynamics Simulations to Discover Antibiofouling Peptides. Langmuir 2025; 41:811-821. [PMID: 39810350 DOI: 10.1021/acs.langmuir.4c04140] [Indexed: 01/16/2025]
Abstract
Antibiofouling peptide materials prevent the nonspecific adsorption of proteins on devices, enabling the devices to perform their designed functions in complex biological environments. Due to their importance, research on antibiofouling peptide materials has been one of the central subjects of interfacial engineering. However, only a few antibiofouling peptide sequences have been developed. This narrow scope limits the capacity of antibiofouling peptide materials to adapt to a broad spectrum of application scenarios. To address this issue, we searched for antibiofouling peptides in the vast sequence pool of the microbiome library using a combination of deep learning-based high-throughput search and molecular dynamics (MD) simulations. A random forest-based model with an ensemble of ten independent classifiers was developed. Each classifier was trained by prompt-tuning the foundational protein language model Evolutionary Scale Modeling version 2 (ESM2) on a distinct training data set. We constructed databases containing equal numbers of antibiofouling and biofouling peptide sequences to attenuate the bias of the existing databases. MD simulations were conducted to investigate the interfacial properties of six selected peptide candidates and their interactions with a lysozyme protein. Two known antibiofouling peptides, (glutamic acid (E)-lysine (K))₁₅ and (EK-proline (P))₁₀, and one known fouling peptide, (glycine)₃₀, were used as references. The MD simulation results indicate that five of the six peptides have the potential to resist biofouling. Our research implies that deep learning and molecular simulations can be integrated to discover functional peptide materials for interfacial applications.
Affiliation(s)
- Ibrahim A Imam
- Department of Chemical and Materials Engineering, University of Kentucky, Lexington, Kentucky 40506, United States
- Shea Bailey
- Department of Chemical and Materials Engineering, University of Kentucky, Lexington, Kentucky 40506, United States
- Department of Chemistry and Biochemistry, Butler University, Indianapolis, Indiana 46208, United States
- Duolin Wang
- Department of Electrical Engineering and Computer Science, Bond Life Sciences Center, University of Missouri, Columbia, Missouri 65211, United States
- Shuai Zeng
- Department of Electrical Engineering and Computer Science, Bond Life Sciences Center, University of Missouri, Columbia, Missouri 65211, United States
- Dong Xu
- Department of Electrical Engineering and Computer Science, Bond Life Sciences Center, University of Missouri, Columbia, Missouri 65211, United States
- Qing Shao
- Department of Chemical and Materials Engineering, University of Kentucky, Lexington, Kentucky 40506, United States
5
Badrinarayanan S, Guntuboina C, Mollaei P, Barati Farimani A. Multi-Peptide: Multimodality Leveraged Language-Graph Learning of Peptide Properties. J Chem Inf Model 2025; 65:83-91. [PMID: 39700492 PMCID: PMC11733943 DOI: 10.1021/acs.jcim.4c01443] [Received: 08/10/2024] [Revised: 12/03/2024] [Accepted: 12/04/2024] [Indexed: 12/21/2024]
Abstract
Peptides are crucial in biological processes and therapeutic applications. Given their importance, advancing our ability to predict peptide properties is essential. In this study, we introduce Multi-Peptide, an innovative approach that combines transformer-based language models with graph neural networks (GNNs) to predict peptide properties. We integrate PeptideBERT, a transformer model specifically designed for peptide property prediction, with a GNN encoder to capture both sequence-based and structural features. By employing a contrastive loss framework, Multi-Peptide aligns embeddings from both modalities into a shared latent space, thereby enhancing the transformer model's predictive accuracy. Evaluations on hemolysis and nonfouling data sets demonstrate Multi-Peptide's robustness, achieving state-of-the-art 88.057% accuracy in hemolysis prediction. This study highlights the potential of multimodal learning in bioinformatics, paving the way for accurate and reliable predictions in peptide-based research and applications.
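The abstract says a contrastive loss aligns sequence and graph embeddings in a shared latent space but does not give its exact form. The sketch below assumes a symmetric, CLIP-style cross-entropy over cosine similarities; that choice, the temperature value, and all function names are assumptions for illustration, not the Multi-Peptide implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cross_entropy_row(logits, target):
    """Negative log-softmax of the target entry, computed stably."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[target]


def contrastive_loss(seq_emb, graph_emb, temperature=0.1):
    """Symmetric contrastive loss: the i-th sequence embedding should match
    the i-th graph embedding and mismatch all others in the batch."""
    n = len(seq_emb)
    logits = [[cosine(s, g) / temperature for g in graph_emb] for s in seq_emb]
    # Sequence-to-graph direction: each row is one classification problem.
    loss_s2g = sum(cross_entropy_row(logits[i], i) for i in range(n)) / n
    # Graph-to-sequence direction: each column is one classification problem.
    loss_g2s = sum(
        cross_entropy_row([logits[j][i] for j in range(n)], i) for i in range(n)
    ) / n
    return 0.5 * (loss_s2g + loss_g2s)


# Toy embeddings (hypothetical): pairs that agree versus pairs that are swapped.
seq = [[1.0, 0.0], [0.0, 1.0]]
graph_aligned = [[1.0, 0.0], [0.0, 1.0]]
graph_swapped = [[0.0, 1.0], [1.0, 0.0]]

print(contrastive_loss(seq, graph_aligned))
print(contrastive_loss(seq, graph_swapped))
```

Aligned pairs yield a loss near zero and swapped pairs a large loss, which is the pressure that would pull the two modalities toward a common space during training.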
Affiliation(s)
- Srivathsan Badrinarayanan
- Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh 15213, Pennsylvania, United States
- Chakradhar Guntuboina
- Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh 15213, Pennsylvania, United States
- Parisa Mollaei
- Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh 15213, Pennsylvania, United States
- Amir Barati Farimani
- Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh 15213, Pennsylvania, United States
- Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh 15213, Pennsylvania, United States
- Department of Biomedical Engineering, Carnegie Mellon University, Pittsburgh 15213, Pennsylvania, United States
- Machine Learning Department, Carnegie Mellon University, Pittsburgh 15213, Pennsylvania, United States
6
Tang S, Zhang Y, Chatterjee P. PepTune: De Novo Generation of Therapeutic Peptides with Multi-Objective-Guided Discrete Diffusion. arXiv 2025:arXiv:2412.17780v3. [PMID: 39764410 PMCID: PMC11703324] [Indexed: 01/30/2025]
Abstract
Peptide therapeutics, a major class of medicines, have achieved remarkable success across diseases such as diabetes and cancer, with landmark examples such as GLP-1 receptor agonists revolutionizing the treatment of type-2 diabetes and obesity. Despite their success, designing peptides that satisfy multiple conflicting objectives, such as target binding affinity, solubility, and membrane permeability, remains a major challenge. Classical drug development and target structure-based design are ineffective for such tasks, as they fail to optimize global functional properties critical for therapeutic efficacy. Existing generative frameworks are largely limited to continuous spaces, unconditioned outputs, or single-objective guidance, making them unsuitable for discrete sequence optimization across multiple properties. To address this, we present PepTune, a multi-objective discrete diffusion model for the simultaneous generation and optimization of therapeutic peptide SMILES. Built on the Masked Discrete Language Model (MDLM) framework, PepTune ensures valid peptide structures with bond-dependent masking schedules and penalty-based objectives. To guide the diffusion process, we propose a Monte Carlo Tree Search (MCTS)-based strategy that balances exploration and exploitation to iteratively refine Pareto-optimal sequences. MCTS integrates classifier-based rewards with search-tree expansion, overcoming gradient estimation challenges and data sparsity. Using PepTune, we generate diverse, chemically modified peptides optimized for multiple therapeutic properties, including target binding affinity, membrane permeability, solubility, hemolysis, and non-fouling for various disease-relevant targets. In total, our results demonstrate that MCTS-guided masked discrete diffusion is a powerful and modular approach for multi-objective sequence design in discrete state spaces.
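"Pareto-optimal" here means that no other candidate is at least as good on every objective and strictly better on one. The sketch below shows only that filtering step, under the assumption that every objective is a score to be maximized; the candidate names and scores are invented for illustration, and the MCTS-guided diffusion itself is not reproduced.

```python
def dominates(a, b):
    """True if score vector a is at least as good as b on every objective
    and strictly better on at least one (all objectives maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Keep only the candidates that no other candidate dominates."""
    return {
        name: scores
        for name, scores in candidates.items()
        if not any(
            dominates(other_scores, scores)
            for other_name, other_scores in candidates.items()
            if other_name != name
        )
    }


# Hypothetical (binding affinity, solubility) scores, higher is better.
candidates = {
    "pep_a": (0.9, 0.2),
    "pep_b": (0.5, 0.8),
    "pep_c": (0.4, 0.7),  # dominated by pep_b
    "pep_d": (0.9, 0.1),  # dominated by pep_a
}

front = pareto_front(candidates)
print(sorted(front))
```

The two surviving candidates represent different trade-offs between the objectives, which is why a multi-objective search returns a set rather than a single best sequence.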
Affiliation(s)
- Sophia Tang
- Department of Biomedical Engineering, Duke University
- Management and Technology Program, University of Pennsylvania
- Yinuo Zhang
- Department of Biomedical Engineering, Duke University
- Center of Computational Biology, Duke-NUS Medical School
- Pranam Chatterjee
- Department of Biomedical Engineering, Duke University
- Department of Computer Science, Duke University
- Department of Biostatistics and Bioinformatics, Duke University
7
Brizuela CA, Liu G, Stokes JM, de la Fuente-Nunez C. AI Methods for Antimicrobial Peptides: Progress and Challenges. Microb Biotechnol 2025; 18:e70072. [PMID: 39754551 PMCID: PMC11702388 DOI: 10.1111/1751-7915.70072] [Received: 07/19/2024] [Revised: 11/18/2024] [Accepted: 12/16/2024] [Indexed: 01/06/2025]
Abstract
Antimicrobial peptides (AMPs) are promising candidates to combat multidrug-resistant pathogens. However, the high cost of extensive wet-lab screening has made AI methods for identifying and designing AMPs increasingly important, with machine learning (ML) techniques playing a crucial role. AI approaches have recently revolutionised this field by accelerating the discovery of new peptides with anti-infective activity, particularly in preclinical mouse models. Initially, classical ML approaches dominated the field, but recently there has been a shift towards deep learning (DL) models. Despite significant contributions, existing reviews have not thoroughly explored the potential of large language models (LLMs), graph neural networks (GNNs) and structure-guided AMP discovery and design. This review aims to fill that gap by providing a comprehensive overview of the latest advancements, challenges and opportunities in using AI methods, with a particular emphasis on LLMs, GNNs and structure-guided design. We discuss the limitations of current approaches and highlight the most relevant topics to address in the coming years for AMP discovery and design.
Affiliation(s)
- Gary Liu
- Department of Biochemistry and Biomedical Sciences, Michael G. DeGroote Institute for Infectious Disease Research, David Braley Centre for Antibiotic Discovery, McMaster University, Hamilton, Ontario, Canada
- Jonathan M. Stokes
- Department of Biochemistry and Biomedical Sciences, Michael G. DeGroote Institute for Infectious Disease Research, David Braley Centre for Antibiotic Discovery, McMaster University, Hamilton, Ontario, Canada
- Cesar de la Fuente-Nunez
- Machine Biology Group, Department of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Chemistry, School of Arts and Sciences, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, Pennsylvania, USA
8
Dhoriyani J, Bergman MT, Hall CK, You F. Integrating biophysical modeling, quantum computing, and AI to discover plastic-binding peptides that combat microplastic pollution. PNAS Nexus 2025; 4:pgae572. [PMID: 39871828 PMCID: PMC11770337 DOI: 10.1093/pnasnexus/pgae572] [Received: 08/08/2024] [Accepted: 12/16/2024] [Indexed: 01/29/2025]
Abstract
Methods are needed to mitigate microplastic (MP) pollution to minimize its harm to the environment and human health. Given the ability of polypeptides to adsorb strongly to materials of micro- or nanometer size, plastic-binding peptides (PBPs) could help create bio-based tools for detecting, filtering, or degrading MP pollution. However, the development of such tools is prevented by the lack of PBPs. In this work, we discover and evaluate PBPs for several common plastics by combining biophysical modeling, molecular dynamics (MD), quantum computing, and reinforcement learning. We frame peptide affinity for a given plastic through a Potts model that is a function of the amino acid sequence and then search for the amino acid sequences with the greatest predicted affinity using quantum annealing. We also use proximal policy optimization to find PBPs with a broader range of physicochemical properties, such as isoelectric point or solubility. Evaluation of the discovered PBPs in MD simulations demonstrates that the peptides have high affinity for two of the plastics: polyethylene and polypropylene. We conclude by describing how our computational approach could be paired with experimental approaches to create a nexus for designing and optimizing peptide-based tools that aid the detection, capture, or biodegradation of MPs. We thus hope that this study will aid in the fight against MP pollution.
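A Potts model scores a sequence as a sum of per-position terms plus pairwise terms between positions. The sketch below illustrates that scoring on a toy three-letter alphabet and replaces quantum annealing with exhaustive search, which is only feasible at toy sizes. The parameter values, the alphabet, and the convention that lower energy means stronger predicted affinity are all assumptions for the example, not values from the paper.

```python
from itertools import product

ALPHABET = "AKL"  # toy alphabet (assumption); real peptides use 20 residues


def potts_energy(seq, fields, couplings):
    """Energy = sum of single-site fields + sum of pairwise couplings.

    fields[i][aa] is the field for residue aa at position i;
    couplings[(i, j)][(aa_i, aa_j)] is the coupling for a residue pair,
    with missing entries treated as zero.
    """
    energy = sum(fields[i][aa] for i, aa in enumerate(seq))
    for i in range(len(seq)):
        for j in range(i + 1, len(seq)):
            energy += couplings.get((i, j), {}).get((seq[i], seq[j]), 0.0)
    return energy


def best_sequence(length, fields, couplings):
    """Brute-force stand-in for an annealer: return the lowest-energy sequence."""
    all_sequences = ("".join(s) for s in product(ALPHABET, repeat=length))
    return min(all_sequences, key=lambda s: potts_energy(s, fields, couplings))


# Hypothetical parameters: L is favoured at each site, but two adjacent
# L residues incur a penalty through the pairwise coupling.
fields = [
    {"A": 0.0, "K": -0.5, "L": -1.0},
    {"A": 0.0, "K": -0.6, "L": -1.0},
]
couplings = {(0, 1): {("L", "L"): 1.5}}

best = best_sequence(2, fields, couplings)
print(best, potts_energy(best, fields, couplings))
```

The pairwise penalty makes the best sequence differ from the one that picks the best residue at each site independently, which is the kind of interaction a purely site-wise score would miss.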
Affiliation(s)
- Jeet Dhoriyani
- Systems Engineering, College of Engineering, Cornell University, Ithaca, NY 14853, USA
- Michael T Bergman
- Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, NC 27606, USA
- Carol K Hall
- Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, NC 27606, USA
- Fengqi You
- Systems Engineering, College of Engineering, Cornell University, Ithaca, NY 14853, USA
- Robert Frederick Smith School of Chemical and Biomolecular Engineering, Cornell University, Ithaca, NY 14853, USA
- Cornell University AI for Science Institute, Cornell University, Ithaca, NY 14853, USA
9
Mollaei P, Sadasivam D, Guntuboina C, Barati Farimani A. IDP-Bert: Predicting Properties of Intrinsically Disordered Proteins Using Large Language Models. J Phys Chem B 2024; 128:12030-12037. [PMID: 39586094 DOI: 10.1021/acs.jpcb.4c02507] [Indexed: 11/27/2024]
Abstract
Intrinsically disordered proteins (IDPs) constitute a large and structureless class of proteins with significant functions. The existence of IDPs challenges the conventional notion that the biological functions of proteins rely on their three-dimensional structures. Despite lacking well-defined spatial arrangements, they exhibit diverse biological functions, influencing cellular processes and shedding light on disease mechanisms. However, it is expensive to run experiments or simulations to characterize this class of proteins. Consequently, we designed an ML model that relies solely on amino acid sequences. In this study, we introduce the IDP-Bert model, a deep-learning architecture leveraging Transformers and Protein Language Models to map sequences directly to IDP properties. Our experiments demonstrate accurate predictions of IDP properties, including radius of gyration, end-to-end decorrelation time, and heat capacity.
Affiliation(s)
- Parisa Mollaei
- Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Danush Sadasivam
- Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Chakradhar Guntuboina
- Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Amir Barati Farimani
- Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Department of Biomedical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
10
Feller AL, Wilke CO. Peptide-aware chemical language model successfully predicts membrane diffusion of cyclic peptides. bioRxiv 2024:2024.08.09.607221. [PMID: 39149303 PMCID: PMC11326283 DOI: 10.1101/2024.08.09.607221] [Indexed: 08/17/2024]
Abstract
Language modeling applied to biological data has significantly advanced the prediction of membrane penetration for small molecule drugs and natural peptides. However, accurately predicting membrane diffusion for peptides with pharmacologically relevant modifications remains a substantial challenge. Here, we introduce PeptideCLM, a peptide-focused chemical language model capable of encoding peptides with chemical modifications, unnatural or non-canonical amino acids, and cyclizations. We assess this model by predicting membrane diffusion of cyclic peptides, demonstrating greater predictive power than existing chemical language models. Our model is versatile and can be extended beyond membrane diffusion predictions to other target values. Its advantages include the ability to model macromolecules using chemical string notation, a largely unexplored domain, and a simple, flexible architecture that allows for adaptation to any peptide or other macromolecule dataset.
Affiliation(s)
- Aaron L Feller
- Interdisciplinary Life Sciences, The University of Texas at Austin, Austin, Texas, USA 78712
- Claus O Wilke
- Department of Integrative Biology, The University of Texas at Austin, Austin, Texas, USA 78712
- Interdisciplinary Life Sciences, The University of Texas at Austin, Austin, Texas, USA 78712
11
Tripathy A, Patne AY, Mohapatra S, Mohapatra SS. Convergence of Nanotechnology and Machine Learning: The State of the Art, Challenges, and Perspectives. Int J Mol Sci 2024; 25:12368. [PMID: 39596433 PMCID: PMC11594285 DOI: 10.3390/ijms252212368] [Received: 08/28/2024] [Revised: 11/10/2024] [Accepted: 11/13/2024] [Indexed: 11/28/2024]
Abstract
Nanotechnology and machine learning (ML) are rapidly emerging fields with numerous real-world applications in medicine, materials science, computer engineering, and data processing. ML enhances nanotechnology by facilitating the processing of datasets in nanomaterial synthesis, characterization, and optimization of nanoscale properties. Conversely, nanotechnology improves the speed and efficiency of computing power, which is crucial for ML algorithms. Although the capabilities of nanotechnology and ML are still in their infancy, a review of the research literature provides insights into the exciting frontiers of these fields and suggests that their integration can be transformative. Future research directions include developing tools for manipulating nanomaterials and ensuring ethical and unbiased data collection for ML models. This review emphasizes the importance of the coevolution of these technologies and their mutual reinforcement to advance scientific and societal goals.
Affiliation(s)
- Arnav Tripathy
- Center for Research and Education in Nanobioengineering, Department of Internal Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, USA
- Akshata Y. Patne
- Center for Research and Education in Nanobioengineering, Department of Internal Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, USA
- Graduate Programs, Taneja College of Pharmacy, MDC30, 12908 USF Health Drive, Tampa, FL 33612, USA
- Subhra Mohapatra
- Center for Research and Education in Nanobioengineering, Department of Internal Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, USA
- Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, USA
- Research Service, James A. Haley Veterans Hospital, Tampa, FL 33612, USA
- Shyam S. Mohapatra
- Center for Research and Education in Nanobioengineering, Department of Internal Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, USA
- Graduate Programs, Taneja College of Pharmacy, MDC30, 12908 USF Health Drive, Tampa, FL 33612, USA
- Research Service, James A. Haley Veterans Hospital, Tampa, FL 33612, USA
12
Yu Q, Zhang Z, Liu G, Li W, Tang Y. ToxGIN: an In silico prediction model for peptide toxicity via graph isomorphism networks integrating peptide sequence and structure information. Brief Bioinform 2024; 25:bbae583. [PMID: 39530430 PMCID: PMC11555482 DOI: 10.1093/bib/bbae583] [Received: 08/14/2024] [Revised: 10/22/2024] [Accepted: 10/29/2024] [Indexed: 11/16/2024]
Abstract
Peptide drugs have demonstrated enormous potential in treating a variety of diseases, yet toxicity prediction remains a significant challenge in drug development. Existing models for prediction of peptide toxicity largely rely on sequence information and often neglect the three-dimensional (3D) structures of peptides. This study introduced a novel model for short peptide toxicity prediction, named ToxGIN. The model utilizes a Graph Isomorphism Network (GIN), integrating the underlying amino acid sequence composition and the 3D structures of peptides. ToxGIN comprises three primary modules: (i) a sequence processing module, converting peptide 3D structures and sequences into node and edge information; (ii) a feature extraction module, utilizing GIN to learn discriminative features from nodes and edges; (iii) a classification module, employing a fully connected classifier for toxicity prediction. ToxGIN performed well on the independent test set with F1 score = 0.83, AUROC = 0.91, and Matthews correlation coefficient = 0.68, better than existing models for prediction of peptide toxicity. These results validated the effectiveness of integrating 3D structural information with sequence data using GIN for peptide toxicity prediction. ToxGIN and the data are freely accessible at https://github.com/cihebiyql/ToxGIN.
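F1 score and Matthews correlation coefficient (MCC), two of the metrics reported above, are both derived from the confusion matrix of a binary classifier. The sketch below uses the standard definitions with made-up labels; it is not taken from the ToxGIN repository, and the function names are assumptions.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 = toxic, 0 = non-toxic."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    return 2 * tp / (2 * tp + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; defined as 0 when the denominator is 0."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical labels and predictions for eight peptides.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(f1_score(tp, fp, fn), mcc(tp, tn, fp, fn))
```

Unlike F1, MCC uses all four cells of the confusion matrix, so it also reflects performance on the negative class.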
Affiliation(s)
- Qiule Yu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Zhixing Zhang
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Guixia Liu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Weihua Li
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Yun Tang
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
13
Ock J, Mollaei P, Barati Farimani A. GradNav: Accelerated Exploration of Potential Energy Surfaces with Gradient-Based Navigation. J Chem Theory Comput 2024; 20:4088-4098. [PMID: 38728667 PMCID: PMC11137815 DOI: 10.1021/acs.jctc.4c00316] [Received: 03/11/2024] [Revised: 04/23/2024] [Accepted: 04/25/2024] [Indexed: 05/12/2024]
Abstract
Exploring the potential energy surface (PES) of molecular systems is important for comprehending their complex behaviors, particularly through the identification of various metastable states. However, the transition between these states is often hindered by substantial energy barriers, demanding prolonged molecular simulations that consume considerable computational resources. Our study introduces the gradient-based navigation (GradNav) algorithm, which accelerates the exploration of the energy surface and enables proper reconstruction of the PES. This algorithm employs a strategy of initiating short simulation runs from updated starting points derived from prior observations to effectively navigate across potential barriers and explore new regions. To evaluate GradNav's performance, we introduce two metrics: the deepest well escape frame (DWEF) and the search success initialization ratio (SSIR). Through applications on Langevin dynamics within Müller-type PESs and molecular dynamics simulations of the Fs-peptide protein, these metrics demonstrate GradNav's enhanced ability to escape deep energy wells and its reduced reliance on initial conditions, as denoted by the reduced DWEF values and increased SSIR values, respectively. Consequently, this improved exploration capability enables more precise energy estimations from simulation trajectories.
Affiliation(s)
- Janghoon Ock
- Department of Chemical Engineering, Carnegie Mellon University, 5000 Forbes Street, Pittsburgh, Pennsylvania 15213, United States
| | - Parisa Mollaei
- Department of Mechanical Engineering, Carnegie Mellon University, 5000 Forbes Street, Pittsburgh, Pennsylvania 15213, United States
| | - Amir Barati Farimani
- Department of Mechanical Engineering, Carnegie Mellon University, 5000 Forbes Street, Pittsburgh, Pennsylvania 15213, United States
|
14
|
Jiao S, Ye X, Sakurai T, Zou Q, Liu R. Integrated convolution and self-attention for improving peptide toxicity prediction. Bioinformatics 2024; 40:btae297. [PMID: 38696758 PMCID: PMC11654579 DOI: 10.1093/bioinformatics/btae297] [Received: 02/06/2024] [Revised: 04/02/2024] [Accepted: 04/30/2024] [Indexed: 05/04/2024] Open
Abstract
MOTIVATION: Peptides are promising agents for the treatment of a variety of diseases due to their specificity and efficacy. However, the development of peptide-based drugs is often hindered by the potential toxicity of peptides, which poses a significant barrier to their clinical application. Traditional experimental methods for evaluating peptide toxicity are time-consuming and costly, making the development process inefficient. Therefore, there is an urgent need for computational tools specifically designed to predict peptide toxicity accurately and rapidly, facilitating the identification of safe peptide candidates for drug development. RESULTS: We present a novel computational approach, CAPTP, which combines convolution and self-attention to enhance the prediction of peptide toxicity from amino acid sequences. CAPTP achieves a Matthews correlation coefficient of approximately 0.82 both in cross-validation and on independent test datasets. This performance surpasses that of existing state-of-the-art peptide toxicity predictors. Importantly, CAPTP remains robust and generalizable even on imbalanced data. Further analysis by CAPTP reveals that certain sequential patterns, particularly in the head and central regions of peptides, are crucial in determining their toxicity. This insight can inform and guide the design of safer peptide drugs. AVAILABILITY AND IMPLEMENTATION: The source code for CAPTP is freely available at https://github.com/jiaoshihu/CAPTP.
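For readers unfamiliar with the metric reported in this abstract, the Matthews correlation coefficient (MCC) is computed from the four cells of a binary confusion matrix. The sketch below uses the standard formula; the function name, the convention of returning 0.0 for a zero denominator, and the example counts are illustrative assumptions and are not taken from the paper.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from confusion-matrix counts; ranges from -1 (inverse) to +1 (perfect)."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention (assumed here): report 0.0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    # Hypothetical imbalanced test set: 95 predicted positive, only 10 true negatives.
    print(round(matthews_corrcoef(tp=90, tn=5, fp=5, fn=0), 3))
```

Because every cell of the confusion matrix enters the formula, MCC stays informative on imbalanced data where accuracy alone can look deceptively high, which is why it suits the imbalance claim made in the abstract.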
Affiliation(s)
- Shihu Jiao
- Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan
| | - Xiucai Ye
- Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan
| | - Tetsuya Sakurai
- Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
| | - Ruijun Liu
- School of Software, Beihang University, Beijing 100191, China
|
15
|
Kim S, Mollaei P, Antony A, Magar R, Barati Farimani A. GPCR-BERT: Interpreting Sequential Design of G Protein-Coupled Receptors Using Protein Language Models. J Chem Inf Model 2024; 64:1134-1144. [PMID: 38340054 PMCID: PMC10900288 DOI: 10.1021/acs.jcim.3c01706] [Received: 10/22/2023] [Revised: 01/29/2024] [Accepted: 01/29/2024] [Indexed: 02/12/2024]
Abstract
With the rise of transformers and large language models (LLMs) in chemistry and biology, new avenues for the design and understanding of therapeutics have opened up to the scientific community. Protein sequences can be modeled as language and can take advantage of recent advances in LLMs, especially given the abundance of accessible protein sequence data sets. In this letter, we developed the GPCR-BERT model for understanding the sequential design of G protein-coupled receptors (GPCRs). GPCRs are the target of over one-third of Food and Drug Administration-approved pharmaceuticals. However, there is a lack of comprehensive understanding regarding the relationship among amino acid sequence, ligand selectivity, and conformational motifs (such as NPxxY, CWxP, and E/DRY). By fine-tuning a pretrained protein model (Prot-Bert) on tasks that predict variations in the motifs, we were able to shed light on several relationships between residues in the binding pocket and some of the conserved motifs. To achieve this, we interpreted the attention weights and hidden states of the model to quantify how much each amino acid contributes to determining the identity of the masked residues. The fine-tuned models demonstrated high accuracy in predicting hidden residues within the motifs. In addition, the embeddings were analyzed over 3D structures to elucidate the higher-order interactions within the conformations of the receptors.
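As an illustration of what a masked-motif prediction task might look like, the sketch below locates one of the motifs named in the abstract and replaces its variable positions with a mask token. It is a hypothetical data-preparation step, not the authors' pipeline: the regular expressions, the `[MASK]` token, the space-separated residue format, and the example sequences are all assumptions.

```python
import re

# Assumed regex encodings of the motifs; the capture group marks the variable residues.
MOTIFS = {
    "NPxxY": r"NP(..)Y",
    "CWxP": r"CW(.)P",
    "E/DRY": r"([ED])RY",
}


def mask_motif(sequence, motif, mask_token="[MASK]"):
    """Mask the variable residues of the first motif occurrence.

    Returns (space-separated tokens, list of hidden residues), or None if the
    motif does not occur in the sequence.
    """
    match = re.search(MOTIFS[motif], sequence)
    if match is None:
        return None
    start, end = match.span(1)
    tokens = list(sequence)
    labels = tokens[start:end]  # residues the model would be asked to recover
    for i in range(start, end):
        tokens[i] = mask_token
    return " ".join(tokens), labels


if __name__ == "__main__":
    # Made-up fragment containing an NPxxY motif.
    print(mask_motif("AANPLIYAA", "NPxxY"))
```

A fine-tuned model would then be trained to recover the hidden residues from the surrounding context, and its attention weights could be inspected to see which positions influence each prediction.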
Affiliation(s)
- Seongwon Kim
- Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
| | - Parisa Mollaei
- Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
| | - Akshay Antony
- Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
| | - Rishikesh Magar
- Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
| | - Amir Barati Farimani
- Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Department of Biomedical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
|