1
Dosajh A, Agrawal P, Chatterjee P, Priyakumar UD. Modern machine learning methods for protein property prediction. Curr Opin Struct Biol 2025; 90:102990. [PMID: 39881454 DOI: 10.1016/j.sbi.2025.102990] [Received: 09/24/2024] [Revised: 12/06/2024] [Accepted: 01/04/2025] [Indexed: 01/31/2025]
Abstract
Recent progress in artificial intelligence and machine learning (AI/ML) techniques has made it possible to address complex biomolecular problems. AI/ML models learn the underlying distribution of the data they are trained on; when exposed to new inputs, they make predictions based on patterns and relationships observed in the training set. Further, generative artificial intelligence (GenAI) can be used to accurately generate protein structures or sequences with specific selected properties. This review focuses on applications of AI/ML in predicting important functional properties of proteins, and on the prospects of reverse engineering, that is, inferring sequence and structure from available protein-property information.
Affiliation(s)
- Arjun Dosajh
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, Telangana, India
- Prakul Agrawal
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, Telangana, India
- Prathit Chatterjee
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, Telangana, India
- U Deva Priyakumar
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, Telangana, India.
2
Pimtawong T, Ren J, Lee J, Lee HM, Na D. A review on computational models for predicting protein solubility. J Microbiol 2025; 63:e.2408001. [PMID: 39895070 DOI: 10.71150/jm.2408001] [Received: 08/27/2024] [Accepted: 10/29/2024] [Indexed: 02/04/2025]
Abstract
Protein solubility is a critical factor in the production of recombinant proteins, which are widely used in various industries, including pharmaceuticals, diagnostics, and biotechnology. Predicting protein solubility remains a challenging task due to the complexity of protein structures and the multitude of factors influencing solubility. Recent advances in computational methods, particularly those based on machine learning, have provided powerful tools for predicting protein solubility, thereby reducing the need for extensive experimental trials. This review provides an overview of current computational approaches to predict protein solubility. We discuss the datasets, features, and algorithms employed in these models. The review aims to bridge the gap between computational predictions and experimental validations, fostering the development of more accurate and reliable solubility prediction models that can significantly enhance recombinant protein production.
Affiliation(s)
- Teerapat Pimtawong
- Department of Biomedical Engineering, Chung-Ang University, Seoul 06974, Republic of Korea
- Jun Ren
- Department of Biomedical Engineering, Chung-Ang University, Seoul 06974, Republic of Korea
- Jingyu Lee
- Department of Biomedical Engineering, Chung-Ang University, Seoul 06974, Republic of Korea
- Hyang-Mi Lee
- Department of Biomedical Engineering, Chung-Ang University, Seoul 06974, Republic of Korea
- Dokyun Na
- Department of Biomedical Engineering, Chung-Ang University, Seoul 06974, Republic of Korea
3
Sah SN, Gupta S, Bhardwaj N, Gautam LK, Capalash N, Sharma P. In silico design and assessment of a multi-epitope peptide vaccine against multidrug-resistant Acinetobacter baumannii. In Silico Pharmacol 2024; 13:7. [PMID: 39726905 PMCID: PMC11668725 DOI: 10.1007/s40203-024-00292-3] [Received: 08/21/2024] [Accepted: 12/09/2024] [Indexed: 12/28/2024]
Abstract
Acinetobacter baumannii, an opportunistic and notorious nosocomial pathogen, is responsible for many infections affecting soft tissues, skin, lungs, bloodstream, and urinary tract, accounting for more than 722,000 cases annually. Despite numerous advancements in therapeutic options, no approved vaccine is currently available for this bacterium. Consequently, this study focused on rational vaccine design using bioinformatics tools. Three outer membrane proteins with immunogenic potential and the properties of good vaccine candidates were used to select epitopes based on good antigenic properties, non-allergenicity, high binding scores, and low IC50 values. A multi-epitope peptide (MEP) construct was created by sequentially linking the epitopes using suitable linkers. The ClusPro 2.0 and C-ImmSim web servers were used for docking analysis with TLR2/TLR4 and for immune-response simulation, respectively. The Ramachandran plot showed an accurate model of the MEP, with 100% of residues in the most favored and allowed regions. The construct was highly antigenic, stable, non-allergenic, non-toxic, and soluble, and showed maximum population coverage. Additionally, molecular docking demonstrated strong binding between the designed MEP vaccine and TLR2/TLR4. In silico immunological simulations showed significant increases in T-cell and B-cell populations. Finally, codon optimization and in silico cloning were conducted using the pET-28a (+) plasmid vector to evaluate the expression efficiency of the vaccine peptide in the host organism (Escherichia coli). This designed MEP vaccine should support and accelerate laboratory work toward a potent vaccine targeting MDR Acinetobacter baumannii. Supplementary Information: The online version contains supplementary material available at 10.1007/s40203-024-00292-3.
Affiliation(s)
- Shiv Nandan Sah
- Department of Microbiology, Panjab University, Chandigarh, 160014 India
- Department of Microbiology, Central Campus of Technology, Tribhuvan University, Dharan, Nepal
- Sumit Gupta
- School of Pharmaceutical Education and Research, Jamia Hamdard University, New Delhi, 110062 India
- Neha Bhardwaj
- Department of Microbiology, Panjab University, Chandigarh, 160014 India
- Lalit Kumar Gautam
- Department of Anatomy and Cell Biology, University of Iowa, Iowa City, IA 52242 USA
- Department of Biotechnology, Panjab University, Chandigarh, 160014 India
- Neena Capalash
- Department of Biotechnology, Panjab University, Chandigarh, 160014 India
- Prince Sharma
- Department of Microbiology, Panjab University, Chandigarh, 160014 India
4
Hu RE, Yu CH, Ng IS. GRACE: Generative Redesign in Artificial Computational Enzymology. ACS Synth Biol 2024; 13:4154-4164. [PMID: 39513550 DOI: 10.1021/acssynbio.4c00624] [Indexed: 11/15/2024]
Abstract
Designing de novo enzymes is complex and challenging, especially with respect to maintaining activity. This research focused on motif design to identify the crucial domain in the enzyme and examined the protein structure by molecular docking. We therefore developed Generative Redesign in Artificial Computational Enzymology (GRACE), an automated workflow that, for the first time, enables the redesign and creation of de novo enzymes. GRACE integrates RFdiffusion for structure generation, ProteinMPNN for sequence design, and CLEAN for enzyme classification, followed by solubility analysis and molecular dynamics simulation. As a result, we selected two gene sequences associated with carbonic anhydrase from among 10,000 protein candidates. Experimental validation confirmed that these two novel enzymes, dCA12_2 and dCA23_1, exhibited favorable solubility and promising substrate-active site interactions, and achieved an activity of 400 WAU/mL. This workflow has the potential to greatly streamline experimental efforts in enzyme engineering and unlock new avenues for rational protein design.
Affiliation(s)
- Ruei-En Hu
- Department of Chemical Engineering, National Cheng Kung University, Tainan City 701, Taiwan
- Chi-Hua Yu
- Department of Engineering Science, National Cheng Kung University, Tainan City 701, Taiwan
- I-Son Ng
- Department of Chemical Engineering, National Cheng Kung University, Tainan City 701, Taiwan
5
Song BPC, Lai JY, Choong YS, Khanbabaei N, Latz A, Lim TS. Isolation of anti-Ancylostoma-secreted protein 5 (ASP5) antibody from a naïve antibody phage library. J Immunol Methods 2024; 535:113776. [PMID: 39551437 DOI: 10.1016/j.jim.2024.113776] [Received: 06/19/2024] [Revised: 11/06/2024] [Accepted: 11/11/2024] [Indexed: 11/19/2024]
Abstract
Ancylostoma species are parasitic nematodes that release a multitude of proteins to manipulate host immune responses and facilitate their survival. Among the released proteins, Ancylostoma-secreted protein 5 (ASP5) plays a pivotal role in mediating host-parasite interactions, making it a promising target for interventions against canine hookworm infections caused by Ancylostoma species. Antibody phage display, a widely used method for generating human monoclonal antibodies, was employed in this study. A bacterial expression system was used to produce ASP5 for biopanning. A single-chain fragment variable (scFv) monoclonal antibody against ASP5 was generated from the naïve Human AntibodY LibrarY (HAYLY). The resulting scFv antibody was characterized to elucidate its antigen-binding properties. The identified monoclonal antibody showed good specificity and binding characteristics, which highlight its potential for diagnostic applications in hookworm infections.
Affiliation(s)
- Brenda Pei Chui Song
- Institute for Research in Molecular Medicine, Universiti Sains Malaysia, 11800 Penang, Malaysia
- Jing Yi Lai
- Institute for Research in Molecular Medicine, Universiti Sains Malaysia, 11800 Penang, Malaysia
- Yee Siew Choong
- Institute for Research in Molecular Medicine, Universiti Sains Malaysia, 11800 Penang, Malaysia
- Andreas Latz
- Gold Standard Diagnostics Frankfurt GmbH, Dietzenbach, Germany
- Theam Soon Lim
- Institute for Research in Molecular Medicine, Universiti Sains Malaysia, 11800 Penang, Malaysia; Analytical Biochemistry Research Centre, Universiti Sains Malaysia, 11800 Penang, Malaysia.
6
Wang C, Zou Q. MFPSP: Identification of fungal species-specific phosphorylation site using offspring competition-based genetic algorithm. PLoS Comput Biol 2024; 20:e1012607. [PMID: 39556608 DOI: 10.1371/journal.pcbi.1012607] [Received: 07/26/2024] [Accepted: 11/03/2024] [Indexed: 11/20/2024]
Abstract
Protein phosphorylation is essential in various signal transduction and cellular processes. To date, most tools are designed for model organisms; only a handful of methods are suitable for this prediction task in fungal species, and their performance still leaves much to be desired. In this study, a novel tool called MFPSP is developed for phosphorylation site prediction in multiple fungal species. Amino acid sequence features were derived from physicochemical and distributed information, and an offspring competition-based genetic algorithm was applied to choose the most effective feature subset. Comparison results showed that MFPSP achieves more advanced and balanced performance than several available state-of-the-art toolkits. Exploration of feature contributions and interactions indicated that the proposed model is efficient at uncovering concealed patterns within sequences. We anticipate that MFPSP will serve as a valuable bioinformatics tool, benefiting practical experiments by pre-screening potential phosphorylation sites and enhancing our functional understanding of phosphorylation modifications in fungi. The source code and datasets are accessible at https://github.com/AI4HKB/MFPSP/.
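Genetic-algorithm feature selection, as named in the abstract above, can be illustrated with binary feature masks (1 = keep a feature). This is a minimal generic sketch, not MFPSP's method: the fitness function, population size, and mutation rate are hypothetical placeholders, and the "several offspring per parent pair compete, only the fittest survives" step is an assumed, simplified reading of offspring competition. In practice the fitness would be a cross-validated classifier score on the selected features.

```python
import random
from typing import Callable, List

Mask = List[int]  # 1 = feature kept, 0 = feature dropped


def toy_fitness(mask: Mask) -> float:
    # Hypothetical fitness: reward "informative" features 0-4, penalize subset size.
    informative = {0, 1, 2, 3, 4}
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in informative)
    return hits - 0.1 * sum(mask)


def crossover(a: Mask, b: Mask, rng: random.Random) -> Mask:
    # Single-point crossover between two parent masks.
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(mask: Mask, rate: float, rng: random.Random) -> Mask:
    # Flip each bit independently with probability `rate`.
    return [1 - bit if rng.random() < rate else bit for bit in mask]


def select_features(
    n_features: int,
    fitness: Callable[[Mask], float],
    pop_size: int = 20,
    generations: int = 40,
    offspring_per_pair: int = 4,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> Mask:
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection; parents survive (elitism)
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            # Several offspring per pair compete; only the fittest is kept (assumption).
            brood = [mutate(crossover(a, b, rng), mutation_rate, rng)
                     for _ in range(offspring_per_pair)]
            children.append(max(brood, key=fitness))
        population = parents + children
    return max(population, key=fitness)


best = select_features(n_features=20, fitness=toy_fitness)
print(best, toy_fitness(best))
```

Because the top half of each generation is carried over unchanged, the best fitness found can never decrease from one generation to the next.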
Affiliation(s)
- Chao Wang
- Center for Genomic and Personalized Medicine, Guangxi key Laboratory for Genomic and Personalized Medicine, Guangxi Collaborative Innovation Center for Genomic and Personalized Medicine, Guangxi Medical University, Nanning, Guangxi, China
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
7
Smorodina E, Tao F, Qing R, Yang S, Zhang S. Computational engineering of water-soluble human potassium ion channels through QTY transformation. Sci Rep 2024; 14:28159. [PMID: 39548172 PMCID: PMC11568286 DOI: 10.1038/s41598-024-76603-7] [Received: 07/10/2023] [Accepted: 10/14/2024] [Indexed: 11/17/2024]
Abstract
Transmembrane potassium ion channels are crucial for ion transport, metabolism, and signaling, and serve as promising targets for anti-cancer therapies. However, their hydrophobic transmembrane nature requires detergents, posing a major bottleneck for experimental handling. In this paper, we present a structural bioinformatics study of six experimentally determined and twelve modeled potassium channel structures, in which hydrophobic amino acids (L, I/V, and F) were systematically replaced with neutral hydrophilic ones (Q, T, and Y), making the proteins more water-soluble. QTY (computationally predicted) and native (experimental and repredicted) variants show remarkable structural similarity (RMSD ~0.50 Å to ~2.14 Å) despite significant sequence differences. QTY variants, both rigid and refined with MD simulations, maintain stability, solvent-accessible surface area (SASA), and ionic, aromatic, and van der Waals interactions comparable to those of the native variants, but differ in grand average of hydropathy (GRAVY), solubility, and hydrophobic contacts. Overall, our study presents a computational approach for designing hydrophilic potassium ion channels while maintaining the native global structure, which could simplify their practical use by eliminating the need for detergents.
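The QTY code and the GRAVY comparison described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: it applies the L→Q, I/V→T, F→Y substitution to every position of the input (a simplification; it does not locate transmembrane segments), and it computes GRAVY as the mean Kyte-Doolittle hydropathy per residue. The example segment is hypothetical.

```python
# Kyte-Doolittle hydropathy values; GRAVY is their mean over the sequence.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# QTY code: L -> Q, I and V -> T, F -> Y.
QTY_MAP = str.maketrans({"L": "Q", "I": "T", "V": "T", "F": "Y"})


def qty_transform(segment: str) -> str:
    """Replace L, I, V, F with Q, T, T, Y (applied to every position here)."""
    return segment.upper().translate(QTY_MAP)


def gravy(sequence: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return sum(values) / len(values)


helix = "LIVFALLGIVF"  # hypothetical hydrophobic segment
qty_helix = qty_transform(helix)
print(qty_helix)  # QTTYAQQGTTY
print(round(gravy(helix), 2), round(gravy(qty_helix), 2))
```

The GRAVY value drops from strongly positive (hydrophobic) to negative (hydrophilic) after substitution, which is the direction of change the abstract reports; structural similarity, of course, can only be assessed with structure prediction and RMSD comparison.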
Affiliation(s)
- Eva Smorodina
- Laboratory for Computational and Systems Immunology, Department of Immunology, University of Oslo, Oslo University Hospital, Oslo, Norway
- Fei Tao
- Laboratory of Food Microbial Technology, State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiaotong University, Shanghai, 200240, China
- Rui Qing
- Laboratory of Food Microbial Technology, State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiaotong University, Shanghai, 200240, China
- Steve Yang
- PT Metiska Farma, Daerah Khusus Ibukota, Jakarta, 12220, Indonesia
- Shuguang Zhang
- Laboratory of Molecular Architecture, Media Lab, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA, 02139, USA.
8
Yoodee S, Peerapen P, Thongboonkerd V. Defining physicochemical properties of urinary proteins that determine their inhibitory activities against calcium oxalate kidney stone formation. Int J Biol Macromol 2024; 279:135242. [PMID: 39218173 DOI: 10.1016/j.ijbiomac.2024.135242] [Received: 07/08/2024] [Revised: 08/21/2024] [Accepted: 08/29/2024] [Indexed: 09/04/2024]
Abstract
We have recently reported a set of urinary proteins that inhibited calcium oxalate (CaOx) stone development. However, the physicochemical properties that determine their inhibitory activities remained unknown. Herein, human urinary proteins were chromatographically fractionated into 15 fractions and subjected to various CaOx crystal assays and to identification by nanoLC-ESI-Qq-TOF MS/MS. Their physicochemical properties and crystal inhibitory activities were subjected to Pearson correlation analysis. The data showed that almost all urinary protein fractions had crystal inhibitory activities. Up to 128 proteins were identified from each fraction. Crystallization inhibitory activity correlated with the percentages of Ca²⁺-binding proteins, stable proteins, polar amino acids, alpha helix, beta turn, and random coil, but inversely correlated with the number of Ox²⁻-binding motifs per protein and the percentage of unstable proteins. Crystal aggregation inhibitory activity correlated with the percentage of stable proteins but inversely correlated with the percentage of unstable proteins. Crystal adhesion inhibitory activity correlated with the percentage of stable proteins and GRAVY, but inversely correlated with pI, instability index, and the percentages of unstable proteins and positively charged amino acids. However, there was no correlation between crystal growth inhibitory activity and any physicochemical property. In summary, some physicochemical properties of urinary proteins can determine, and may help predict, their CaOx stone inhibitory activities.
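The correlation analysis above pairs one physicochemical property per fraction with one measured inhibitory activity. A minimal sketch of Pearson's r follows; the per-fraction numbers are invented placeholders, not data from the study.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for five protein fractions (placeholders only).
pct_stable_proteins = [40.0, 55.0, 60.0, 72.0, 80.0]
adhesion_inhibition = [12.0, 20.0, 22.0, 30.0, 35.0]
print(round(pearson_r(pct_stable_proteins, adhesion_inhibition), 3))
```

A positive r corresponds to "correlated with" and a negative r to "inversely correlated with" in the abstract's wording. On Python 3.10 and later, `statistics.correlation` computes the same coefficient; with only 15 fractions, significance testing of r matters as much as its sign.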
Affiliation(s)
- Sunisa Yoodee
- Medical Proteomics Unit, Research Department, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand
- Paleerath Peerapen
- Medical Proteomics Unit, Research Department, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand
- Visith Thongboonkerd
- Medical Proteomics Unit, Research Department, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand.
9
Li D, Zhu Y, Zhang W, Liu J, Yang X, Liu Z, Wei D. AI Prediction of Structural Stability of Nanoproteins Based on Structures and Residue Properties by Mean Pooled Dual Graph Convolutional Network. Interdiscip Sci 2024:10.1007/s12539-024-00662-7. [PMID: 39367992 DOI: 10.1007/s12539-024-00662-7] [Received: 04/21/2024] [Revised: 09/18/2024] [Accepted: 09/22/2024] [Indexed: 10/07/2024]
Abstract
The structural stability of proteins is an important topic in fields such as biotechnology, pharmaceuticals, and enzymology. In particular, understanding the structural stability of proteins is crucial for protein design. Artificial design, while pursuing high thermodynamic stability and rigidity, inevitably sacrifices biological functions closely related to protein flexibility. Maximal thermodynamic stability is not always optimal for a protein to perform its biological functions. Extensive theoretical and experimental screening is often required to obtain stable protein structures. It is therefore critically important to develop a stability prediction model based on the balance between protein stability and bioactivity. To design protein drugs with better functionality in a broader structural space, a novel protein structural stability predictor called PSSP has been developed in this study. PSSP is a mean-pooled dual graph convolutional network (GCN) model that uses the sequence characteristics, secondary structure, distance matrix, graph, and residue properties of a nanoprotein to provide rapid prediction and judgment. This model exhibits excellent robustness in predicting the structural stability of nanoproteins. Compared with previous artificial intelligence algorithms, the results indicate that this model can provide a rapid and accurate assessment of the structural stability of artificially designed proteins, showing great promise for promoting the robust development of protein design.
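The two operations in PSSP's name, graph convolution and mean pooling into one graph-level vector, can be sketched in plain Python. This shows only one generic GCN layer (symmetric-normalized adjacency with self-loops, in the Kipf and Welling style) and a mean-pool readout. The toy residue contact graph, feature sizes, and weights are hypothetical, and PSSP's dual-branch architecture and trained parameters are not reproduced.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product; assumes len(a[0]) == len(b)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def normalized_adjacency(adj: Matrix) -> Matrix:
    """D^-1/2 (A + I) D^-1/2, the usual GCN propagation matrix."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(adj_norm: Matrix, features: Matrix, weights: Matrix) -> Matrix:
    """One graph convolution: ReLU(A_norm @ H @ W)."""
    out = matmul(matmul(adj_norm, features), weights)
    return [[max(0.0, v) for v in row] for row in out]


def mean_pool(node_embeddings: Matrix) -> List[float]:
    """Average node embeddings into one graph-level vector."""
    n = len(node_embeddings)
    return [sum(col) / n for col in zip(*node_embeddings)]


# Hypothetical 4-residue contact graph (1 = residues in contact).
adj = [[0, 1, 0, 0],
       [1, 0, 1, 0],
       [0, 1, 0, 1],
       [0, 0, 1, 0]]
# Hypothetical 2-dimensional residue features and a 2x3 weight matrix.
features = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [1.0, 1.0]]
weights = [[0.2, -0.1, 0.4], [0.3, 0.5, -0.2]]

graph_vector = mean_pool(gcn_layer(normalized_adjacency(adj), features, weights))
print(graph_vector)  # one 3-dim vector per protein, whatever its length
```

Mean pooling is what lets proteins of different lengths map to a fixed-size vector that a downstream prediction head can consume.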
Affiliation(s)
- Daixi Li
- Institute of Biothermal Engineering, University of Shanghai for Science and Technology, Shanghai, 20093, China.
- Pengcheng Laboratory, Shenzhen, 518055, China.
- Yuqi Zhu
- Institute of Biothermal Engineering, University of Shanghai for Science and Technology, Shanghai, 20093, China
- Wujie Zhang
- Chemical and Biomolecular Engineering Program, Physics and Chemistry Department, Milwaukee School of Engineering, Milwaukee, 53202, USA
- Jing Liu
- Institute of Biothermal Engineering, University of Shanghai for Science and Technology, Shanghai, 20093, China
- Xiaochen Yang
- Institute of Biothermal Engineering, University of Shanghai for Science and Technology, Shanghai, 20093, China
- Zhihong Liu
- Pingshan Translational Medicine Center, Shenzhen Bay Laboratory, Shenzhen, 518118, China
- Dongqing Wei
- Pengcheng Laboratory, Shenzhen, 518055, China
- State Key Laboratory of Microbial Metabolism, Shanghai-Islamabad-Belgrade Joint Innovation, Center On Antibacterial Resistances, Joint International Research Laboratory of Metabolic and Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
10
Ghafoor H, Asim MN, Ibrahim MA, Dengel A. ProSol-multi: Protein solubility prediction via amino acids multi-level correlation and discriminative distribution. Heliyon 2024; 10:e36041. [PMID: 39281576 PMCID: PMC11401092 DOI: 10.1016/j.heliyon.2024.e36041] [Received: 05/16/2024] [Revised: 08/01/2024] [Accepted: 08/08/2024] [Indexed: 09/18/2024]
Abstract
Protein solubility prediction is useful for the careful selection of highly effective candidate proteins for drug development. In recombinant protein synthesis, solubility prediction is valuable for optimizing key protein characteristics, including stability, functionality, and ease of purification. Solubility also carries valuable information about potential biomarkers or therapeutic targets and helps in the early forecasting of neurodegenerative diseases, cancer, and cardiovascular disorders. Traditional wet-lab approaches for determining protein solubility are error-prone, time-consuming, and costly. Researchers have therefore harnessed artificial intelligence to replace experimental approaches with computational predictors, which infer the solubility of proteins by analyzing amino acid distributions in raw protein sequences. There is still considerable room for more robust computational predictors, because existing predictors fail to extract comprehensive discriminative distributions of amino acids. To discriminate soluble from insoluble proteins more precisely, this paper presents the ProSol-Multi predictor, which combines a novel MLCDE encoder with a Random Forest classifier. The MLCDE encoder transforms protein sequences into informative statistical vectors by capturing multi-level correlations and discriminative distributions of amino acids within raw protein sequences. The proposed encoder is evaluated against 56 existing protein sequence encoding methods on a widely used protein solubility prediction benchmark dataset under two experimental settings, intrinsic and extrinsic. Intrinsic evaluation reveals that, among all the sequence encoders, the proposed MLCDE encoder generates non-overlapping clusters of soluble and insoluble classes. In extrinsic evaluation, 10 machine learning classifiers achieve better performance with the proposed MLCDE encoder than with the 56 existing protein sequence encoders. Moreover, across 4 public benchmark datasets, the proposed ProSol-Multi predictor outperforms 20 existing predictors by an average of 3% in accuracy and 2% in MCC and AU-ROC. The ProSol-Multi interactive web application is available at https://sds_genetic_analysis.opendfki.de/ProSol-Multi.
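The simplest way to turn a raw sequence into a fixed-length "amino acid distribution" vector, of the kind these predictors analyze, is amino acid composition (AAC). Dipeptide composition (DPC) adds a first, local level of residue-residue correlation. The sketch below shows only these generic baseline encodings. It is not the MLCDE encoder, whose multi-level correlation and discriminative-distribution terms are not specified in the abstract. The example sequence is hypothetical.

```python
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aac(sequence: str) -> List[float]:
    """20-dim amino acid composition: fraction of each standard residue."""
    seq = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]  # drop X, B, etc.
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def dpc(sequence: str) -> List[float]:
    """400-dim dipeptide composition: frequency of each adjacent residue pair."""
    seq = "".join(aa for aa in sequence.upper() if aa in AMINO_ACIDS)
    if len(seq) < 2:
        raise ValueError("need at least two standard residues")
    counts = {dp: 0 for dp in DIPEPTIDES}
    for i in range(len(seq) - 1):
        counts[seq[i:i + 2]] += 1
    n_pairs = len(seq) - 1
    return [counts[dp] / n_pairs for dp in DIPEPTIDES]


def encode(sequence: str) -> List[float]:
    """Concatenate AAC and DPC into one 420-dim feature vector."""
    return aac(sequence) + dpc(sequence)


vector = encode("MKTAYIAKQRQISFVKSHFSRQ")  # hypothetical sequence
print(len(vector))  # 420
```

Vectors like these can be fed to any standard classifier, such as a Random Forest, which is the baseline setting against which richer encoders are compared.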
Affiliation(s)
- Hina Ghafoor
- Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany
- German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
- Muhammad Nabeel Asim
- German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
- Muhammad Ali Ibrahim
- Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany
- German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
- Andreas Dengel
- Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany
- German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
11
Tan Y, Li M, Zhou B, Zhong B, Zheng L, Tan P, Zhou Z, Yu H, Fan G, Hong L. Simple, Efficient, and Scalable Structure-Aware Adapter Boosts Protein Language Models. J Chem Inf Model 2024; 64:6338-6349. [PMID: 39110130 DOI: 10.1021/acs.jcim.4c00689] [Indexed: 08/27/2024]
Abstract
Fine-tuning pretrained protein language models (PLMs) has emerged as a prominent strategy for enhancing downstream prediction tasks, often outperforming traditional supervised learning approaches. Parameter-efficient fine-tuning, a powerful and widely applied technique in natural language processing, could potentially enhance the performance of PLMs. However, direct transfer to life-science tasks is nontrivial because of differences in training strategies and data forms. To address this gap, we introduce SES-Adapter, a simple, efficient, and scalable adapter method for enhancing the representation learning of PLMs. SES-Adapter combines PLM embeddings with structural sequence embeddings to create structure-aware representations. We show that the proposed method is compatible with different PLM architectures and applicable across diverse tasks. Extensive evaluations are conducted on structures from 2 folding methods with notable quality differences, 9 state-of-the-art baselines, and 9 benchmark data sets across distinct downstream tasks. Results show that, compared with vanilla PLMs, SES-Adapter improves downstream task performance by up to 11% (3% on average) and significantly accelerates convergence, by up to 1034% (362% on average); training efficiency also improves approximately twofold. Moreover, gains are observed even with low-quality predicted structures. The source code for SES-Adapter is available at https://github.com/tyang816/SES-Adapter.
Affiliation(s)
- Yang Tan
- School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
- Shanghai Artificial Intelligence Laboratory, Shanghai 200232, China
- Shanghai National Center for Applied Mathematics (SJTU Center), Shanghai 200240, China
- Mingchen Li
- School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
- Shanghai Artificial Intelligence Laboratory, Shanghai 200232, China
- Shanghai National Center for Applied Mathematics (SJTU Center), Shanghai 200240, China
- Bingxin Zhou
- Shanghai National Center for Applied Mathematics (SJTU Center), Shanghai 200240, China
- Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai 200240, China
- Bozitao Zhong
- Shanghai National Center for Applied Mathematics (SJTU Center), Shanghai 200240, China
- Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai 200240, China
- Lirong Zheng
- Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai 200240, China
- Department of Cell and Developmental Biology & Michigan Neuroscience Institute, University of Michigan Medical School, Ann Arbor, Michigan 48104, United States
- Pan Tan
- Shanghai Artificial Intelligence Laboratory, Shanghai 200232, China
- Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai 200240, China
- Ziyi Zhou
- Shanghai National Center for Applied Mathematics (SJTU Center), Shanghai 200240, China
- Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai 200240, China
- Huiqun Yu
- School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
- Guisheng Fan
- School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
- Liang Hong
- Shanghai Artificial Intelligence Laboratory, Shanghai 200232, China
- Shanghai National Center for Applied Mathematics (SJTU Center), Shanghai 200240, China
- Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai 200240, China
- Zhangjiang Institute for Advanced Study, Shanghai Jiao Tong University, Shanghai 200240, China
12
Zhang X, Hu X, Zhang T, Yang L, Liu C, Xu N, Wang H, Sun W. PLM_Sol: predicting protein solubility by benchmarking multiple protein language models with the updated Escherichia coli protein solubility dataset. Brief Bioinform 2024; 25:bbae404. [PMID: 39179250 PMCID: PMC11343611 DOI: 10.1093/bib/bbae404] [Received: 04/29/2024] [Revised: 07/19/2024] [Accepted: 08/07/2024] [Indexed: 08/26/2024]
Abstract
Protein solubility plays a crucial role in various biotechnological, industrial, and biomedical applications. With falling sequencing and gene synthesis costs, high-throughput experimental screening coupled with tailored bioinformatic prediction has been rapidly adopted for developing novel functional enzymes of interest (EOI). High protein solubility rates are essential in this process, and accurate prediction of solubility is a challenging task. As deep learning technology continues to evolve, attention-based protein language models (PLMs) can extract intrinsic information from protein sequences to a greater extent. Leveraging these models, along with the increasing availability of protein solubility data inferred from structural databases such as the Protein Data Bank, holds great potential to enhance protein solubility prediction. In this study, we curated an Updated Escherichia coli protein Solubility DataSet (UESolDS) and employed a combination of multiple PLMs and classification layers to predict protein solubility. The resulting best-performing model, named Protein Language Model-based protein Solubility prediction model (PLM_Sol), demonstrated significant improvements over previously reported models, achieving a 6.4% increase in accuracy, a 9.0% increase in F1_score, and an 11.1% increase in Matthews correlation coefficient on the independent test set. Moreover, an additional evaluation using our in-house synthesized protein resource as test data, encompassing diverse types of enzymes, also showed good performance for PLM_Sol. Overall, PLM_Sol exhibited consistent and promising performance across both the independent test set and the experimental set, making it well suited for facilitating large-scale EOI studies. PLM_Sol is available as a standalone program and as an easy-to-use model at https://zenodo.org/doi/10.5281/zenodo.10675340.
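Accuracy, F1 score, and the Matthews correlation coefficient (MCC) reported for these solubility classifiers are all computed from the binary confusion matrix (soluble vs. insoluble). MCC uses all four cells, which makes it less flattering than accuracy when the classes are imbalanced. A minimal sketch follows; the counts are hypothetical.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, F1 and Matthews correlation coefficient from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention when undefined
    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}


# Hypothetical test-set counts, soluble = positive class.
print(binary_metrics(tp=80, fp=20, tn=70, fn=30))
```

With these counts accuracy is 0.75 while MCC is only about 0.50, illustrating why papers in this area report MCC alongside accuracy.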
Affiliation(s)
- Xuechun Zhang
- Key Laboratory of Organ Regeneration and Reconstruction, State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, 1 Beichen West Road, Chaoyang District, Beijing 100101, China
- Institute for Stem Cell and Regeneration, Chinese Academy of Sciences, 1 Beichen West Road, Chaoyang District, Beijing 100101, China
- University of Chinese Academy of Sciences, No. 1 Yanqihu East Rd, Huairou District, Beijing 101408, China
- Xiaoxuan Hu
- Key Laboratory of Organ Regeneration and Reconstruction, State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, 1 Beichen West Road, Chaoyang District, Beijing 100101, China
- Institute for Stem Cell and Regeneration, Chinese Academy of Sciences, 1 Beichen West Road, Chaoyang District, Beijing 100101, China
- University of Chinese Academy of Sciences, No. 1 Yanqihu East Rd, Huairou District, Beijing 101408, China
- Tongtong Zhang
- Key Laboratory of Organ Regeneration and Reconstruction, State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, 1 Beichen West Road, Chaoyang District, Beijing 100101, China
- Institute for Stem Cell and Regeneration, Chinese Academy of Sciences, 1 Beichen West Road, Chaoyang District, Beijing 100101, China
- University of Chinese Academy of Sciences, No. 1 Yanqihu East Rd, Huairou District, Beijing 101408, China
- Ling Yang
- Key Laboratory of Organ Regeneration and Reconstruction, State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, 1 Beichen West Road, Chaoyang District, Beijing 100101, China
- Institute for Stem Cell and Regeneration, Chinese Academy of Sciences, 1 Beichen West Road, Chaoyang District, Beijing 100101, China
- University of Chinese Academy of Sciences, No. 1 Yanqihu East Rd, Huairou District, Beijing 101408, China
- Chunhong Liu
- Key Laboratory of Organ Regeneration and Reconstruction, State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, 1 Beichen West Road, Chaoyang District, Beijing 100101, China
- Institute for Stem Cell and Regeneration, Chinese Academy of Sciences, 1 Beichen West Road, Chaoyang District, Beijing 100101, China
- University of Chinese Academy of Sciences, No. 1 Yanqihu East Rd, Huairou District, Beijing 101408, China
- Ning Xu
- Key Laboratory of Organ Regeneration and Reconstruction, State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, 1 Beichen West Road, Chaoyang District, Beijing 100101, China
- Institute for Stem Cell and Regeneration, Chinese Academy of Sciences, 1 Beichen West Road, Chaoyang District, Beijing 100101, China
- University of Chinese Academy of Sciences, No. 1 Yanqihu East Rd, Huairou District, Beijing 101408, China
- Haoyi Wang
- Key Laboratory of Organ Regeneration and Reconstruction, State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, 1 Beichen West Road, Chaoyang District, Beijing 100101, China
- Institute for Stem Cell and Regeneration, Chinese Academy of Sciences, 1 Beichen West Road, Chaoyang District, Beijing 100101, China
- University of Chinese Academy of Sciences, No. 1 Yanqihu East Rd, Huairou District, Beijing 101408, China
- Beijing Institute for Stem Cell and Regenerative Medicine, A 3 Datun Road, Chaoyang District, Beijing 100100, China
- Wen Sun
- Key Laboratory of Organ Regeneration and Reconstruction, State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, 1 Beichen West Road, Chaoyang District, Beijing 100101, China
- Institute for Stem Cell and Regeneration, Chinese Academy of Sciences, 1 Beichen West Road, Chaoyang District, Beijing 100101, China
- Beijing Institute for Stem Cell and Regenerative Medicine, A 3 Datun Road, Chaoyang District, Beijing 100100, China
13
Yasamut U, Thongheang K, Weechan A, Sornsuwan K, Juntit OA, Tayapiwatana C. Evaluating the ability of different chaperones in improving soluble expression of a triple-mutated human interferon gamma in Escherichia coli. J Biosci Bioeng 2024:S1389-1723(24)00168-3. [PMID: 38969548 DOI: 10.1016/j.jbiosc.2024.06.005] [Received: 03/29/2024] [Revised: 06/07/2024] [Accepted: 06/11/2024] [Indexed: 07/07/2024]
Abstract
Human interferon gamma (hIFN-γ) plays a pivotal role as a soluble cytokine with diverse functions in both innate and adaptive immunity. In a previous investigation, we pinpointed three critical amino acid residues on the IFN-γ structure, i.e., threonine (T) 27, phenylalanine (F) 29, and leucine (L) 30, which are integral to the epitope recognized by anti-IFN-γ autoantibodies. Blocking the interaction between this epitope and autoantibodies is crucial for effective therapy of adult-onset immunodeficiency (AOID). However, a challenge arises from the diminished solubility of the T27AF29AL30A mutant in Escherichia coli BL21(DE3). This study investigates a targeted strategy for improving the soluble expression of IFN-γ T27AF29AL30A using five chaperone plasmids: pG-KJE8, pKJE7, pGro7, pG-Tf2, and pTf16. These plasmids, which encode cytoplasmic chaperones, were co-expressed with the IFN-γ mutant in E. coli BL21(DE3), and the proteins in cell lysates and inclusion bodies were analyzed by SDS-PAGE and Western blotting. Our findings reveal the remarkable efficacy of pG-KJE8, which carries the cytoplasmic chaperones DnaK-DnaJ-GrpE and GroEL-GroES, in significantly enhancing the solubility of IFN-γ T27AF29AL30A. Importantly, this co-expression not only addresses solubility concerns but also preserves the functional dimerized structure, as confirmed by sandwich ELISA. This promising outcome is a significant step forward in developing biologic strategies for AOID.
Affiliation(s)
- Umpa Yasamut
- Division of Clinical Immunology, Department of Medical Technology, Faculty of Associated Medical Sciences, Chiang Mai University, Chiang Mai, Thailand; Center of Biomolecular Therapy and Diagnostic, Faculty of Associated Medical Sciences, Chiang Mai University, Chiang Mai, Thailand; Center of Innovative Immunodiagnostic Development, Department of Medical Technology, Faculty of Associated Medical Sciences, Chiang Mai University, Chiang Mai, Thailand
- Kanyarat Thongheang
- Division of Clinical Immunology, Department of Medical Technology, Faculty of Associated Medical Sciences, Chiang Mai University, Chiang Mai, Thailand; Center of Biomolecular Therapy and Diagnostic, Faculty of Associated Medical Sciences, Chiang Mai University, Chiang Mai, Thailand
- Anuwat Weechan
- Center of Biomolecular Therapy and Diagnostic, Faculty of Associated Medical Sciences, Chiang Mai University, Chiang Mai, Thailand
- Kanokporn Sornsuwan
- Center of Biomolecular Therapy and Diagnostic, Faculty of Associated Medical Sciences, Chiang Mai University, Chiang Mai, Thailand
- On-Anong Juntit
- Center of Biomolecular Therapy and Diagnostic, Faculty of Associated Medical Sciences, Chiang Mai University, Chiang Mai, Thailand
- Chatchai Tayapiwatana
- Division of Clinical Immunology, Department of Medical Technology, Faculty of Associated Medical Sciences, Chiang Mai University, Chiang Mai, Thailand; Center of Biomolecular Therapy and Diagnostic, Faculty of Associated Medical Sciences, Chiang Mai University, Chiang Mai, Thailand; Center of Innovative Immunodiagnostic Development, Department of Medical Technology, Faculty of Associated Medical Sciences, Chiang Mai University, Chiang Mai, Thailand.
14
Manning MC, Holcomb RE, Payne RW, Stillahn JM, Connolly BD, Katayama DS, Liu H, Matsuura JE, Murphy BM, Henry CS, Crommelin DJA. Stability of Protein Pharmaceuticals: Recent Advances. Pharm Res 2024; 41:1301-1367. [PMID: 38937372 DOI: 10.1007/s11095-024-03726-x] [Received: 03/25/2024] [Accepted: 06/03/2024] [Indexed: 06/29/2024]
Abstract
There have been significant advances in the formulation and stabilization of proteins in the liquid state since our previous review. Our mechanistic understanding of protein-excipient interactions has increased, allowing formulations to be developed in a more rational fashion. The field has moved towards more complex and challenging formulations, such as high-concentration formulations that allow subcutaneous administration, and co-formulations. While much of the published work has focused on mAbs, the principles appear to apply to any therapeutic protein, although mAbs clearly have some distinctive features. In this review, we first discuss chemical degradation reactions. This is followed by a section on physical instability issues. Then, more specific topics are addressed: instability induced by interactions with interfaces, predictive methods for physical stability, and the interplay between chemical and physical instability. The final parts discuss how all of the above impacts (co-)formulation strategies, in particular for high protein concentration solutions.
Affiliation(s)
- Mark Cornell Manning
- Legacy BioDesign LLC, Johnstown, CO, USA.
- Department of Chemistry, Colorado State University, Fort Collins, CO, USA.
- Ryan E Holcomb
- Legacy BioDesign LLC, Johnstown, CO, USA
- Department of Chemistry, Colorado State University, Fort Collins, CO, USA
- Robert W Payne
- Legacy BioDesign LLC, Johnstown, CO, USA
- Department of Chemistry, Colorado State University, Fort Collins, CO, USA
- Joshua M Stillahn
- Legacy BioDesign LLC, Johnstown, CO, USA
- Department of Chemistry, Colorado State University, Fort Collins, CO, USA
- Charles S Henry
- Department of Chemistry, Colorado State University, Fort Collins, CO, USA
15
Pham NT, Terrance AT, Jeon YJ, Rakkiyappan R, Manavalan B. ac4C-AFL: A high-precision identification of human mRNA N4-acetylcytidine sites based on adaptive feature representation learning. Mol Ther Nucleic Acids 2024; 35:102192. [PMID: 38779332 PMCID: PMC11108997 DOI: 10.1016/j.omtn.2024.102192] [Received: 09/07/2023] [Accepted: 04/18/2024] [Indexed: 05/25/2024]
Abstract
RNA N4-acetylcytidine (ac4C) is a highly conserved RNA modification that plays a crucial role in controlling mRNA stability, processing, and translation. Consequently, accurate identification of ac4C sites across the genome is critical for understanding the mechanisms of gene expression regulation. In this study, we developed ac4C-AFL, a bioinformatics tool that precisely identifies ac4C sites from primary RNA sequences. In ac4C-AFL, we identified the optimal sequence length for model building and implemented an adaptive feature representation strategy capable of extracting the most representative features from RNA. To identify the most relevant features, we proposed a novel ensemble feature importance scoring strategy to rank features effectively. We then used this ranking to conduct a sequential forward search, which determines the optimal feature set for each of the 16 sequence-derived feature descriptors individually. Using these optimal feature descriptors, we constructed 176 baseline models with 11 popular classifiers. The most efficient baseline models were identified using a two-step feature selection approach, and their predicted scores were integrated and used to train an appropriate classifier to develop the final prediction model. Rigorous cross-validation and independent tests demonstrate that ac4C-AFL surpasses contemporary tools in predicting ac4C sites. Moreover, we have developed a publicly accessible web server at https://balalab-skku.org/ac4C-AFL/.
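The abstract names two steps, ensemble feature-importance ranking and sequential forward search, without giving their exact rules. The Python sketch below assumes one common reading: importance scores from several methods are combined by mean rank, and the search scores the top-1, top-2, ... prefixes of that ranking and keeps the best one (applied, per the abstract, to each descriptor separately). The function names, the mean-rank rule, and the `evaluate` callback standing in for cross-validated performance are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

def ensemble_rank(score_tables: Sequence[Dict[str, float]]) -> List[str]:
    """Order features by mean rank across several importance tables.

    Rank 1 = most important within a table. Averaging ranks is an assumed
    aggregation rule, not necessarily the one ac4C-AFL uses.
    """
    features = list(score_tables[0])
    mean_rank = {f: 0.0 for f in features}
    for table in score_tables:
        order = sorted(features, key=lambda f: table[f], reverse=True)
        for rank, f in enumerate(order, start=1):
            mean_rank[f] += rank / len(score_tables)
    return sorted(features, key=lambda f: mean_rank[f])

def sequential_forward_search(
    ranked: Sequence[str],
    evaluate: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Score the top-1, top-2, ... prefixes of `ranked` and keep the best.

    `evaluate` stands in for, e.g., cross-validated accuracy of a model trained
    on that feature subset; on ties the smaller subset is kept.
    """
    best_set: List[str] = []
    best_score = float("-inf")
    for k in range(1, len(ranked) + 1):
        subset = list(ranked[:k])
        score = evaluate(subset)
        if score > best_score:
            best_set, best_score = subset, score
    return best_set, best_score
```

Scoring nested prefixes costs one model fit per feature, far fewer than searching all subsets, which is the usual reason to rank first and search second.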
Affiliation(s)
- Nhat Truong Pham
- Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, Gyeonggi-do 16419, Republic of Korea
- Annie Terrina Terrance
- Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, Gyeonggi-do 16419, Republic of Korea
- Young-Jun Jeon
- Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, Gyeonggi-do 16419, Republic of Korea
- Rajan Rakkiyappan
- Department of Mathematics, Bharathiar University, Coimbatore, Tamil Nadu 641046, India
- Balachandran Manavalan
- Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, Gyeonggi-do 16419, Republic of Korea
16
Li B, Ming D. GATSol, an enhanced predictor of protein solubility through the synergy of 3D structure graph and large language modeling. BMC Bioinformatics 2024; 25:204. [PMID: 38824535 PMCID: PMC11549816 DOI: 10.1186/s12859-024-05820-8] [Received: 12/20/2023] [Accepted: 05/29/2024] [Indexed: 06/03/2024]
Abstract
BACKGROUND Protein solubility is a critically important physicochemical property closely related to protein expression. For example, it is one of the main factors to be considered in the design and production of antibody drugs and a prerequisite for realizing various protein functions. Although several solubility prediction models have emerged in recent years, many of them capture only the information embedded in one-dimensional amino acid sequences, resulting in unsatisfactory predictive performance. RESULTS In this study, we introduce a novel Graph Attention network-based protein Solubility model, GATSol, which represents the 3D structure of a protein as a protein graph. In addition to amino acid node features extracted by a state-of-the-art protein large language model, GATSol utilizes amino acid distance maps generated using AlphaFold. Rigorous testing on the independent eSOL and Saccharomyces cerevisiae test datasets has shown that GATSol outperforms most recently introduced models, especially with respect to the coefficient of determination R², which reaches 0.517 and 0.424, respectively. It outperforms the current state-of-the-art GraphSol by 18.4% on the S. cerevisiae_test set. CONCLUSIONS GATSol captures 3D structural features of proteins by building protein graphs, which significantly improves the accuracy of protein solubility prediction. Recent advances in protein structure modeling allow our method to incorporate spatial structure features extracted from predicted structures while relying only on protein sequences as input, which simplifies the entire graph neural network prediction process and makes it more user-friendly and efficient. As a result, GATSol may help prioritize highly soluble proteins, ultimately reducing the cost and effort of experimental work. The source code and data of the GATSol model are freely available at https://github.com/binbinbinv/GATSol .
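GATSol is described as turning an AlphaFold-derived distance map into a protein graph and reporting R² on eSOL and the S. cerevisiae test set. The Python sketch below illustrates, under assumptions, how residue-residue edges might be derived by thresholding C-alpha distances, together with the standard R² formula. The 10 Å cutoff and the function names are illustrative and are not taken from the GATSol repository.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]

def contact_edges(ca_coords: Sequence[Coord], cutoff: float = 10.0) -> List[Tuple[int, int]]:
    """Undirected residue pairs (i < j) whose C-alpha distance is <= cutoff (Å).

    The 10 Å default is an illustrative assumption, not GATSol's setting.
    """
    edges = []
    for i in range(len(ca_coords)):
        for j in range(i + 1, len(ca_coords)):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                edges.append((i, j))
    return edges

def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot.

    Undefined (raises ZeroDivisionError) if all y_true values are equal.
    """
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

An R² of 0 means the predictions do no better than always predicting the mean solubility, which is a useful reference point for the 0.517 and 0.424 values quoted above.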
Affiliation(s)
- Bin Li
- College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, 30 South Puzhu Road, Jiangbei New District, Nanjing, 211816, Jiangsu, People's Republic of China
- Dengming Ming
- College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, 30 South Puzhu Road, Jiangbei New District, Nanjing, 211816, Jiangsu, People's Republic of China.
17
Yang Q, Jin X, Zhou H, Ying J, Zou J, Liao Y, Lu X, Ge S, Yu H, Min X. SurfPro-NN: A 3D point cloud neural network for the scoring of protein-protein docking models based on surfaces features and protein language models. Comput Biol Chem 2024; 110:108067. [PMID: 38714420 DOI: 10.1016/j.compbiolchem.2024.108067] [Received: 01/18/2024] [Revised: 03/18/2024] [Accepted: 04/01/2024] [Indexed: 05/09/2024]
Abstract
Protein-protein interactions (PPIs) play a crucial role in numerous key biological processes, and the structures of protein complexes provide valuable clues for in-depth exploration of biological processes at the molecular level. Protein-protein docking is widely used to model the spatial structure of protein complexes. However, selecting, from docking simulations, the candidate decoys that most closely resemble the native structure remains challenging. In this study, we introduce SurfPro-NN, a docking evaluation method based on three-dimensional point cloud neural networks, which represents protein structures as point clouds and learns interaction information from protein interfaces. With the continuous advancement of deep learning in biology, a series of knowledge-rich pre-trained models have emerged. We incorporate protein surface representation models and language models into our approach, greatly enhancing its feature representation capabilities and achieving superior performance in scoring protein docking models. Comprehensive testing on public datasets shows that our method outperforms state-of-the-art deep learning approaches in protein-protein docking model scoring, not only significantly improving performance but also greatly accelerating training. This study demonstrates the potential of our approach for assessing protein interactions and provides strong support for future research and applications in biology.
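SurfPro-NN represents structures as point clouds and learns from protein interfaces. As a rough, hypothetical illustration of that preprocessing idea (not the published pipeline), the Python sketch below keeps the points of each docking partner that lie within a distance cutoff of the other partner and then centers the resulting cloud. The 8 Å cutoff, the centering step, and the function names are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]

def interface_points(receptor: Sequence[Point], ligand: Sequence[Point],
                     cutoff: float = 8.0) -> List[Point]:
    """Points of either partner lying within `cutoff` Å of the other partner.

    Brute-force O(n*m) search; the 8 Å default is an illustrative assumption.
    """
    def near(p: Point, other: Sequence[Point]) -> bool:
        return any(math.dist(p, q) <= cutoff for q in other)

    return [p for p in receptor if near(p, ligand)] + [q for q in ligand if near(q, receptor)]

def center(cloud: Sequence[Point]) -> List[Point]:
    """Translate the cloud so that its centroid sits at the origin."""
    n = len(cloud)
    cx, cy, cz = (sum(p[k] for p in cloud) / n for k in range(3))
    return [(x - cx, y - cy, z - cz) for x, y, z in cloud]
```

For real complexes with thousands of surface points, a spatial index such as a k-d tree would replace the brute-force neighbor test.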
Affiliation(s)
- Qianli Yang
- Institute of Artificial Intelligence, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China.
- Xiaocheng Jin
- National Institute of Diagnostics and Vaccine Development in Infectious Diseases, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China; State Key Laboratory of Molecular Vaccinology and Molecular Diagnostics, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China; School of Public Health, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China
- Haixia Zhou
- School of Public Health, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China
- Junjie Ying
- Institute of Artificial Intelligence, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China
- JiaJun Zou
- School of Informatics, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China
- Yiyang Liao
- School of Informatics, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China
- Xiaoli Lu
- Information and Networking Center, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China
- Shengxiang Ge
- National Institute of Diagnostics and Vaccine Development in Infectious Diseases, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China; State Key Laboratory of Molecular Vaccinology and Molecular Diagnostics, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China; School of Public Health, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China
- Hai Yu
- National Institute of Diagnostics and Vaccine Development in Infectious Diseases, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China; State Key Laboratory of Molecular Vaccinology and Molecular Diagnostics, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China; School of Public Health, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China.
- Xiaoping Min
- School of Informatics, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China; National Institute of Diagnostics and Vaccine Development in Infectious Diseases, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China; State Key Laboratory of Molecular Vaccinology and Molecular Diagnostics, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China.
18
Bernhardt GV, Bernhardt K, Shivappa P, Pinto JRT. Immunoinformatic prediction to identify Staphylococcus aureus peptides that bind to CD8+ T-cells as potential vaccine candidates. Vet World 2024; 17:1413-1422. [PMID: 39077442 PMCID: PMC11283606 DOI: 10.14202/vetworld.2024.1413-1422] [Received: 03/15/2024] [Accepted: 06/04/2024] [Indexed: 07/31/2024]
Abstract
Background and Aim Staphylococcus aureus, with its diverse virulence factors and immune response evasion mechanisms, presents a formidable challenge as an opportunistic pathogen. Developing an effective vaccine against S. aureus has proven elusive despite extensive efforts. Autologous Staphylococcus lysate (ASL) treatment has proven effective in triggering an immune response against bovine mastitis. Peptides that stimulate the immune response can be the subject of further research. The study aimed to use immunoinformatics tools to identify epitopes on S. aureus surface and secretory proteins that can bind to major histocompatibility complex class I (MHC I) and CD8+ T-cells. This method aids in discovering prospective vaccine candidates and elucidating the rationale behind ASL therapy's efficacy. Materials and Methods Proteins were identified using both a literature search and the National Center for Biotechnology Information search engine Entrez. Self and non-self peptides, allergenicity predictions, epitope locations, and physicochemical characteristics were determined using sequence alignment, AllerTOP, SVMTriP, and Protein-Sol tools. Hex was employed to simulate the docking interactions between S. aureus proteins and the MHC I + CD8+ T-cell complex. The binding sites of S. aureus proteins docked with MHC I and CD8+ T-cells were assessed using the Computer Atlas of Surface Topography of Proteins (CASTp). Results Nine S. aureus peptides and their corresponding epitopes were identified in this study as potential stimulators of cytotoxic T-cell-mediated immunity. The peptides were analyzed for similarity to self-antigens and for allergenicity. Peptides 1d20, 2noj, 1n67, 1nu7, 1amx, and 2b71, which are non-self and stable, are potential elicitors of the cytotoxic T-cell response. The energy values from docking simulations of peptide-MHC I complexes with CD8+ and the T-cell receptor (TCR) indicate the stability and strength of the formed complexes.
All nine peptides (2noj, 1d20, 1n67, 2b71, 1nu7, 1yn3, 1amx, 2gi9, and 1edk) demonstrated robust MHC I binding, as evidenced by their low binding energies. Peptide 2gi9 exhibited the lowest energy value, followed by 2noj, 1nu7, 1n67, and 1d20, when docked with MHC I and CD8+ TCR, suggesting highly stable complexes. CASTp analysis indicated substantial binding pockets in the docked complexes, with peptide 1d20 showing the highest values for area and volume, suggesting its potential as an effective elicitor of immunological responses. Four of these peptides, 2noj, 2gi9, 1d20, and 1n67, stand out for vaccine development and T-cell activation against S. aureus. Conclusion This study sheds light on the design and development of S. aureus vaccines and highlights the significance of combining computational methods with experimental verification. It also emphasizes the importance of T-cell responses in combating S. aureus infections. More experiments are needed to confirm the effectiveness of these vaccine candidates and to discover their possible medical uses.
Affiliation(s)
- Grisilda Vidya Bernhardt
- Department of Biochemistry, RAK College of Medical Sciences, RAK Medical and Health Sciences University, Ras Al Khaimah, United Arab Emirates
- Kavitha Bernhardt
- Department of Basic Medical Sciences, Division of Physiology, Manipal Academy of Higher Education, Manipal, Karnataka, India
- Pooja Shivappa
- Department of Biochemistry, RAK College of Medical Sciences, RAK Medical and Health Sciences University, Ras Al Khaimah, United Arab Emirates
- Janita Rita Trinita Pinto
- Department of Biomedical Sciences, College of Medicine, Gulf Medical University, Ajman, United Arab Emirates
19
Ma B, Chen H, Gong J, Liu W, Wei X, Zhang Y, Li X, Li M, Wang Y, Shang S, Tian B, Li Y, Wang R, Tan Z. Enhancing Protein Solubility via Glycosylation: From Chemical Synthesis to Machine Learning Predictions. Biomacromolecules 2024; 25:3001-3010. [PMID: 38598264 DOI: 10.1021/acs.biomac.4c00134] [Indexed: 04/11/2024]
Abstract
Glycosylation is a valuable tool for modulating protein solubility; however, the lack of reliable research strategies has impeded efficient progress in understanding and applying this modification. This study aimed to bridge this gap by investigating the solubility of a model glycoprotein molecule, the carbohydrate-binding module (CBM), through a two-stage process. In the first stage, an approach involving chemical synthesis, comparative analysis, and molecular dynamics simulations of a library of glycoforms was employed to elucidate the effect of different glycosylation patterns on solubility and the key factors responsible for the effect. In the second stage, a predictive mathematical formula, innovatively harnessing machine learning algorithms, was derived to relate solubility to the identified key factors and accurately predict the solubility of the newly designed glycoforms. Demonstrating feasibility and effectiveness, this two-stage approach offers a valuable strategy for advancing glycosylation research, especially for the discovery of glycoforms with increased solubility.
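The abstract mentions a predictive mathematical formula relating solubility to the identified key factors but does not specify its form or the learning algorithm. As a stand-in only, the Python sketch below fits a linear formula by ordinary least squares, solving the normal equations with Gauss-Jordan elimination. The linear form, the unnamed factors, and the function names are hypothetical, and the sketch assumes full-rank input data.

```python
from typing import List, Sequence

def fit_linear_formula(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Least-squares coefficients [b0, b1, ..., bk] for y ≈ b0 + sum(bj * xj).

    Solves (A^T A) b = A^T y by Gauss-Jordan elimination with partial pivoting;
    assumes the design matrix has full column rank.
    """
    A = [[1.0, *row] for row in X]  # prepend an intercept column
    m = len(A[0])
    # Augmented normal-equation matrix [A^T A | A^T y].
    M = [
        [sum(r[i] * r[j] for r in A) for j in range(m)] + [sum(r[i] * t for r, t in zip(A, y))]
        for i in range(m)
    ]
    for col in range(m):
        pivot = max(range(col, m), key=lambda row: abs(M[row][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(m):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][m] / M[i][i] for i in range(m)]

def predict(coefs: Sequence[float], x: Sequence[float]) -> float:
    """Evaluate the fitted formula for one vector of factor values."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], x))
```

A transparent formula of this kind can be inspected factor by factor, which suits the abstract's goal of relating solubility to specific glycosylation features; the study's actual model may be nonlinear.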
Affiliation(s)
- Bo Ma
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
- Hedi Chen
- School of Pharmaceutical Sciences, Tsinghua University, Beijing 100084, China
- Jinyuan Gong
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
- Wenqiang Liu
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
- Xiuli Wei
- Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Yajing Zhang
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
- Xin Li
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
- Meng Li
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
- Yani Wang
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
- Shiying Shang
- Center of Pharmaceutical Technology, School of Pharmaceutical Sciences, Tsinghua University, Beijing 100084, China
- Boxue Tian
- School of Pharmaceutical Sciences, Tsinghua University, Beijing 100084, China
- Yaohao Li
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
- Ruihan Wang
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
- Chemical Engineering College, Hebei Normal University of Science and Technology, Qinhuangdao 066600, China
- Zhongping Tan
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
20
Zhang Z, Zhao L, Gao M, Chen Y, Wang J, Wang C. PPII-AEAT: Prediction of protein-protein interaction inhibitors based on autoencoders with adversarial training. Comput Biol Med 2024; 172:108287. [PMID: 38503089 DOI: 10.1016/j.compbiomed.2024.108287] [Received: 01/09/2024] [Revised: 02/21/2024] [Accepted: 03/12/2024] [Indexed: 03/21/2024]
Abstract
Protein-protein interactions (PPIs) have shown increasing potential as novel drug targets. The design and development of small-molecule inhibitors targeting specific PPIs are crucial for the prevention and treatment of related diseases. Accordingly, effective computational methods are highly desirable to meet the emerging need for large-scale, accurate prediction of PPI inhibitors. However, existing machine learning models rely heavily on manual feature screening and lack generalizability. Here, we propose a new PPI inhibitor prediction method based on autoencoders with adversarial training (named PPII-AEAT) that can adaptively learn molecular representations to cope with different PPI targets. First, extended-connectivity fingerprints and Mordred descriptors are employed to extract the primary features of small-molecule compounds. Then, an autoencoder architecture is trained in three phases to learn high-level representations and predict inhibitory scores. We evaluate PPII-AEAT on nine PPI targets and two tasks: PPI inhibitor identification and inhibitory potency prediction. The experimental results show that PPII-AEAT outperforms state-of-the-art methods.
Affiliation(s)
- Zitong Zhang
- Faculty of Computing, Harbin Institute of Technology, Harbin, 150001, China
- Lingling Zhao
- Faculty of Computing, Harbin Institute of Technology, Harbin, 150001, China
- Mengyao Gao
- Faculty of Computing, Harbin Institute of Technology, Harbin, 150001, China
- Yuanlong Chen
- Faculty of Computing, Harbin Institute of Technology, Harbin, 150001, China
- Junjie Wang
- Department of Medical Informatics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, 211166, China
- Chunyu Wang
- Faculty of Computing, Harbin Institute of Technology, Harbin, 150001, China.
| |
Collapse
|
21
|
Eskandari A, Nezhad NG, Leow TC, Rahman MBA, Oslan SN. Essential factors, advanced strategies, challenges, and approaches involved for efficient expression of recombinant proteins in Escherichia coli. Arch Microbiol 2024; 206:152. [PMID: 38472371 DOI: 10.1007/s00203-024-03871-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2023] [Revised: 12/31/2023] [Accepted: 01/25/2024] [Indexed: 03/14/2024]
Abstract
The production of recombinant proteins has been a major accomplishment of biotechnology over the past century. Heterologous hosts, either eukaryotic or prokaryotic, are used to produce these proteins. Microbial host systems continue to dominate as the most efficient and affordable option for biotherapeutic and food-industry production. Hence, it is crucial to analyze the limitations and advantages of microbial hosts to enhance the efficient production of recombinant proteins on a large scale. E. coli is widely used as a host for the production of recombinant proteins. Researchers have identified certain obstacles with this host, and given the growing demand for recombinant protein production, there is an immediate need to improve it. This review first discusses the factors contributing to recombinant protein expression. Subsequently, it sheds light on innovative approaches aimed at improving the expression of recombinant proteins. Lastly, it delves into the obstacles and optimization methods associated with translation, covering both cis-optimization and trans-optimization, the production of soluble recombinant protein, and the engineering of metal ion transport. In this context, a comprehensive description of these distinct features is provided; this knowledge could potentially enhance the expression of recombinant proteins in E. coli.
Affiliation(s)
- Azadeh Eskandari
- Enzyme and Microbial Technology Research Centre, Universiti Putra Malaysia, 43400 UPM, Serdang, Selangor, Malaysia
- Department of Biochemistry, Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, 43400 UPM, Serdang, Selangor, Malaysia
- Nima Ghahremani Nezhad
- Enzyme and Microbial Technology Research Centre, Universiti Putra Malaysia, 43400 UPM, Serdang, Selangor, Malaysia
- Department of Cell and Molecular Biology, Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, 43400 UPM, Serdang, Selangor, Malaysia
- Thean Chor Leow
- Enzyme and Microbial Technology Research Centre, Universiti Putra Malaysia, 43400 UPM, Serdang, Selangor, Malaysia
- Department of Cell and Molecular Biology, Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, 43400 UPM, Serdang, Selangor, Malaysia
- Enzyme Technology and X-Ray Crystallography Laboratory, VacBio 5, Institute of Bioscience, Universiti Putra Malaysia, 43400 UPM, Serdang, Selangor, Malaysia
- Siti Nurbaya Oslan
- Enzyme and Microbial Technology Research Centre, Universiti Putra Malaysia, 43400 UPM, Serdang, Selangor, Malaysia
- Department of Biochemistry, Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, 43400 UPM, Serdang, Selangor, Malaysia
- Enzyme Technology and X-Ray Crystallography Laboratory, VacBio 5, Institute of Bioscience, Universiti Putra Malaysia, 43400 UPM, Serdang, Selangor, Malaysia
22
Demetriou D, Lockhat Z, Brzozowski L, Saini KS, Dlamini Z, Hull R. The Convergence of Radiology and Genomics: Advancing Breast Cancer Diagnosis with Radiogenomics. Cancers (Basel) 2024; 16:1076. [PMID: 38473432 DOI: 10.3390/cancers16051076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2024] [Revised: 02/09/2024] [Accepted: 02/22/2024] [Indexed: 03/14/2024]
Abstract
Despite significant progress in the prevention, screening, diagnosis, prognosis, and therapy of breast cancer (BC), it remains a highly prevalent and life-threatening disease affecting millions worldwide. Molecular subtyping of BC is crucial for predictive and prognostic purposes because of the diverse clinical behaviors observed across subtypes. The molecular heterogeneity of BC introduces uncertainty into diagnosis, prognosis, and treatment. Numerous studies have highlighted genetic and environmental differences between patients from different geographic regions, emphasizing the need for localized research. International studies have revealed that patients of African heritage are often diagnosed at a more advanced stage and exhibit poorer responses to treatment and lower survival rates. Despite these global findings, there is a dearth of in-depth studies focusing on communities in the African region. Early diagnosis and timely treatment are paramount to improving survival rates. In this context, radiogenomics emerges as a promising field within precision medicine. By associating genetic patterns with image attributes or features, radiogenomics has the potential to significantly improve early detection, prognosis, and diagnosis. It can provide valuable insights into potential treatment options and predict the likelihood of survival, progression, and relapse. By linking visual features with genetic markers, radiogenomics promises to eliminate the need for biopsy and sequencing. The application of radiogenomics not only contributes to advancing precision oncology and individualized patient treatment but also streamlines clinical workflows. This review aims to delve into the theoretical underpinnings of radiogenomics, explore its practical applications in the diagnosis, management, and treatment of BC, and put radiogenomics on a path towards fully integrated diagnostics.
Affiliation(s)
- Demetra Demetriou
- SAMRC Precision Oncology Research Unit (PORU), DSI/NRF SARChI Chair in Precision Oncology and Cancer Prevention (POCP), Pan African Cancer Research Institute (PACRI), University of Pretoria, Hatfield, Pretoria 0028, South Africa
- Zarina Lockhat
- Department of Radiology, Faculty of Health Sciences, Steve Biko Academic Hospital, University of Pretoria, Hatfield, Pretoria 0028, South Africa
- Luke Brzozowski
- Translational Research and Core Facilities, University Health Network, Toronto, ON M5G 1L7, Canada
- Kamal S Saini
- Fortrea Inc., 8 Moore Drive, Durham, NC 27709, USA
- Addenbrooke's Hospital, Cambridge University Hospitals NHS Foundation Trust, Cambridge CB2 0QQ, UK
- Zodwa Dlamini
- SAMRC Precision Oncology Research Unit (PORU), DSI/NRF SARChI Chair in Precision Oncology and Cancer Prevention (POCP), Pan African Cancer Research Institute (PACRI), University of Pretoria, Hatfield, Pretoria 0028, South Africa
- Rodney Hull
- SAMRC Precision Oncology Research Unit (PORU), DSI/NRF SARChI Chair in Precision Oncology and Cancer Prevention (POCP), Pan African Cancer Research Institute (PACRI), University of Pretoria, Hatfield, Pretoria 0028, South Africa
23
Carter CW. Base Pairing Promoted the Self-Organization of Genetic Coding, Catalysis, and Free-Energy Transduction. Life (Basel) 2024; 14:199. [PMID: 38398709 PMCID: PMC10890426 DOI: 10.3390/life14020199] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2024] [Revised: 01/21/2024] [Accepted: 01/25/2024] [Indexed: 02/25/2024]
Abstract
How Nature discovered genetic coding is a largely ignored question, yet the answer is key to explaining the transition from biochemical building blocks to life. Other, related puzzles also fall within the aegis of the codes themselves. The peptide bond is unstable with respect to hydrolysis, so its formation requires some form of chemical free energy to drive it. Amino acid activation and acyl transfer are also slow and must be catalyzed. All living things must therefore also convert free energy and synchronize cellular chemistry. Most importantly, functional proteins occupy only small, isolated regions of sequence space. Nature evolved heritable symbolic data processing to seek out and use those sequences. That system has three parts: a memory of how amino acids behave in solution and inside proteins, a set of code keys to access that memory, and a scoring function. The code keys themselves are the genes for cognate pairs of tRNA and aminoacyl-tRNA synthetases (AARSs). The scoring function is the enzymatic specificity constant, kcat/KM, which measures both catalysis and specificity. The work described here deepens the evidence for, and understanding of, an unexpected consequence of ancestral bidirectional coding. Secondary structures occur in approximately the same places within antiparallel alignments of their gene products. However, the polar amino acids that define the molecular surface of one are reflected into core-defining non-polar side chains on the other. Proteins translated from base-paired coding strands fold up inside out. Bidirectional genes thus project an inverted structural duality into the proteome. I review how experimental data root the scoring functions responsible for the origins of coding, and for the catalyzed activation of unfavorable chemical reactions, in that duality.
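The scoring function named above, the specificity constant kcat/KM, can be made concrete with a little arithmetic. A standard way to express how well an enzyme discriminates between two competing substrates is the ratio of their specificity constants. The minimal sketch below computes both quantities; the kinetic values are hypothetical, chosen only for illustration, and are not taken from the source.

```python
from typing import Tuple

Kinetics = Tuple[float, float]  # (kcat in s^-1, KM in mol/L)


def specificity_constant(kcat_per_s: float, km_molar: float) -> float:
    """Return kcat/KM in M^-1 s^-1."""
    if km_molar <= 0:
        raise ValueError("KM must be positive")
    return kcat_per_s / km_molar


def discrimination(cognate: Kinetics, noncognate: Kinetics) -> float:
    """Ratio of specificity constants for two competing substrates."""
    return specificity_constant(*cognate) / specificity_constant(*noncognate)


# Hypothetical kinetic parameters for one enzyme acting on two amino acids.
cognate = (10.0, 1e-4)     # kcat/KM = 1e5 M^-1 s^-1
noncognate = (1.0, 1e-2)   # kcat/KM = 1e2 M^-1 s^-1
print(round(discrimination(cognate, noncognate)))  # 1000
```

Because both catalytic rate (kcat) and binding (KM) enter the ratio, a single number captures both the speed and the selectivity the abstract refers to.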
Affiliation(s)
- Charles W Carter
- Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-7260, USA
24
Bamezai S, Maresca di Serracapriola G, Morris F, Hildebrandt R, Amil MAS, Ledesma‐Amaro R. Protein engineering in the computational age: An open source framework for exploring mutational landscapes in silico. ENGINEERING BIOLOGY 2023; 7:29-38. [PMID: 38094241 PMCID: PMC10715127 DOI: 10.1049/enb2.12028] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/02/2023] [Revised: 10/04/2023] [Accepted: 10/25/2023] [Indexed: 10/16/2024]
Abstract
The field of protein engineering has seen tremendous expansion in the last decade, with researchers developing novel proteins with specialised functionalities for a range of uses, from drug discovery to industrial biotechnology. The emergence of computational tools and high-throughput screening technology has substantially sped up the process of protein engineering. However, much of the expertise required to engage in such projects is still concentrated in the hands of a few specialised individuals, including computational biologists and structural biochemists. The international Genetically Engineered Machine (iGEM) competition represents a platform for undergraduate students to innovate in synthetic biology. Yet, because of their complexity, arduous protein engineering projects are hindered by the limited resources available and the strict timelines of the competition. The authors highlight how the 2022 iGEM Team, 'Sporadicate', set out to develop InFinity 1.0, a computational framework that makes effective protein engineering more accessible, hoping to increase awareness of, and access to, novel in silico tools.
Affiliation(s)
- Shirin Bamezai
- Department of Bioengineering and Imperial College Centre for Synthetic Biology, Imperial College London, London, UK
- Freya Morris
- Department of Bioengineering and Imperial College Centre for Synthetic Biology, Imperial College London, London, UK
- Rodrigo Ledesma‐Amaro
- Department of Bioengineering and Imperial College Centre for Synthetic Biology, Imperial College London, London, UK
25
Wang J, Chen C, Yao G, Ding J, Wang L, Jiang H. Intelligent Protein Design and Molecular Characterization Techniques: A Comprehensive Review. Molecules 2023; 28:7865. [PMID: 38067593 PMCID: PMC10707872 DOI: 10.3390/molecules28237865] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2023] [Revised: 11/13/2023] [Accepted: 11/23/2023] [Indexed: 12/18/2023]
Abstract
In recent years, the widespread application of artificial intelligence algorithms in protein structure and function prediction and in de novo protein design has significantly accelerated intelligent protein design and led to many noteworthy achievements. This advancement in intelligent protein design holds great potential to accelerate the development of new drugs, enhance the efficiency of biocatalysts, and even create entirely new biomaterials. Protein characterization is key to the performance of intelligent protein design. However, there is no consensus on the most suitable characterization method for intelligent protein design tasks. This review describes the methods, characteristics, and representative applications of traditional descriptors and of sequence-based and structure-based protein characterization, and discusses their advantages, disadvantages, and scope of application. It is hoped that this will help researchers better understand the limitations and application scenarios of these methods and provide valuable references for choosing appropriate protein characterization techniques for related research in the field, so as to better carry out protein research.
Affiliation(s)
- Junjie Ding
- State Key Laboratory of NBC Protection for Civilian, Beijing 102205, China; (J.W.); (C.C.); (G.Y.)
- Liangliang Wang
- State Key Laboratory of NBC Protection for Civilian, Beijing 102205, China; (J.W.); (C.C.); (G.Y.)
- Hui Jiang
- State Key Laboratory of NBC Protection for Civilian, Beijing 102205, China; (J.W.); (C.C.); (G.Y.)
26
Basith S, Pham NT, Song M, Lee G, Manavalan B. ADP-Fuse: A novel two-layer machine learning predictor to identify antidiabetic peptides and diabetes types using multiview information. Comput Biol Med 2023; 165:107386. [PMID: 37619323 DOI: 10.1016/j.compbiomed.2023.107386] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 06/22/2023] [Revised: 08/03/2023] [Accepted: 08/14/2023] [Indexed: 08/26/2023]
Abstract
Diabetes mellitus has become a major public health concern associated with high mortality and reduced life expectancy and can cause blindness, heart attacks, kidney failure, lower limb amputations, and strokes. A new generation of antidiabetic peptides (ADPs) that act on β-cells or T-cells to regulate insulin production is being developed to alleviate the effects of diabetes. However, the lack of effective peptide-mining tools has hampered the discovery of these promising drugs. Hence, novel computational tools need to be developed urgently. In this study, we present ADP-Fuse, a novel two-layer prediction framework capable of accurately distinguishing ADPs from non-ADPs and categorizing ADPs into type 1 and type 2. First, we comprehensively evaluated 22 peptide sequence-derived features coupled with eight notable machine learning algorithms. Subsequently, the most suitable feature descriptors and classifiers for both layers were identified. The outputs of these single-feature models, which embed multiview information, were then used to train an appropriate classifier that provides the final prediction. Comprehensive cross-validation and independent tests substantiate that ADP-Fuse surpasses single-feature models and the feature fusion approach for the prediction of ADPs and their types. In addition, the SHapley Additive exPlanation method was used to elucidate the contributions of individual features to the prediction of ADPs and their types. Finally, a user-friendly web server for ADP-Fuse was developed and made publicly accessible (https://balalab-skku.org/ADP-Fuse), enabling the swift screening and identification of novel ADPs and their types. This framework is expected to contribute significantly to antidiabetic peptide identification.
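ADP-Fuse trains a final classifier on the outputs of its single-feature models, a design generally known as stacking. The sketch below illustrates that general idea under stated assumptions: the first-layer probabilities are made-up numbers, and the meta-learner is a plain logistic regression fitted by batch gradient descent. This is not necessarily the classifier the authors selected, and it does not reproduce ADP-Fuse itself.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_meta_classifier(meta_x: List[List[float]], y: Sequence[int],
                        lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit a logistic-regression meta-learner by batch gradient descent.

    Each row of meta_x holds the probabilities produced by the first-layer
    single-feature models for one peptide.
    """
    n, m = len(meta_x), len(meta_x[0])
    w, b = [0.0] * m, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * m, 0.0
        for xi, yi in zip(meta_x, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(m):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: List[float], b: float, xi: List[float]) -> float:
    """Meta-level probability that a peptide is an ADP."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)


# Hypothetical first-layer outputs: P(ADP) from three single-feature models.
# Labels: 1 = ADP, 0 = non-ADP.
meta_x = [[0.9, 0.8, 0.7], [0.8, 0.6, 0.9], [0.2, 0.3, 0.1], [0.1, 0.4, 0.2]]
y = [1, 1, 0, 0]
w, b = fit_meta_classifier(meta_x, y)
print([round(predict(w, b, xi), 2) for xi in meta_x])
```

In a real stacking setup the first-layer probabilities would come from cross-validated predictions, so that the meta-learner is not trained on outputs the base models have already fitted.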
Affiliation(s)
- Shaherin Basith
- Department of Physiology, Ajou University School of Medicine, Suwon, 16499, Republic of Korea
- Nhat Truong Pham
- Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, 16419, Republic of Korea
- Minkyung Song
- Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, 16419, Republic of Korea; Department of Biopharmaceutical Convergence, Sungkyunkwan University, Suwon, 16419, Republic of Korea
- Gwang Lee
- Department of Physiology, Ajou University School of Medicine, Suwon, 16499, Republic of Korea; Department of Molecular Science and Technology, Ajou University, Suwon, 16499, Republic of Korea
- Balachandran Manavalan
- Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, 16419, Republic of Korea
27
Mou M, Pan Z, Zhou Z, Zheng L, Zhang H, Shi S, Li F, Sun X, Zhu F. A Transformer-Based Ensemble Framework for the Prediction of Protein-Protein Interaction Sites. RESEARCH (WASHINGTON, D.C.) 2023; 6:0240. [PMID: 37771850 PMCID: PMC10528219 DOI: 10.34133/research.0240] [Citation(s) in RCA: 36] [Impact Index Per Article: 18.0] [Received: 08/02/2023] [Accepted: 09/08/2023] [Indexed: 09/30/2023]
Abstract
The identification of protein-protein interaction (PPI) sites is essential in the research of protein function and the discovery of new drugs. So far, a variety of computational tools based on machine learning have been developed to accelerate the identification of PPI sites. However, existing methods suffer from low predictive accuracy or a limited scope of application. Specifically, some methods learned only global or local sequential features, leading to low predictive accuracy, while others achieved improved performance by extracting residue interactions from structures but were limited in their application scope by their heavy dependence on precise structural information. There is an urgent need to develop a method that integrates comprehensive information to realize accurate, proteome-wide profiling of PPI sites. Herein, a novel ensemble framework for PPI site prediction, EnsemPPIS, was proposed based on transformer and gated convolutional networks. EnsemPPIS can effectively capture not only global and local patterns but also residue interactions. Specifically, EnsemPPIS is unique in (a) extracting residue interactions from protein sequences with a transformer and (b) further integrating global and local sequential features with an ensemble learning strategy. Compared with various existing methods, EnsemPPIS exhibited either superior performance or broader applicability on multiple PPI site prediction tasks. Moreover, pattern analysis based on the interpretability of EnsemPPIS demonstrated that EnsemPPIS was fully capable of learning residue interactions within the local structure of PPI sites using only sequence information. The web server of EnsemPPIS is freely available at http://idrblab.org/ensemppis.
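EnsemPPIS is described as using a transformer to extract residue interactions from sequence alone. The core operation of a transformer layer is scaled dot-product self-attention, softmax(Q K^T / sqrt(d)) V. The sketch below is a generic single-head version in plain Python; to keep it short it assumes identity projections (Q = K = V = the input embeddings) and toy embedding values, and it is not the EnsemPPIS implementation. Row i of the attention matrix can be read as how strongly residue i attends to every other residue.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    # Subtract the row maximum for numerical stability before exponentiating.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix) -> Tuple[Matrix, Matrix]:
    """Single-head scaled dot-product self-attention with Q = K = V = x.

    x has one row per residue and d columns (embedding size).
    Returns (attention_weights, output).
    """
    d = len(x[0])
    scale = 1.0 / math.sqrt(d)
    # scores[i][j] = <q_i, k_j> / sqrt(d)
    scores = [[scale * sum(a * b for a, b in zip(qi, kj)) for kj in x] for qi in x]
    weights = [softmax(row) for row in scores]
    # Each output row is an attention-weighted average of the value rows.
    out = [[sum(w * v[c] for w, v in zip(wrow, x)) for c in range(d)] for wrow in weights]
    return weights, out


# Toy embeddings for a 3-residue fragment (hypothetical values).
emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attn, ctx = self_attention(emb)
for row in attn:
    print([round(w, 3) for w in row])
```

A full transformer layer would add learned Q/K/V projection matrices, multiple heads, residual connections and a feed-forward block on top of this operation.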
Affiliation(s)
- Minjie Mou
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Ziqi Pan
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Zhimeng Zhou
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Lingyan Zheng
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Hanyu Zhang
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Shuiyang Shi
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Fengcheng Li
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Xiuna Sun
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Feng Zhu
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
28
Cui Z, Wu Y, Zhang QH, Wang SG, He Y, Huang DS. MV-CVIB: a microbiome-based multi-view convolutional variational information bottleneck for predicting metastatic colorectal cancer. Front Microbiol 2023; 14:1238199. [PMID: 37675425 PMCID: PMC10477591 DOI: 10.3389/fmicb.2023.1238199] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2023] [Accepted: 08/02/2023] [Indexed: 09/08/2023]
Abstract
Introduction Imbalances in gut microbes have been implicated in many human diseases, including colorectal cancer (CRC), inflammatory bowel disease, type 2 diabetes, obesity, autism, and Alzheimer's disease. Compared with other human diseases, CRC is a gastrointestinal malignancy with high mortality and a high probability of metastasis. However, current studies mainly focus on the prediction of colorectal cancer while neglecting the more serious malignancy of metastatic colorectal cancer (mCRC). In addition, the high dimensionality and small sample sizes of gut microbial data make them complex and increase the difficulty for traditional machine learning models. Methods To address these challenges, we collected and processed 16S rRNA data and calculated abundance data from patients with non-metastatic colorectal cancer (non-mCRC) and mCRC. Unlike the traditional health-disease classification strategy, we adopted a novel disease-disease classification strategy and proposed a microbiome-based multi-view convolutional variational information bottleneck (MV-CVIB). Results The experimental results show that MV-CVIB can effectively predict mCRC, achieving AUC values above 0.9 in comparison with other state-of-the-art models. MV-CVIB also achieved satisfactory predictive performance on multiple published CRC gut microbiome datasets. Discussion Finally, multiple gut microbiota analyses were used to elucidate the communities and differences between mCRC and non-mCRC, and the metastatic properties of CRC were assessed by patient age and microbiota expression.
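The MV-CVIB results are reported as AUC values. The area under the ROC curve equals the probability that a randomly chosen positive sample (here, mCRC) receives a higher score than a randomly chosen negative sample (non-mCRC), with ties counting one half. The minimal sketch below computes AUC directly from that pairwise definition; the labels and scores are made up, and the O(n·m) double loop is chosen for clarity rather than speed.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier outputs: 1 = metastatic (mCRC), 0 = non-mCRC.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 0.75
```

Here three of the four positive/negative pairs are ordered correctly (only 0.35 versus 0.4 is not), giving 0.75. An AUC above 0.9, as reported in the abstract, means more than 90% of such pairs are ranked correctly.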
Affiliation(s)
- Zhen Cui
- Institute of Machine Learning and Systems Biology, College of Electronics and Information Engineering, Tongji University, Shanghai, China
- Yan Wu
- College of Electronics and Information Engineering, Tongji University, Shanghai, China
- Qin-Hu Zhang
- EIT Institute for Advanced Study, Ningbo, Zhejiang, China
- Si-Guo Wang
- Institute of Machine Learning and Systems Biology, College of Electronics and Information Engineering, Tongji University, Shanghai, China
- Ying He
- Institute of Machine Learning and Systems Biology, College of Electronics and Information Engineering, Tongji University, Shanghai, China
29
Yang M, Huang ZA, Zhou W, Ji J, Zhang J, He S, Zhu Z. MIX-TPI: a flexible prediction framework for TCR-pMHC interactions based on multimodal representations. Bioinformatics 2023; 39:btad475. [PMID: 37527015 PMCID: PMC10423027 DOI: 10.1093/bioinformatics/btad475] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2023] [Revised: 07/05/2023] [Accepted: 07/29/2023] [Indexed: 08/03/2023]
Abstract
MOTIVATION The interactions between T-cell receptors (TCR) and peptide-major histocompatibility complex (pMHC) are essential for the adaptive immune system. However, identifying these interactions can be challenging due to the limited availability of experimental data, sequence data heterogeneity, and high experimental validation costs. RESULTS To address this issue, we develop a novel computational framework, named MIX-TPI, to predict TCR-pMHC interactions using amino acid sequences and physicochemical properties. Based on convolutional neural networks, MIX-TPI incorporates sequence-based and physicochemical-based extractors to refine the representations of TCR-pMHC interactions. Each modality is projected into modality-invariant and modality-specific representations to capture the commonalities and differences between the features. A self-attention fusion layer is then adopted to form the classification module. Experimental results demonstrate the effectiveness of MIX-TPI in comparison with other state-of-the-art methods. MIX-TPI also shows good generalization capability on mutually exclusive evaluation datasets and a paired TCR dataset. AVAILABILITY AND IMPLEMENTATION The source code of MIX-TPI and the test data are available at: https://github.com/Wolverinerine/MIX-TPI.
Affiliation(s)
- Minghao Yang
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China
- Zhi-An Huang
- Research Office, City University of Hong Kong (Dongguan), Dongguan 523000, China
- Wei Zhou
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China
- Junkai Ji
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China
- Jun Zhang
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China
- Shan He
- School of Computer Science, University of Birmingham, Birmingham B15 2TT, United Kingdom
- Zexuan Zhu
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China
- National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University, Shenzhen 518060, China
30
Rcheulishvili N, Mao J, Papukashvili D, Feng S, Liu C, Wang X, He Y, Wang PG. Design, evaluation, and immune simulation of potentially universal multi-epitope mpox vaccine candidate: focus on DNA vaccine. Front Microbiol 2023; 14:1203355. [PMID: 37547674 PMCID: PMC10403236 DOI: 10.3389/fmicb.2023.1203355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2023] [Accepted: 07/03/2023] [Indexed: 08/08/2023]
Abstract
Monkeypox (mpox) is a zoonotic infectious disease caused by the mpox virus. Mpox symptoms are similar to those of smallpox, but with less severity and lower mortality. Although the mpox virus is not as highly transmissible as some severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants, it is still spreading, especially among men who have sex with men (MSM). Thus, preventive measures such as vaccination are highly recommended. While the smallpox vaccine has demonstrated considerable efficacy against the mpox virus owing to antigenic similarities, the development of a universal anti-mpox vaccine remains a necessary pursuit. Recently, nucleic acid vaccines have garnered special attention owing to their numerous advantages over traditional vaccines. Importantly, DNA vaccines have certain advantages over mRNA vaccines. In this study, a potentially universal DNA vaccine candidate against mpox based on conserved epitopes was designed, and its efficacy was evaluated via an immunoinformatics approach. The vaccine candidate demonstrated potent humoral and cellular immune responses in silico, indicating potential efficacy in vivo and the need for further research.
Affiliation(s)
- Yunjiao He
- Department of Pharmacology, School of Medicine, Southern University of Science and Technology, Shenzhen, China
- Peng George Wang
- Department of Pharmacology, School of Medicine, Southern University of Science and Technology, Shenzhen, China
31
Bao W, Gu Y, Chen B, Yu H. Golgi_DF: Golgi proteins classification with deep forest. Front Neurosci 2023; 17:1197824. [PMID: 37250391 PMCID: PMC10213405 DOI: 10.3389/fnins.2023.1197824] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 03/31/2023] [Accepted: 04/19/2023] [Indexed: 05/31/2023]
Abstract
Introduction The Golgi apparatus is one of the components of the endomembrane system in eukaryotic cells. Its main function is to send proteins synthesized in the endoplasmic reticulum to specific parts of the cell or to secrete them outside the cell. The Golgi is therefore an important organelle in eukaryotic protein synthesis. Golgi disorders can cause various neurodegenerative and genetic diseases, and the accurate classification of Golgi proteins can help in developing corresponding therapeutic drugs. Methods This paper proposes Golgi_DF, a novel Golgi protein classification method based on the deep forest algorithm. First, the proteins to be classified are converted into feature vectors containing various types of information. Second, the synthetic minority oversampling technique (SMOTE) is used to handle class imbalance among the samples. Next, LightGBM is used for feature reduction. Meanwhile, the features can be used in the penultimate dense layer. Finally, the reconstructed features are classified with the deep forest algorithm. Results Golgi_DF can select important features and identify Golgi proteins. Experiments show that it performs better than other state-of-the-art methods. Golgi_DF is available as a standalone tool, and all its source code is publicly available at https://github.com/baowz12345/golgiDF. Discussion Golgi_DF employs reconstructed features to classify Golgi proteins. Such a method may extract more useful features from the UniRep features.
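Golgi_DF applies SMOTE to the class-imbalance problem before feature reduction. SMOTE creates each synthetic minority sample by picking a minority sample, choosing one of its k nearest minority-class neighbours, and interpolating at a random point on the line segment between them. The following is a minimal, dependency-free sketch of that idea; Euclidean distance, k = 2, the fixed seed and the toy data are assumptions for illustration. In practice a library implementation such as `imblearn.over_sampling.SMOTE` from imbalanced-learn would normally be used.

```python
import math
import random
from typing import List

Vector = List[float]


def smote(minority: List[Vector], n_new: int, k: int = 2, seed: int = 0) -> List[Vector]:
    """Generate n_new synthetic samples by interpolating between minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x within the minority class (excluding x itself).
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(x, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


# Toy 2-D feature vectors for an under-represented class (hypothetical).
minority = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.0], [3.0, 3.0]]
new = smote(minority, n_new=5)
print(len(new))  # 5
```

Because every synthetic point is a convex combination of two real minority samples, the new points stay inside the region already occupied by that class rather than being exact duplicates, which is the main advantage of SMOTE over random oversampling.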
Collapse
Affiliation(s)
- Wenzheng Bao
  - School of Information Engineering, Xuzhou University of Technology, Xuzhou, China
- Yujian Gu
  - School of Information Engineering, Xuzhou University of Technology, Xuzhou, China
- Baitong Chen
  - Department of Stomatology, Xuzhou First People’s Hospital, Xuzhou, China
  - The Affiliated Hospital of China University of Mining and Technology, Xuzhou, China
- Huiping Yu
  - Department of Neurosurgery, The Hospital of Joint Logistic, Quanzhou, China
32
Xi J, Sun D, Chang C, Zhou S, Huang Q. An omics-to-omics joint knowledge association subtensor model for radiogenomics cross-modal modules from genomics and ultrasonic images of breast cancers. Comput Biol Med 2023; 155:106672. [PMID: 36805226 DOI: 10.1016/j.compbiomed.2023.106672] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 12/15/2022] [Revised: 02/06/2023] [Accepted: 02/10/2023] [Indexed: 02/16/2023]
Abstract
Radiogenomics analysis can reveal the connections between genomics and radiomics, allowing the genomic features of tumors to be inferred from radiogenomic associations using low-cost, non-invasive ultrasonic screening images. Although a number of pioneering approaches have explored the connections between genomic aberrations and ultrasonic features, these studies mainly focus on the relationship between ultrasonic features and only the most popular cancer genes, and therefore face two difficulties: they miss many-to-many relationships (an omics-to-omics view), and they confound group-specific associations with whole-sample associations. To address the omics-to-omics view and the issue of tumor heterogeneity, we propose an omics-to-omics joint knowledge association subtensor model. Specifically, the subtensor factorization framework discovers joint cross-modal modules via an omics-to-omics view, while the sparse weight sample indication strategy mines sample subgroups from multi-omic data with tumor heterogeneity. The experimental evaluation shows the jointness of the discovered modules across omics, their contribution to tumorigenesis, and their relation to cancer-related functions. In summary, our proposed omics-to-omics joint knowledge association subtensor model can serve as an efficient tool for radiogenomic knowledge association, promoting cross-modal knowledge graph construction for explainable artificial intelligence in cancer diagnosis.
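The abstract does not give the model's objective, so the following is only a generic illustration of the two ideas it names: a three-way (subtensor) factorization that yields one cross-modal module, and sparse sample weights that pick out a subgroup. Everything here is an assumption rather than the authors' method: the tensor layout (samples × genomic features × imaging features), the function name `sparse_rank1_subtensor`, the threshold `lam`, and the alternating power-iteration updates with soft-thresholding on the sample mode.

```python
import numpy as np


def _unit(x):
    """Scale a vector to unit Euclidean norm (left unchanged if all zeros)."""
    norm = np.linalg.norm(x)
    return x / norm if norm > 0 else x


def sparse_rank1_subtensor(T, lam=0.1, n_iter=100):
    """Illustrative sparse rank-1 factorization T ~ s (x) u (x) v.

    T    : array of shape (n_samples, p_genomic, q_imaging) (assumed layout).
    s    : nonnegative sample weights; soft-thresholding by `lam` zeroes out
           weakly associated samples, so the nonzero entries mark a subgroup.
    u, v : unit-norm loadings over genomic and imaging features, i.e. one
           cross-modal "module" shared by the selected samples.
    """
    T = np.asarray(T, dtype=float)
    # Initialise u, v from the leading singular pair of the sample-summed
    # slice; samples sharing the dominant pattern then tend to score positive.
    U, _, Vt = np.linalg.svd(T.sum(axis=0))
    u, v = U[:, 0], Vt[0]
    s = np.zeros(T.shape[0])
    for _ in range(n_iter):
        # Sample mode: score each p x q slice against the module u v^T,
        # then soft-threshold and clip at zero to keep a sparse subgroup.
        s = np.maximum(np.einsum("ijk,j,k->i", T, u, v) - lam, 0.0)
        if not s.any():
            break  # the threshold removed every sample; no subgroup found
        # Feature modes: weighted power-iteration updates for u and v.
        u = _unit(np.einsum("ijk,i,k->j", T, s, v))
        v = _unit(np.einsum("ijk,i,j->k", T, s, u))
    return s, u, v
```

On a synthetic tensor with one planted subgroup, the nonzero entries of `s` should recover the planted samples and `u`, `v` should align (up to sign) with the planted loadings; a real subtensor model would add multiple modules and a principled way to choose `lam`.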
Affiliation(s)
- Jianing Xi
  - School of Artificial Intelligence, Optics and Electronics (iOPEN), Northwestern Polytechnical University, Xi'an, 710072, China
- Donghui Sun
  - School of Artificial Intelligence, Optics and Electronics (iOPEN), Northwestern Polytechnical University, Xi'an, 710072, China
- Cai Chang
  - Department of Ultrasound, Fudan University Shanghai Cancer Center, Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, 200032, China
- Shichong Zhou
  - Department of Ultrasound, Fudan University Shanghai Cancer Center, Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, 200032, China
- Qinghua Huang
  - School of Artificial Intelligence, Optics and Electronics (iOPEN), Northwestern Polytechnical University, Xi'an, 710072, China