1
Abulimiti B, An H, Yaermaimaiti G, Kadir A, Wei J, Xiang M, Long J, Zhang S, Zhang B. Observation of reversible conformational interconversion accompanied by 3p internal conversions in Rydberg-excited N,N-dimethylethylamine. Spectrochimica Acta. Part A, Molecular and Biomolecular Spectroscopy 2025; 326:125279. [PMID: 39423557 DOI: 10.1016/j.saa.2024.125279] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2024] [Revised: 10/06/2024] [Accepted: 10/09/2024] [Indexed: 10/21/2024]
Abstract
Conformational dynamics has been well observed in the 3s Rydberg state of amines, whereas its observation in higher-energy, non-equilibrium 3p Rydberg states is very rare, especially for a reversible conformational transition that could compete with other non-adiabatic transitions. Herein, we report the observation of a reversible conformational interconversion phenomenon in the 3p Rydberg excited-state dynamics of N,N-dimethylethylamine (DMEA). Upon electronic excitation, a forward and backward interconversion between the initially prepared 3p_l and 3p_h conformers accompanied by 3p internal conversions occurs, resulting in a 3p_l/3p_h equilibrium ratio of 61 %/39 % within ∼1.5 ps. The ensuing parallel internal conversions from the 3p_l to 3s_l and 3p_h to 3s_h deposit about 1.80 eV of vibrational energy into the 3s state, enabling a fast conformational interconversion between the 3s_h and 3s_l conformers to proceed within ∼2.0 ps. The final 3s_l/3s_h equilibrium ratio was determined to be 76 %/24 %. This work presents a real-time observation of the entire conformational interconversion process initiating from the higher-energy 3p states and finally reaching an equilibrium on the lower-energy 3s state.
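As a reading aid (not taken from the paper), the reported equilibrium ratios can be related to a minimal two-state picture. The sketch below assumes simple first-order reversible kinetics between the two conformers and ignores the competing internal conversion; the rate symbols $k_{l\to h}$ and $k_{h\to l}$ are introduced here for illustration only.

```latex
% Two-state reversible interconversion (illustrative assumption,
% internal conversion to 3s is neglected)
\begin{align}
  \frac{d[3p_l]}{dt} &= -k_{l\to h}\,[3p_l] + k_{h\to l}\,[3p_h], \\
  K_{3p} &= \frac{[3p_h]_{\mathrm{eq}}}{[3p_l]_{\mathrm{eq}}}
          = \frac{k_{l\to h}}{k_{h\to l}}
          = \frac{39}{61} \approx 0.64, \\
  K_{3s} &= \frac{[3s_h]_{\mathrm{eq}}}{[3s_l]_{\mathrm{eq}}}
          = \frac{24}{76} \approx 0.32 .
\end{align}
```

In this simplified picture the populations relax toward equilibrium with the combined rate $k_{l\to h} + k_{h\to l}$, and the $l$ conformer is favored more strongly on the 3s surface (76 %) than on the 3p surface (61 %).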
Affiliation(s)
- Bumaliya Abulimiti
- Xinjiang Key Laboratory for Luminescence Minerals and Optical Functional Materials, School of Physics and Electronic Engineering, Xinjiang Normal University, Urumqi 830054, China; State Key Laboratory of Magnetic Resonance and Atomic and Molecular Physics, Innovation Academy for Precision Measurement Science and Technology, Chinese Academy of Sciences, Wuhan 430071, China; School of Chemistry and Chemical Engineering, Xinjiang Normal University, Urumqi 830054, China
- Huan An
- Xinjiang Key Laboratory for Luminescence Minerals and Optical Functional Materials, School of Physics and Electronic Engineering, Xinjiang Normal University, Urumqi 830054, China; School of Chemistry and Chemical Engineering, Xinjiang Normal University, Urumqi 830054, China
- Gulimire Yaermaimaiti
- Xinjiang Key Laboratory for Luminescence Minerals and Optical Functional Materials, School of Physics and Electronic Engineering, Xinjiang Normal University, Urumqi 830054, China
- Abduhalik Kadir
- Xinjiang Key Laboratory for Luminescence Minerals and Optical Functional Materials, School of Physics and Electronic Engineering, Xinjiang Normal University, Urumqi 830054, China
- Jie Wei
- State Key Laboratory of Magnetic Resonance and Atomic and Molecular Physics, Innovation Academy for Precision Measurement Science and Technology, Chinese Academy of Sciences, Wuhan 430071, China.
- Mei Xiang
- Xinjiang Key Laboratory for Luminescence Minerals and Optical Functional Materials, School of Physics and Electronic Engineering, Xinjiang Normal University, Urumqi 830054, China.
- Jinyou Long
- State Key Laboratory of Magnetic Resonance and Atomic and Molecular Physics, Innovation Academy for Precision Measurement Science and Technology, Chinese Academy of Sciences, Wuhan 430071, China.
- Song Zhang
- State Key Laboratory of Magnetic Resonance and Atomic and Molecular Physics, Innovation Academy for Precision Measurement Science and Technology, Chinese Academy of Sciences, Wuhan 430071, China.
- Bing Zhang
- State Key Laboratory of Magnetic Resonance and Atomic and Molecular Physics, Innovation Academy for Precision Measurement Science and Technology, Chinese Academy of Sciences, Wuhan 430071, China
2
Totaro MG, Vide U, Zausinger R, Winkler A, Oberdorfer G. ESM-scan-A tool to guide amino acid substitutions. Protein Sci 2024; 33:e5221. [PMID: 39565080 PMCID: PMC11577456 DOI: 10.1002/pro.5221] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2024] [Revised: 09/27/2024] [Accepted: 10/28/2024] [Indexed: 11/21/2024]
Abstract
Protein structure prediction and (re)design have gone through a revolution in the last 3 years. The tremendous progress in these fields has been almost exclusively driven by readily available machine learning algorithms applied to protein folding and sequence design problems. Despite these advancements, predicting site-specific mutational effects on protein stability and function remains an unsolved problem. This is a persistent challenge, mainly because the free energy of large systems is very difficult to compute with absolute accuracy and subtle changes to protein structures are hard to capture with computational models. Here, we describe the implementation and use of ESM-Scan, which uses the ESM zero-shot predictor to scan entire protein sequences for preferential amino acid changes, thus enabling in silico deep mutational scanning experiments. We benchmark ESM-Scan on its predictive capabilities for stability and functionality of sequence changes using three publicly available datasets and proceed by experimentally testing the tool's performance on a challenging test case of a blue-light-activated diguanylate cyclase from Methylotenera species (MsLadC), where it accurately predicted the importance of a highly conserved residue in a region involved in allosteric product inhibition. Our experimental results show that the ESM-zero shot model is capable of inferring the effects of a set of amino acid substitutions in their correlation between predicted fitness and experimental results. ESM-Scan is publicly available at https://huggingface.co/spaces/thaidaev/zsp.
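The in silico scan described above can be illustrated with a minimal sketch. A common zero-shot scheme for masked protein language models scores a substitution as the log-probability of the mutant residue minus that of the wild-type residue at the same position; whether ESM-Scan uses exactly this scheme is not stated in the abstract, so treat it as an assumption. No real model is called here: `toy_log_probs` is a hypothetical stand-in for model output, and all function names are illustrative.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def scan_substitutions(sequence, log_probs):
    """Score every single substitution as log p(mutant) - log p(wild type).

    log_probs[i][aa] is assumed to hold a model's log-probability of
    residue `aa` at position i (hypothetical input, no model is called).
    """
    scores = {}
    for i, wt in enumerate(sequence):
        for aa in AMINO_ACIDS:
            if aa == wt:
                continue
            label = f"{wt}{i + 1}{aa}"  # e.g. "M1A", positions are 1-based
            scores[label] = log_probs[i][aa] - log_probs[i][wt]
    return scores


def toy_log_probs(sequence, wt_prob=0.6):
    """Toy stand-in for model output: the wild-type residue gets wt_prob,
    the remaining 19 residues share the rest equally."""
    other = (1.0 - wt_prob) / (len(AMINO_ACIDS) - 1)
    return [
        {aa: math.log(wt_prob if aa == wt else other) for aa in AMINO_ACIDS}
        for wt in sequence
    ]


seq = "MKT"
scores = scan_substitutions(seq, toy_log_probs(seq))
# Rank substitutions from most to least favorable under the toy scores.
ranked = sorted(scores, key=scores.get, reverse=True)
```

With real model output in place of `toy_log_probs`, higher scores would indicate substitutions the model considers more compatible with the sequence context.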
Affiliation(s)
- Uršula Vide
- Institute of Biochemistry, Graz University of Technology, Graz, Austria
- Regina Zausinger
- Institute of Biochemistry, Graz University of Technology, Graz, Austria
- Andreas Winkler
- Institute of Biochemistry, Graz University of Technology, Graz, Austria
- BioTechMed, Graz, Austria
- Gustav Oberdorfer
- Institute of Biochemistry, Graz University of Technology, Graz, Austria
- BioTechMed, Graz, Austria
3
Gillani M, Pollastri G. Protein subcellular localization prediction tools. Comput Struct Biotechnol J 2024; 23:1796-1807. [PMID: 38707539 PMCID: PMC11066471 DOI: 10.1016/j.csbj.2024.04.032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2024] [Revised: 04/11/2024] [Accepted: 04/11/2024] [Indexed: 05/07/2024] Open
Abstract
Protein subcellular localization prediction is of great significance in bioinformatics and biological research. Because most proteins do not have experimentally determined localization information, computational prediction methods and tools have been an active research area for more than two decades. Knowledge of the subcellular location of a protein provides valuable information about its functionalities, the functioning of the cell, and other possible interactions with proteins. Fast, reliable, and accurate predictors provide platforms to harness the abundance of sequence data to predict subcellular locations accordingly. During the last decade, there has been a considerable amount of research effort aimed at developing subcellular localization predictors. This paper reviews recent subcellular localization prediction tools in the Eukaryotic, Prokaryotic, and Virus-based categories followed by a detailed analysis. Each predictor is discussed based on its main features, strengths, weaknesses, algorithms used, prediction techniques, and analysis. This review is supported by prediction tools taxonomies that highlight their relevant area and examples for uncomplicated categorization and ease of understandability. These taxonomies help users find suitable tools according to their needs. Furthermore, recent research gaps and challenges are discussed to cover areas that need the utmost attention. This survey provides an in-depth analysis of the most recent prediction tools to facilitate readers and can be considered a quick guide for researchers to identify and explore the recent literature advancements.
Affiliation(s)
- Maryam Gillani
- School of Computer Science, University College Dublin (UCD), Dublin, D04 V1W8, Ireland
- Gianluca Pollastri
- School of Computer Science, University College Dublin (UCD), Dublin, D04 V1W8, Ireland
4
Son A, Park J, Kim W, Lee W, Yoon Y, Ji J, Kim H. Integrating Computational Design and Experimental Approaches for Next-Generation Biologics. Biomolecules 2024; 14:1073. [PMID: 39334841 PMCID: PMC11430650 DOI: 10.3390/biom14091073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2024] [Revised: 08/13/2024] [Accepted: 08/26/2024] [Indexed: 09/30/2024] Open
Abstract
Therapeutic protein engineering has revolutionized medicine by enabling the development of highly specific and potent treatments for a wide range of diseases. This review examines recent advances in computational and experimental approaches for engineering improved protein therapeutics. Key areas of focus include antibody engineering, enzyme replacement therapies, and cytokine-based drugs. Computational methods like structure-based design, machine learning integration, and protein language models have dramatically enhanced our ability to predict protein properties and guide engineering efforts. Experimental techniques such as directed evolution and rational design approaches continue to evolve, with high-throughput methods accelerating the discovery process. Applications of these methods have led to breakthroughs in affinity maturation, bispecific antibodies, enzyme stability enhancement, and the development of conditionally active cytokines. Emerging approaches like intracellular protein delivery, stimulus-responsive proteins, and de novo designed therapeutic proteins offer exciting new possibilities. However, challenges remain in predicting in vivo behavior, scalable manufacturing, immunogenicity mitigation, and targeted delivery. Addressing these challenges will require continued integration of computational and experimental methods, as well as a deeper understanding of protein behavior in complex physiological environments. As the field advances, we can anticipate increasingly sophisticated and effective protein therapeutics for treating human diseases.
Affiliation(s)
- Ahrum Son
- Department of Molecular Medicine, Scripps Research, La Jolla, CA 92037, USA;
- Jongham Park
- Department of Bio-AI Convergence, Chungnam National University, 99 Daehak-ro, Yuseong-gu, Daejeon 34134, Republic of Korea; (J.P.); (W.K.); (W.L.); (Y.Y.)
- Woojin Kim
- Department of Bio-AI Convergence, Chungnam National University, 99 Daehak-ro, Yuseong-gu, Daejeon 34134, Republic of Korea; (J.P.); (W.K.); (W.L.); (Y.Y.)
- Wonseok Lee
- Department of Bio-AI Convergence, Chungnam National University, 99 Daehak-ro, Yuseong-gu, Daejeon 34134, Republic of Korea; (J.P.); (W.K.); (W.L.); (Y.Y.)
- Yoonki Yoon
- Department of Bio-AI Convergence, Chungnam National University, 99 Daehak-ro, Yuseong-gu, Daejeon 34134, Republic of Korea; (J.P.); (W.K.); (W.L.); (Y.Y.)
- Jaeho Ji
- Department of Convergent Bioscience and Informatics, Chungnam National University, 99 Daehak-ro, Yuseong-gu, Daejeon 34134, Republic of Korea;
- Hyunsoo Kim
- Department of Bio-AI Convergence, Chungnam National University, 99 Daehak-ro, Yuseong-gu, Daejeon 34134, Republic of Korea; (J.P.); (W.K.); (W.L.); (Y.Y.)
- Department of Convergent Bioscience and Informatics, Chungnam National University, 99 Daehak-ro, Yuseong-gu, Daejeon 34134, Republic of Korea;
- Protein AI Design Institute, Chungnam National University, 99 Daehak-ro, Yuseong-gu, Daejeon 34134, Republic of Korea
- SCICS (Sciences for Panomics), 99 Daehak-ro, Yuseong-gu, Daejeon 34134, Republic of Korea
5
Ma XN, Li MY, Qi GQ, Wei LN, Zhang DK. SUMOylation at the crossroads of gut health: insights into physiology and pathology. Cell Commun Signal 2024; 22:404. [PMID: 39160548 PMCID: PMC11331756 DOI: 10.1186/s12964-024-01786-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2024] [Accepted: 08/10/2024] [Indexed: 08/21/2024] Open
Abstract
SUMOylation, a post-translational modification involving the covalent attachment of small ubiquitin-like modifier (SUMO) proteins to target substrates, plays a pivotal role at the intersection of gut health and disease, influencing various aspects of intestinal physiology and pathology. This review provides a comprehensive examination of SUMOylation's diverse roles within the gut microenvironment. We examine its critical roles in maintaining epithelial barrier integrity, regulating immune responses, and mediating host-microbe interactions, thereby highlighting the complex molecular mechanisms that underpin gut homeostasis. Furthermore, we explore the impact of SUMOylation dysregulation in various intestinal disorders, including inflammatory bowel diseases and colorectal cancer, highlighting its implications as a potential diagnostic biomarker and therapeutic target. By integrating current research findings, this review offers valuable insights into the dynamic interplay between SUMOylation and gut health, paving the way for novel therapeutic strategies aimed at restoring intestinal equilibrium and combating associated pathologies.
Affiliation(s)
- Xue-Ni Ma
- Key Laboratory of Digestive Diseases, Lanzhou University Second Hospital, Lanzhou, 730030, China
- The Second Clinical Medical College, Lanzhou University, Lanzhou, 730030, China
- Mu-Yang Li
- Key Laboratory of Digestive Diseases, Lanzhou University Second Hospital, Lanzhou, 730030, China
- The Second Clinical Medical College, Lanzhou University, Lanzhou, 730030, China
- Guo-Qing Qi
- Department of Gastroenterology, Lanzhou University Second Hospital, Lanzhou, 730030, China
- Li-Na Wei
- Department of Gastroenterology, Lanzhou University Second Hospital, Lanzhou, 730030, China
- De-Kui Zhang
- Key Laboratory of Digestive Diseases, Lanzhou University Second Hospital, Lanzhou, 730030, China.
- Department of Gastroenterology, Lanzhou University Second Hospital, Lanzhou, 730030, China.
6
Gu J, Zhao Y, Ben Y, Zhang S, Hua L, He S, Liu R, Chen X, Sheng H. A personalized mRNA signature for predicting hypertrophic cardiomyopathy applying machine learning methods. Sci Rep 2024; 14:17023. [PMID: 39043774 PMCID: PMC11266364 DOI: 10.1038/s41598-024-67201-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2023] [Accepted: 07/09/2024] [Indexed: 07/25/2024] Open
Abstract
Hypertrophic cardiomyopathy (HCM) may lead to cardiac dysfunction and sudden death. This study was designed to develop a HCM signature applying bioinformatics and machine learning methods. Data of HCM and normal tissues were obtained from public databases to screen differentially expressed genes (DEGs) using the R software limma package. The Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) were performed for enrichment analysis of HCM-associated DEGs. Hub genes for HCM were determined using weighted gene co-expression network analysis (WGCNA) together with two machine learning algorithms (SVM-RFE and LASSO). Finally, we introduced a zebrafish model to simulate changes in the hub genes in the HCM and to observe their effects on cardiac disease development. The mRNA expression data from a total of 106 HCM tissues and 39 normal samples were collected and we screened 157 DEGs. Enrichment analysis showed that immune pathways played an important role in the pathogenesis of HCM. Three hub genes (FCN3, MYH6 and RASD1) were identified using WGCNA, SVM-RFE, and LASSO analysis. In a zebrafish model, knockdown of MYH6 and RASD1 resulted in cardiac malformations with reduced ventricular capacity and heart rate, which validated the clinical significance of these genes in the diagnosis of HCM. Based on machine learning algorithms, our study created a signature with potential impact on cardiac function and cardiac quality index for HCM. The current findings had important implications for the early diagnosis and treatment of HCM.
Affiliation(s)
- Jue Gu
- Affiliated Hospital of Nantong University, No.20 Xisi Road, Nantong, 226000, Jiangsu Province, China
- Yamin Zhao
- Nantong Second People's Hospital, Nantong, China
- Yue Ben
- Affiliated Hospital of Nantong University, No.20 Xisi Road, Nantong, 226000, Jiangsu Province, China
- Siming Zhang
- Medical School of Nantong University, Nantong University, Nantong, China
- Liqi Hua
- Medical School of Nantong University, Nantong University, Nantong, China
- Songnian He
- Medical School of Nantong University, Nantong University, Nantong, China
- Ruizi Liu
- Medical School of Nantong University, Nantong University, Nantong, China
- Xu Chen
- Medical School of Nantong University, Nantong University, Nantong, China.
- Hongzhuan Sheng
- Affiliated Hospital of Nantong University, No.20 Xisi Road, Nantong, 226000, Jiangsu Province, China.
7
Li X, Qian Y, Hu Y, Chen J, Yue H, Deng L. MSF-PFP: A Novel Multisource Feature Fusion Model for Protein Function Prediction. J Chem Inf Model 2024; 64:1502-1511. [PMID: 38413369 DOI: 10.1021/acs.jcim.3c01794] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/29/2024]
Abstract
Protein function prediction is essential for disease treatment and drug development; yet, traditional biological experimental methods are less efficient in annotating protein function, and existing automated methods fail to fully leverage protein multisource data. Here, we present MSF-PFP, a computational framework that fuses multisource data features to predict protein function with high accuracy. Our framework designs specific models for feature extraction based on the characteristics of various data sources, including a global-local-individual strategy for local location features. MSF-PFP then integrates extracted features through a multisource feature fusion model, ultimately categorizing protein functions. Experimental results demonstrate that MSF-PFP outperforms eight state-of-the-art models, achieving FMax scores of 0.542, 0.675, and 0.624 for the biological process (BP), molecular function (MF), and cellular component (CC), respectively. The source code and data set for MSF-PFP are available at https://swanhub.co/TianGua/MSF-PFP, facilitating further exploration and validation of the proposed framework. This study highlights the potential of multisource data fusion in enhancing protein function prediction, contributing to improved disease therapy and medication discovery strategies.
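For readers unfamiliar with the FMax scores quoted above, the following is a minimal sketch of the protein-centric F-max metric as it is commonly defined in function-prediction benchmarks: the maximum, over score thresholds, of the harmonic mean of average precision and average recall. It is not the MSF-PFP evaluation code; the function name, input layout, and threshold grid are assumptions made for illustration.

```python
def fmax(y_true, y_score, thresholds=None):
    """Protein-centric F-max.

    y_true:  list of sets, the true function terms of each protein.
    y_score: list of dicts mapping term -> predicted score in [0, 1].
    Precision is averaged over proteins with at least one prediction at
    the threshold; recall is averaged over all proteins.
    """
    if thresholds is None:
        thresholds = [t / 100 for t in range(1, 101)]  # 0.01 .. 1.00
    best = 0.0
    for t in thresholds:
        precisions, recalls = [], []
        for truth, scores in zip(y_true, y_score):
            predicted = {term for term, s in scores.items() if s >= t}
            tp = len(predicted & truth)
            if predicted:
                precisions.append(tp / len(predicted))
            recalls.append(tp / len(truth) if truth else 0.0)
        if not precisions:
            continue  # no protein has a prediction at this threshold
        pr = sum(precisions) / len(precisions)
        rc = sum(recalls) / len(recalls)
        if pr + rc > 0:
            best = max(best, 2 * pr * rc / (pr + rc))
    return best


# Toy example with two proteins and made-up term identifiers.
truth = [{"GO:1", "GO:2"}, {"GO:3"}]
pred = [
    {"GO:1": 0.9, "GO:2": 0.4, "GO:4": 0.2},
    {"GO:3": 0.8, "GO:1": 0.3},
]
toy_fmax = fmax(truth, pred)
```

In the toy example, thresholds between 0.3 and 0.4 remove both false positives while keeping every true term, so the score reaches its maximum there.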
Affiliation(s)
- Xinhui Li
- School of Software, Xinjiang University, Urumqi 830091, China
- Key Laboratory of Signal Detection and Processing in Xinjiang Uygur Autonomous Region, Xinjiang University, Urumqi 830046, China
- Key Laboratory of Software Engineering, Xinjiang University, Urumqi 830091, China
- Yurong Qian
- School of Software, Xinjiang University, Urumqi 830091, China
- Key Laboratory of Signal Detection and Processing in Xinjiang Uygur Autonomous Region, Xinjiang University, Urumqi 830046, China
- Key Laboratory of Software Engineering, Xinjiang University, Urumqi 830091, China
- Yue Hu
- School of Software, Xinjiang University, Urumqi 830091, China
- Key Laboratory of Signal Detection and Processing in Xinjiang Uygur Autonomous Region, Xinjiang University, Urumqi 830046, China
- Key Laboratory of Software Engineering, Xinjiang University, Urumqi 830091, China
- Jiaying Chen
- School of Software, Xinjiang University, Urumqi 830091, China
- Key Laboratory of Signal Detection and Processing in Xinjiang Uygur Autonomous Region, Xinjiang University, Urumqi 830046, China
- Key Laboratory of Software Engineering, Xinjiang University, Urumqi 830091, China
- Haitao Yue
- School of Future Technology, Xinjiang University, Urumqi 830017, China
- Laboratory of Synthetic Biology, School of Life Science and Technology, Xinjiang University, Urumqi 830017, China
- Lei Deng
- School of Software, Xinjiang University, Urumqi 830091, China
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
8
Korlepara DB, C S V, Srivastava R, Pal PK, Raza SH, Kumar V, Pandit S, Nair AG, Pandey S, Sharma S, Jeurkar S, Thakran K, Jaglan R, Verma S, Ramachandran I, Chatterjee P, Nayar D, Priyakumar UD. PLAS-20k: Extended Dataset of Protein-Ligand Affinities from MD Simulations for Machine Learning Applications. Sci Data 2024; 11:180. [PMID: 38336857 PMCID: PMC10858175 DOI: 10.1038/s41597-023-02872-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Accepted: 12/21/2023] [Indexed: 02/12/2024] Open
Abstract
Computing binding affinities is of great importance in the drug discovery pipeline, and their prediction using advanced machine learning methods remains a major challenge because the existing datasets and models do not consider the dynamic features of protein-ligand interactions. To this end, we have developed the PLAS-20k dataset, an extension of the previously developed PLAS-5k, with 97,500 independent simulations on a total of 19,500 different protein-ligand complexes. Our results show good correlation with the available experimental values, performing better than docking scores. This holds true even for a subset of ligands that follows Lipinski's rule, and for diverse clusters of complex structures, thereby highlighting the importance of the PLAS-20k dataset in developing new ML models. In addition, our dataset is better than docking at classifying strong and weak binders. Further, the OnionNet model has been retrained on the PLAS-20k dataset and is provided as a baseline for the prediction of binding affinities. We believe that large-scale MD-based datasets along with trajectories will form new synergy, paving the way for accelerating drug discovery.
Affiliation(s)
- Divya B Korlepara
- IHub-Data, International Institute of Information Technology, Hyderabad, 500032, India
- Division of Physics, School of Advanced Sciences, Vellore Institute of Technology, Chennai, 600127, India
- Vasavi C S
- IHub-Data, International Institute of Information Technology, Hyderabad, 500032, India
- Department of Artificial Intelligence, School of Artificial Intelligence, Amrita Vishwa Vidyapeetham, Bengaluru, 560035, India
- Rakesh Srivastava
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
- Pradeep Kumar Pal
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
- Saalim H Raza
- IHub-Data, International Institute of Information Technology, Hyderabad, 500032, India
- Vishal Kumar
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
- Shivam Pandit
- IHub-Data, International Institute of Information Technology, Hyderabad, 500032, India
- Aathira G Nair
- IHub-Data, International Institute of Information Technology, Hyderabad, 500032, India
- Sanjana Pandey
- IHub-Data, International Institute of Information Technology, Hyderabad, 500032, India
- Shubham Sharma
- IHub-Data, International Institute of Information Technology, Hyderabad, 500032, India
- Shruti Jeurkar
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
- Kavita Thakran
- IHub-Data, International Institute of Information Technology, Hyderabad, 500032, India
- Reena Jaglan
- IHub-Data, International Institute of Information Technology, Hyderabad, 500032, India
- Shivangi Verma
- IHub-Data, International Institute of Information Technology, Hyderabad, 500032, India
- Indhu Ramachandran
- IHub-Data, International Institute of Information Technology, Hyderabad, 500032, India
- Prathit Chatterjee
- IHub-Data, International Institute of Information Technology, Hyderabad, 500032, India
- Divya Nayar
- Department of Materials Science and Engineering, Indian Institute of Technology Delhi, Hauz Khas, New Delhi, 110016, India.
- U Deva Priyakumar
- IHub-Data, International Institute of Information Technology, Hyderabad, 500032, India.
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India.
9
Chen J, Gu Z, Lai L, Pei J. In silico protein function prediction: the rise of machine learning-based approaches. Medical Review (2021) 2023; 3:487-510. [PMID: 38282798 PMCID: PMC10808870 DOI: 10.1515/mr-2023-0038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Accepted: 10/11/2023] [Indexed: 01/30/2024]
Abstract
Proteins function as integral actors in essential life processes, rendering the realm of protein research a fundamental domain that possesses the potential to propel advancements in pharmaceuticals and disease investigation. Within the context of protein research, an imperious demand arises to uncover protein functionalities and untangle intricate mechanistic underpinnings. Due to the exorbitant costs and limited throughput inherent in experimental investigations, computational models offer a promising alternative to accelerate protein function annotation. In recent years, protein pre-training models have exhibited noteworthy advancement across multiple prediction tasks. This advancement highlights a notable prospect for effectively tackling the intricate downstream task associated with protein function prediction. In this review, we elucidate the historical evolution and research paradigms of computational methods for predicting protein function. Subsequently, we summarize the progress in protein and molecule representation as well as feature extraction techniques. Furthermore, we assess the performance of machine learning-based algorithms across various objectives in protein function prediction, thereby offering a comprehensive perspective on the progress within this field.
Affiliation(s)
- Jiaxiao Chen
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Zhonghui Gu
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Luhua Lai
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing, China
- Research Unit of Drug Design Method, Chinese Academy of Medical Sciences (2021RU014), Beijing, China
- Jianfeng Pei
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Research Unit of Drug Design Method, Chinese Academy of Medical Sciences (2021RU014), Beijing, China
10
Varshney N, Mishra AK. Deep Learning in Phosphoproteomics: Methods and Application in Cancer Drug Discovery. Proteomes 2023; 11:proteomes11020016. [PMID: 37218921 DOI: 10.3390/proteomes11020016] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/13/2023] [Revised: 04/24/2023] [Accepted: 04/25/2023] [Indexed: 05/24/2023] Open
Abstract
Protein phosphorylation is a key post-translational modification (PTM) that is a central regulatory mechanism of many cellular signaling pathways. Several protein kinases and phosphatases precisely control this biochemical process. Defects in the functions of these proteins have been implicated in many diseases, including cancer. Mass spectrometry (MS)-based analysis of biological samples provides in-depth coverage of the phosphoproteome. A large amount of MS data available in public repositories has unveiled big data in the field of phosphoproteomics. To address the challenges associated with handling large data and expanding confidence in phosphorylation site prediction, the development of many computational algorithms and machine learning-based approaches has gained momentum in recent years. Together, the emergence of experimental methods with high resolution and sensitivity and data mining algorithms has provided robust analytical platforms for quantitative proteomics. In this review, we compile a comprehensive collection of bioinformatic resources used for the prediction of phosphorylation sites, and their potential therapeutic applications in the context of cancer.
Affiliation(s)
- Neha Varshney
- Division of Biological Sciences, Department of Cellular and Molecular Medicine, University of California, San Diego, CA 93093, USA
- Ludwig Institute for Cancer Research, La Jolla, CA 92093, USA
- Abhinava K Mishra
- Molecular, Cellular and Developmental Biology Department, University of California, Santa Barbara, CA 93106, USA
11
Jia M, Li J, Zhang J, Wei N, Yin Y, Chen H, Yan S, Wang Y. Identification and validation of cuproptosis related genes and signature markers in bronchopulmonary dysplasia disease using bioinformatics analysis and machine learning. BMC Med Inform Decis Mak 2023; 23:69. [PMID: 37060021 PMCID: PMC10105406 DOI: 10.1186/s12911-023-02163-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/31/2023] [Accepted: 03/31/2023] [Indexed: 04/16/2023] Open
Abstract
BACKGROUND: Bronchopulmonary Dysplasia (BPD) has a high incidence and affects the health of preterm infants. Cuproptosis is a novel form of cell death, but its mechanism of action in the disease is not yet clear. Machine learning, the latest tool for the analysis of biological samples, is still relatively rarely used for in-depth analysis and prediction of diseases. METHODS AND RESULTS: First, the differential expression of cuproptosis-related genes (CRGs) in the GSE108754 dataset was extracted and the heat map showed that the expression of NFE2L2 gene was significantly higher in the control group whereas the expression of GLS gene was significantly higher in the treatment group. Chromosome location analysis showed that both the genes were positively correlated and associated with chromosome 2. The results of immune infiltration and immune cell differential analysis showed differences in the four immune cells, significantly in Monocytes cells. Five new pathways were analyzed through two subgroups based on consistent clustering of CRG expression. Weighted correlation network analysis (WGCNA) set the screening condition to the top 25% to obtain the disease signature genes. Four machine learning algorithms: Generalized Linear Models (GLM), Random Forest (RF), Support Vector Machine (SVM), and Extreme Gradient Boosting (XGB) were used to screen the disease signature genes, and the final five marker genes for disease prediction. The models constructed by GLM method were proved to be more accurate in the validation of two datasets, GSE190215 and GSE188944. CONCLUSION: We eventually identified two copper death-associated genes, NFE2L2 and GLS. A machine learning model-GLM was constructed to predict the prevalence of BPD disease, and five disease signature genes NFATC3, ERMN, PLA2G4A, MTMR9LP and LOC440700 were identified. These genes that were bioinformatics analyzed could be potential targets for identifying BPD disease and treatment.
Affiliation(s)
- Jieyi Li
- Shanghai Literature Institute of Traditional Chinese Medicine, Shanghai, 200000, China
- Jingying Zhang
- Shanghai Literature Institute of Traditional Chinese Medicine, Shanghai, 200000, China
- Ningjing Wei
- ChengZheng Wisdom (Shanghai) Health Sciences and Technology Co., Ltd, Shanghai, 200000, China
- Yating Yin
- ChengZheng Wisdom (Shanghai) Health Sciences and Technology Co., Ltd, Shanghai, 200000, China
- Hui Chen
- Shanghai Literature Institute of Traditional Chinese Medicine, Shanghai, 200000, China
- Shixing Yan
- Shanghai Daosh Medical Technology Co., Ltd, Shanghai, 200000, China
- Yong Wang
- Shanghai Literature Institute of Traditional Chinese Medicine, Shanghai, 200000, China.