1
Campos TL, Korhonen PK, Young ND, Chang BC, Gasser RB. Inference of essential genes in Brugia malayi and Onchocerca volvulus by machine learning and the implications for discovering new interventions. Comput Struct Biotechnol J 2024; 23:3081-3089. [PMID: 39185442 PMCID: PMC11342751 DOI: 10.1016/j.csbj.2024.07.025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2024] [Revised: 07/30/2024] [Accepted: 07/31/2024] [Indexed: 08/27/2024] Open
Abstract
Detailed explorations of the model organisms Caenorhabditis elegans (elegant worm) and Drosophila melanogaster (vinegar fly) have substantially improved our knowledge and understanding of biological processes and pathways in metazoan organisms. Extensive functional genomic and multi-omic data sets have enabled the discovery and characterisation of 'essential' genes that are critical for the survival of these organisms. Recently, we showed that a machine learning (ML)-based pipeline could be utilised to predict essential genes in both C. elegans and D. melanogaster using features from DNA, RNA, protein and/or cellular data or associated information. As these distantly-related species are within the Ecdysozoa, we hypothesised that this approach could be suited for non-model organisms within the same group (phylum) of protostome animals. In the present investigation, we cross-predicted essential genes within the phylum Nematoda - between C. elegans and the parasitic filarial nematodes Brugia malayi and Onchocerca volvulus, and then ranked and prioritised these genes. Highly ranked genes were linked to key biological pathways or processes, such as ribosome biogenesis, translation and RNA processing, and were expressed at relatively high levels in the germline, gonad, hypodermis and/or nerves. The present in silico workflow is hoped to expedite the identification of drug targets in parasitic organisms for subsequent experimental validation in the laboratory.
Affiliation(s)
- Túlio L. Campos
- Department of Biosciences, Melbourne Veterinary School, Faculty of Science, The University of Melbourne, Parkville, Victoria 3010, Australia
- Núcleo de Bioinformática, Instituto Aggeu Magalhães, Fiocruz., Av. Professor Moraes Rego, s/n, Cidade Universitária, Recife, PE CEP 50740–465, Brazil
- Pasi K. Korhonen
- Department of Biosciences, Melbourne Veterinary School, Faculty of Science, The University of Melbourne, Parkville, Victoria 3010, Australia
- Neil D. Young
- Department of Biosciences, Melbourne Veterinary School, Faculty of Science, The University of Melbourne, Parkville, Victoria 3010, Australia
- Bill C.H. Chang
- Department of Biosciences, Melbourne Veterinary School, Faculty of Science, The University of Melbourne, Parkville, Victoria 3010, Australia
- Robin B. Gasser
- Department of Biosciences, Melbourne Veterinary School, Faculty of Science, The University of Melbourne, Parkville, Victoria 3010, Australia

2
Gillani M, Pollastri G. Protein subcellular localization prediction tools. Comput Struct Biotechnol J 2024; 23:1796-1807. [PMID: 38707539 PMCID: PMC11066471 DOI: 10.1016/j.csbj.2024.04.032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2024] [Revised: 04/11/2024] [Accepted: 04/11/2024] [Indexed: 05/07/2024] Open
Abstract
Protein subcellular localization prediction is of great significance in bioinformatics and biological research. Because most proteins do not have experimentally determined localization information, computational prediction methods and tools have been an active research area for more than two decades. Knowledge of the subcellular location of a protein provides valuable information about its functionalities, the functioning of the cell, and other possible interactions with proteins. Fast, reliable, and accurate predictors provide platforms to harness the abundance of sequence data to predict subcellular locations accordingly. During the last decade, there has been a considerable amount of research effort aimed at developing subcellular localization predictors. This paper reviews recent subcellular localization prediction tools in the Eukaryotic, Prokaryotic, and Virus-based categories followed by a detailed analysis. Each predictor is discussed based on its main features, strengths, weaknesses, algorithms used, prediction techniques, and analysis. This review is supported by prediction tools taxonomies that highlight their relevant area and examples for uncomplicated categorization and ease of understandability. These taxonomies help users find suitable tools according to their needs. Furthermore, recent research gaps and challenges are discussed to cover areas that need the utmost attention. This survey provides an in-depth analysis of the most recent prediction tools to facilitate readers and can be considered a quick guide for researchers to identify and explore the recent literature advancements.
Affiliation(s)
- Maryam Gillani
- School of Computer Science, University College Dublin (UCD), Dublin, D04 V1W8, Ireland
- Gianluca Pollastri
- School of Computer Science, University College Dublin (UCD), Dublin, D04 V1W8, Ireland

3
Xiao C, Zhou Z, She J, Yin J, Cui F, Zhang Z. PEL-PVP: Application of plant vacuolar protein discriminator based on PEFT ESM-2 and bilayer LSTM in an unbalanced dataset. Int J Biol Macromol 2024; 277:134317. [PMID: 39094861 DOI: 10.1016/j.ijbiomac.2024.134317] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2024] [Revised: 07/10/2024] [Accepted: 07/28/2024] [Indexed: 08/04/2024]
Abstract
Plant vacuoles play a crucial role in maintaining cellular stability, adapting to environmental changes, and responding to external pressures. The accurate identification of vacuolar proteins (PVPs) is crucial for understanding the biosynthetic mechanisms of intracellular vacuoles and the adaptive mechanisms of plants. In order to more accurately identify vacuole proteins, this study developed a new predictive model, PEL-PVP, based on ESM-2. Through this study, the feasibility and effectiveness of using advanced pre-training models and fine-tuning techniques for bioinformatics tasks were demonstrated, providing new methods and ideas for plant vacuolar protein research. In addition, previous datasets for vacuolar proteins were balanced, but imbalance is more closely related to the actual situation. Therefore, this study constructed an imbalanced dataset, UB-PVP, from the UniProt database, helping the model better adapt to the complexity and uncertainty in real environments, thereby improving the model's generalization ability and practicality. The experimental results show that, compared with existing recognition techniques, PEL-PVP achieves significant improvements in multiple indicators, with 6.08 %, 13.51 %, 11.9 %, and 5 % improvements in ACC, SP, MCC, and AUC, respectively. The accuracy reaches 94.59 %, significantly higher than the previous best model GraphIdn. This provides an efficient and precise tool for the study of plant vacuole proteins.
Affiliation(s)
- Cuilin Xiao
- School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Zheyu Zhou
- School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Jiayi She
- School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Jinfen Yin
- School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Feifei Cui
- School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Zilong Zhang
- School of Computer Science and Technology, Hainan University, Haikou 570228, China.

4
Deutschmeyer VE, Schlaudraff NA, Walesch SK, Moyer J, Sokol AM, Graumann J, Meissner W, Schneider M, Muley T, Helmbold P, Schwinn M, Richter AM, Schmitz ML, Dammann RH. SIAH3 is frequently epigenetically silenced in cancer and regulates mitochondrial metabolism. Int J Cancer 2024. [PMID: 39344659 DOI: 10.1002/ijc.35202] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2024] [Revised: 07/31/2024] [Accepted: 09/04/2024] [Indexed: 10/01/2024]
Abstract
Of the seven in absentia homologue (SIAH) family, three members have been identified in the human genome. In contrast to the E3 ubiquitin ligase encoding SIAH1 and SIAH2, little is known about the regulation and function of SIAH3 in tumorigenesis. In this study, we reveal that SIAH3 is frequently epigenetically silenced in different cancer entities, including cutaneous melanoma, lung adenocarcinoma and head and neck cancer. Low SIAH3 levels correlate with an impaired survival of cancer patients. Additionally, induced expression of SIAH3 reduces cell proliferation and induces cell death. Functionally, SIAH3 negatively affects cellular metabolism by shifting cells from aerobic oxidative phosphorylation to glycolysis. SIAH3 is localized in the mitochondrion and interacts with proteins involved in mitochondrial ribosome biogenesis and translation. We also report that SIAH3 interacts with ubiquitin ligases, including SIAH1 or SIAH2, and is degraded by them. These results suggest that SIAH3 acts as an epigenetically controlled tumor suppressor by regulating cellular metabolism through the inhibition of oxidative phosphorylation.
Affiliation(s)
- Nico A Schlaudraff
- Institute for Genetics, Justus-Liebig-University Giessen, Giessen, Germany
- Sara K Walesch
- Institute for Genetics, Justus-Liebig-University Giessen, Giessen, Germany
- Janine Moyer
- Institute for Genetics, Justus-Liebig-University Giessen, Giessen, Germany
- Anna M Sokol
- Biomolecular Mass Spectrometry, Max Planck Institute for Heart and Lung Research, Bad Nauheim, Germany
- Johannes Graumann
- Biomolecular Mass Spectrometry, Max Planck Institute for Heart and Lung Research, Bad Nauheim, Germany
- Institute of Translational Proteomics, Department of Medicine, Philipps-University, Marburg, Germany
- Wolfgang Meissner
- Core Facility for Cellular Metabolism, Department of Medicine, Philipps-University, Marburg, Germany
- Marc Schneider
- Translational Research Unit, Thoraxklinik at Heidelberg University Hospital, Heidelberg, Germany
- University of Giessen Marburg Lung Center (UGMLC) and Translational Lung Research Center (TLRC) Heidelberg, German Center for Lung Research (DZL), Giessen, Germany
- Thomas Muley
- Translational Research Unit, Thoraxklinik at Heidelberg University Hospital, Heidelberg, Germany
- University of Giessen Marburg Lung Center (UGMLC) and Translational Lung Research Center (TLRC) Heidelberg, German Center for Lung Research (DZL), Giessen, Germany
- Peter Helmbold
- Department of Dermatology, University of Heidelberg, Heidelberg, Germany
- Markus Schwinn
- Institute of Biochemistry, Medical Faculty of the University Giessen, Giessen, Germany
- Antje M Richter
- Institute for Genetics, Justus-Liebig-University Giessen, Giessen, Germany
- M Lienhard Schmitz
- Institute of Biochemistry, Medical Faculty of the University Giessen, Giessen, Germany
- Reinhard H Dammann
- Institute for Genetics, Justus-Liebig-University Giessen, Giessen, Germany
- University of Giessen Marburg Lung Center (UGMLC) and Translational Lung Research Center (TLRC) Heidelberg, German Center for Lung Research (DZL), Giessen, Germany

5
Han J, Kong T, Liu J. PepNet: an interpretable neural network for anti-inflammatory and antimicrobial peptides prediction using a pre-trained protein language model. Commun Biol 2024; 7:1198. [PMID: 39341947 PMCID: PMC11438969 DOI: 10.1038/s42003-024-06911-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2024] [Accepted: 09/17/2024] [Indexed: 10/01/2024] Open
Abstract
Identifying anti-inflammatory peptides (AIPs) and antimicrobial peptides (AMPs) is crucial for the discovery of innovative and effective peptide-based therapies targeting inflammation and microbial infections. However, accurate identification of AIPs and AMPs remains a computational challenge mainly due to limited utilization of peptide sequence information. Here, we propose PepNet, an interpretable neural network for predicting both AIPs and AMPs by applying a pre-trained protein language model to fully utilize the peptide sequence information. It first captures the information of residue arrangements and physicochemical properties using a residual dilated convolution block, and then seizes the function-related diverse information by introducing a residual Transformer block to characterize the residue representations generated by a pre-trained protein language model. After training and testing, PepNet demonstrates great superiority over other leading AIP and AMP predictors and shows strong interpretability of its learned peptide representations. A user-friendly web server for PepNet is freely available at http://liulab.top/PepNet/server .
Affiliation(s)
- Jiyun Han
- School of Mathematics and Statistics, Shandong University, 264209, Weihai, China
- Tongxin Kong
- School of Mathematics and Statistics, Shandong University, 264209, Weihai, China
- Juntao Liu
- School of Mathematics and Statistics, Shandong University, 264209, Weihai, China.

6
Franco Cairo JPL, Almeida DV, Andrade VB, Terrasan CRF, Telfer A, Gonçalves TA, Diaz DE, Figueiredo FL, Brenelli LB, Walton PH, Damasio A, Garcia W, Squina FM. Biochemical and structural insights of a recombinant AA16 LPMO from the marine and sponge-symbiont Peniophora sp. Int J Biol Macromol 2024; 280:135596. [PMID: 39276894 DOI: 10.1016/j.ijbiomac.2024.135596] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2024] [Revised: 09/10/2024] [Accepted: 09/11/2024] [Indexed: 09/17/2024]
Abstract
Lytic polysaccharide monooxygenases (LPMOs) are copper-dependent enzymes that oxidize polysaccharides, leading to their cleavage. LPMOs are classified into eight CAZy families (AA9-11, AA13-17), with the functionality of AA16 being poorly characterized. This study presents biochemical and structural data for an AA16 LPMO (PnAA16) from the marine sponge symbiont Peniophora sp. Phylogenetic analysis revealed that PnAA16 clusters separately from previously characterized AA16s. However, the structural modelling of PnAA16 showed the characteristic immunoglobulin-like fold of LPMOs, with a conserved his-brace motif coordinating a copper ion. The copper-bound PnAA16 showed greater thermal stability than its apo-form, highlighting copper's role in enzyme stability. Functionally, PnAA16 demonstrated oxidase activity, producing 5 μM H₂O₂ after 30 min, but showed 20 times lower peroxidase activity (0.27 U/g) compared to a fungal AA9. Specific activity assays indicated that PnAA16 acts only on cellohexaose, generating native celloligosaccharides (C3 to C5) and oxidized products with regioselective oxidation at C1 and C4 positions. Finally, PnAA16 boosted the activity of a cellulolytic cocktail for cellulose saccharification in the presence of ascorbic acid, hydrogen peroxide, or both. In conclusion, the present work provides insights into the AA16 family, expanding the understanding of their structural and functional relationships and biotechnological potential.
Affiliation(s)
- João Paulo L Franco Cairo
- Laboratório de Ciências Moleculares (LACIMO), Universidade de Sorocaba (UNISO), Sorocaba, Brazil; Laboratory of Enzymology and Molecular Biology of Microorganisms (LEBIMO), Department of Biochemistry and Tissue Biology, Institute of Biology, Universidade Estadual de Campinas (UNICAMP), Campinas, São Paulo, Brazil; Department of Chemistry, University of York, York, United Kingdom
- Dnane V Almeida
- Centro de Ciências Naturais e Humanas (CCNH), Universidade Federal do ABC (UFABC), Santo André, SP, Brazil
- Viviane B Andrade
- Centro de Ciências Naturais e Humanas (CCNH), Universidade Federal do ABC (UFABC), Santo André, SP, Brazil
- César R F Terrasan
- Laboratory of Enzymology and Molecular Biology of Microorganisms (LEBIMO), Department of Biochemistry and Tissue Biology, Institute of Biology, Universidade Estadual de Campinas (UNICAMP), Campinas, São Paulo, Brazil
- Abbey Telfer
- Department of Chemistry, University of York, York, United Kingdom
- Thiago A Gonçalves
- Laboratório de Ciências Moleculares (LACIMO), Universidade de Sorocaba (UNISO), Sorocaba, Brazil
- Daniel E Diaz
- Department of Chemistry, University of York, York, United Kingdom
- Fernanda L Figueiredo
- Laboratory of Enzymology and Molecular Biology of Microorganisms (LEBIMO), Department of Biochemistry and Tissue Biology, Institute of Biology, Universidade Estadual de Campinas (UNICAMP), Campinas, São Paulo, Brazil
- Livia B Brenelli
- Laboratory of Enzymology and Molecular Biology of Microorganisms (LEBIMO), Department of Biochemistry and Tissue Biology, Institute of Biology, Universidade Estadual de Campinas (UNICAMP), Campinas, São Paulo, Brazil
- Paul H Walton
- Department of Chemistry, University of York, York, United Kingdom
- André Damasio
- Laboratory of Enzymology and Molecular Biology of Microorganisms (LEBIMO), Department of Biochemistry and Tissue Biology, Institute of Biology, Universidade Estadual de Campinas (UNICAMP), Campinas, São Paulo, Brazil
- Wanius Garcia
- Centro de Ciências Naturais e Humanas (CCNH), Universidade Federal do ABC (UFABC), Santo André, SP, Brazil
- Fabio M Squina
- Laboratório de Ciências Moleculares (LACIMO), Universidade de Sorocaba (UNISO), Sorocaba, Brazil.

7
Li Z, Zhang B, Chan JJ, Tabatabaeian H, Tong QY, Chew XH, Fan X, Driguez P, Chan C, Cheong F, Wang S, Siew BE, Tan IJW, Lee KY, Lieske B, Cheong WK, Kappei D, Tan KK, Gao X, Tay Y. An isoform-resolution transcriptomic atlas of colorectal cancer from long-read single-cell sequencing. Cell Genomics 2024; 4:100641. [PMID: 39216476 DOI: 10.1016/j.xgen.2024.100641] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2023] [Revised: 06/06/2024] [Accepted: 08/07/2024] [Indexed: 09/04/2024]
Abstract
Colorectal cancer (CRC) ranks as the second leading cause of cancer deaths globally. In recent years, short-read single-cell RNA sequencing (scRNA-seq) has been instrumental in deciphering tumor heterogeneities. However, these studies only enable gene-level quantification but neglect alterations in transcript structures arising from alternative end processing or splicing. In this study, we integrated short- and long-read scRNA-seq of CRC samples to build an isoform-resolution CRC transcriptomic atlas. We identified 394 dysregulated transcript structures in tumor epithelial cells, including 299 resulting from various combinations of splicing events. Second, we characterized genes and isoforms associated with epithelial lineages and subpopulations exhibiting distinct prognoses. Among 31,935 isoforms with novel junctions, 330 were supported by The Cancer Genome Atlas RNA-seq and mass spectrometry data. Finally, we built an algorithm that integrated novel peptides derived from open reading frames of recurrent tumor-specific transcripts with mass spectrometry data and identified recurring neoepitopes that may aid the development of cancer vaccines.
Affiliation(s)
- Zhongxiao Li
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia; Center of Excellence for Smart Health (KCSH), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia; Center of Excellence on Generative AI, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- Bin Zhang
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia; Center of Excellence for Smart Health (KCSH), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia; Center of Excellence on Generative AI, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia.
- Jia Jia Chan
- Cancer Science Institute of Singapore, National University of Singapore, Singapore 117599, Singapore
- Hossein Tabatabaeian
- Cancer Science Institute of Singapore, National University of Singapore, Singapore 117599, Singapore
- Qing Yun Tong
- Cancer Science Institute of Singapore, National University of Singapore, Singapore 117599, Singapore
- Xiao Hong Chew
- Cancer Science Institute of Singapore, National University of Singapore, Singapore 117599, Singapore
- Xiaonan Fan
- Cancer Science Institute of Singapore, National University of Singapore, Singapore 117599, Singapore
- Patrick Driguez
- Core Labs, King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia
- Charlene Chan
- Cancer Science Institute of Singapore, National University of Singapore, Singapore 117599, Singapore
- Faith Cheong
- Cancer Science Institute of Singapore, National University of Singapore, Singapore 117599, Singapore
- Shi Wang
- Department of Pathology, National University Health System, Singapore 119228, Singapore
- Bei En Siew
- Department of Surgery, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117597, Singapore
- Ian Jse-Wei Tan
- Department of Surgery, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117597, Singapore; Division of Colorectal Surgery, University Surgical Cluster, National University Health System, Singapore 119228, Singapore
- Kai-Yin Lee
- Department of Surgery, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117597, Singapore; Division of Colorectal Surgery, University Surgical Cluster, National University Health System, Singapore 119228, Singapore
- Bettina Lieske
- Department of Surgery, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117597, Singapore; Division of Colorectal Surgery, University Surgical Cluster, National University Health System, Singapore 119228, Singapore
- Wai-Kit Cheong
- Department of Surgery, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117597, Singapore; Division of Colorectal Surgery, University Surgical Cluster, National University Health System, Singapore 119228, Singapore
- Dennis Kappei
- Cancer Science Institute of Singapore, National University of Singapore, Singapore 117599, Singapore; NUS Centre for Cancer Research, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117597, Singapore; Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117597, Singapore
- Ker-Kan Tan
- Department of Surgery, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117597, Singapore; Division of Colorectal Surgery, University Surgical Cluster, National University Health System, Singapore 119228, Singapore
- Xin Gao
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia; Center of Excellence for Smart Health (KCSH), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia; Center of Excellence on Generative AI, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia.
- Yvonne Tay
- Cancer Science Institute of Singapore, National University of Singapore, Singapore 117599, Singapore; NUS Centre for Cancer Research, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117597, Singapore; Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117597, Singapore.

8
Whited AM, Jungreis I, Allen J, Cleveland CL, Mudge JM, Kellis M, Rinn JL, Hough LE. Biophysical characterization of high-confidence, small human proteins. Biophysical Reports 2024; 4:100167. [PMID: 38909903 PMCID: PMC11305224 DOI: 10.1016/j.bpr.2024.100167] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Revised: 04/09/2024] [Accepted: 06/20/2024] [Indexed: 06/25/2024]
Abstract
Significant efforts have been made to characterize the biophysical properties of proteins. Small proteins have received less attention because their annotation has historically been less reliable. However, recent improvements in sequencing, proteomics, and bioinformatics techniques have led to the high-confidence annotation of small open reading frames (smORFs) that encode for functional proteins, producing smORF-encoded proteins (SEPs). SEPs have been found to perform critical functions in several species, including humans. While significant efforts have been made to annotate SEPs, less attention has been given to the biophysical properties of these proteins. We characterized the distributions of predicted and curated biophysical properties, including sequence composition, structure, localization, function, and disease association of a conservative list of previously identified human SEPs. We found significant differences between SEPs and both larger proteins and control sets. In addition, we provide an example of how our characterization of biophysical properties can contribute to distinguishing protein-coding smORFs from noncoding ones in otherwise ambiguous cases.
Affiliation(s)
- A M Whited
- BioFrontiers Institute, University of Colorado, Boulder, Colorado
- Irwin Jungreis
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts; MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, Massachusetts
- Jeffre Allen
- BioFrontiers Institute, University of Colorado, Boulder, Colorado; Department of Biochemistry, University of Colorado Boulder, Boulder, Colorado
- Jonathan M Mudge
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- Manolis Kellis
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts; MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, Massachusetts
- John L Rinn
- BioFrontiers Institute, University of Colorado, Boulder, Colorado; Department of Biochemistry, University of Colorado Boulder, Boulder, Colorado
- Loren E Hough
- BioFrontiers Institute, University of Colorado, Boulder, Colorado; Department of Physics, University of Colorado Boulder, Boulder, Colorado.

9
Hu G, Moon J, Hayashi T. Protein Classes Predicted by Molecular Surface Chemical Features: Machine Learning-Assisted Classification of Cytosol and Secreted Proteins. J Phys Chem B 2024; 128:8423-8436. [PMID: 39185763 PMCID: PMC11382266 DOI: 10.1021/acs.jpcb.4c02461] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/27/2024]
Abstract
Chemical structures of protein surfaces govern intermolecular interaction, and protein functions include specific molecular recognition, transport, self-assembly, etc. Therefore, the relationship between the chemical structure and protein functions provides insights into the understanding of the mechanism underlying protein functions and developments of new biomaterials. In this study, we analyze protein surface features, including surface amino acid populations and secondary structure ratios, instead of entire sequences as input for the classifier, intending to provide deeper insights into the determination of protein classes (cytosol or secreted). We employed a random forest-based classifier for the prediction of protein locations. Our training and testing data sets consisting of secreted and cytosol proteins were constructed using filtered information from UniProt and 3D structures from AlphaFold. The classifier achieved a testing accuracy of 93.9% with a feature importance ranking and quantitative boundary values for the top three features. We discuss the significance of these features quantitatively and the hidden rules to determine the protein classes (cytosol or secreted).
Affiliation(s)
- Guanghao Hu
- Department of Materials Science and Engineering, School of Materials Science and Chemical Technology, Tokyo Institute of Technology, 4259 Nagatsuta-cho, Midori-ku, Yokohama-shi, Kanagawa-ken 226-8502, Japan
- Jooa Moon
- Department of Materials Science and Engineering, School of Materials Science and Chemical Technology, Tokyo Institute of Technology, 4259 Nagatsuta-cho, Midori-ku, Yokohama-shi, Kanagawa-ken 226-8502, Japan
- Tomohiro Hayashi
- Department of Materials Science and Engineering, School of Materials Science and Chemical Technology, Tokyo Institute of Technology, 4259 Nagatsuta-cho, Midori-ku, Yokohama-shi, Kanagawa-ken 226-8502, Japan
- The Institute for Solid State Physics, The University of Tokyo, 5-1-5, Kashiwanoha, Kashiwa, Chiba 277-0882, Japan

10
Erckert K, Rost B. Assessing the role of evolutionary information for enhancing protein language model embeddings. Sci Rep 2024; 14:20692. [PMID: 39237735 PMCID: PMC11377704 DOI: 10.1038/s41598-024-71783-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2024] [Accepted: 08/30/2024] [Indexed: 09/07/2024] Open
Abstract
Embeddings from protein Language Models (pLMs) are replacing evolutionary information from multiple sequence alignments (MSAs) as the most successful input for protein prediction. Is this because embeddings capture evolutionary information? We tested various approaches to explicitly incorporate evolutionary information into embeddings on various protein prediction tasks. While older pLMs (SeqVec, ProtBert) significantly improved through MSAs, the more recent pLM ProtT5 did not benefit. For most tasks, pLM-based outperformed MSA-based methods, and the combination of both even decreased performance for some (intrinsic disorder). We highlight the effectiveness of pLM-based methods and find limited benefits from integrating MSAs.
Affiliation(s)
- Kyra Erckert
- TUM School of Computation, Information and Technology, Bioinformatics & Computational Biology - i12, Boltzmannstr. 3, 85748, Garching/Munich, Germany.
- TUM Graduate School, Center of Doctoral Studies in Informatics and Its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany.
| | - Burkhard Rost
- TUM School of Computation, Information and Technology, Bioinformatics & Computational Biology - i12, Boltzmannstr. 3, 85748, Garching/Munich, Germany
- Institute for Advanced Study (TUM-IAS), Lichtenbergstr. 2a, 85748, Garching/Munich, Germany
- TUM School of Life Sciences Weihenstephan (TUM-WZW), Alte Akademie 8, Freising, Germany
| |
Collapse
|
11
|
Chile N, Bernal-Teran EG, Condori BJ, Clark T, Garcia HH, Gilman RH, Verastegui MR. Characterization of antigenic proteins of the Taenia solium postoncospheral form. Mol Biochem Parasitol 2024; 259:111621. [PMID: 38705360 PMCID: PMC11197303 DOI: 10.1016/j.molbiopara.2024.111621] [Received: 10/19/2023] [Revised: 04/17/2024] [Accepted: 04/22/2024] [Indexed: 05/07/2024]
Abstract
Neurocysticercosis is the leading cause of acquired epilepsy worldwide, and it is caused by the larval stage of the parasite Taenia solium. Several proteins of this stage have been characterized and studied to understand the parasite-host interaction; however, the proteins from the early cysticercus stages (the postoncospheral form) have not yet been characterized. The study of the postoncospheral form proteins is important to understand the host-parasite relationship in the early stages of infection. The aim of this work was to identify postoncospheral form antigenic proteins using sera from neurocysticercosis patients. T. solium activated oncospheres were cultured in HCT-8 cells to obtain the postoncospheral form. Soluble total and excretory/secretory proteins were obtained from the postoncospheral form and were incubated with both pooled sera and individual sera of neurocysticercosis-positive human patients. Immunoblotting showed target antigenic proteins with apparent molecular weights of 23 kDa and 46-48 kDa. The 46-48 kDa antigen bands present in soluble total and excretory/secretory postoncospheral form proteins were analyzed by LC-MS/MS; the proteins identified were nuclear elongation factor 1 alpha, enolase, unnamed protein product/antigen diagnostic GP50, calcium binding protein calreticulin precursor and annexin. The postoncospheral form expresses proteins related to interaction with the host, and some of these proteins are predicted to be exosomal proteins. In conclusion, postoncospheral proteins are consistent targets of the humoral immune response in humans and may serve as targets for diagnosis and vaccines.
Affiliation(s)
- Nancy Chile
  - Laboratorio de Investigación de Enfermedades Infecciosas. Laboratorios de Investigación y Desarrollo, Facultad de Ciencias e Ingeniería, Universidad Peruana Cayetano Heredia, Lima, Perú.
- Edson G Bernal-Teran
  - Laboratorio de Investigación de Enfermedades Infecciosas. Laboratorios de Investigación y Desarrollo, Facultad de Ciencias e Ingeniería, Universidad Peruana Cayetano Heredia, Lima, Perú
- Beth J Condori
  - Laboratorio de Investigación de Enfermedades Infecciosas. Laboratorios de Investigación y Desarrollo, Facultad de Ciencias e Ingeniería, Universidad Peruana Cayetano Heredia, Lima, Perú
- Taryn Clark
  - Department of Emergency Medicine, SUNY Downstate Medical Center/Kings County Hospital Medical Center, Brooklyn, NY, USA; Department of International Health, Bloomberg School of Hygiene and Public Health, Johns Hopkins University, Baltimore, MD, USA
- Hector H Garcia
  - Instituto Nacional de Ciencias Neurológicas. Unidad de Cisticercosis. Lima, Perú
- Robert H Gilman
  - Department of International Health, Bloomberg School of Hygiene and Public Health, Johns Hopkins University, Baltimore, MD, USA
- Manuela R Verastegui
  - Laboratorio de Investigación de Enfermedades Infecciosas. Laboratorios de Investigación y Desarrollo, Facultad de Ciencias e Ingeniería, Universidad Peruana Cayetano Heredia, Lima, Perú
12
Ferrer Florensa A, Almagro Armenteros J, Nielsen H, Aarestrup F, Clausen P. SpanSeq: similarity-based sequence data splitting method for improved development and assessment of deep learning projects. NAR Genom Bioinform 2024; 6:lqae106. [PMID: 39157582 PMCID: PMC11327874 DOI: 10.1093/nargab/lqae106] [Received: 02/23/2024] [Revised: 07/26/2024] [Accepted: 08/05/2024] [Indexed: 08/20/2024] Open
Abstract
The use of deep learning models in computational biology has increased massively in recent years, and it is expected to continue with the current advances in fields such as Natural Language Processing. These models, although able to draw complex relations between input and target, are also inclined to learn noisy deviations from the pool of data used during their development. In order to assess their performance on unseen data (their capacity to generalize), it is common to split the available data randomly into development (train/validation) and test sets. This procedure, although standard, has been shown to produce dubious assessments of generalization due to the existing similarity between samples in the databases used. In this work, we present SpanSeq, a database partition method for machine learning that can scale to most biological sequences (genes, proteins and genomes) in order to avoid data leakage between sets. We also explore the effect of not restraining similarity between sets by reproducing the development of two state-of-the-art models in bioinformatics, not only confirming the consequences of randomly splitting databases on the model assessment, but also extending those repercussions to the model development. SpanSeq is available at https://github.com/genomicepidemiology/SpanSeq.
Affiliation(s)
- Alfred Ferrer Florensa
  - Research Group for Genomic Epidemiology, DTU National Food Institute, Technical University of Denmark, Anker Engelunds Vej 1, 2800 Kongens Lyngby, Denmark
- Jose Juan Almagro Armenteros
  - Informatics and Predictive Sciences Research, Bristol Myers Squibb Company, Calle Isaac Newton 4, 41092 Sevilla, Spain
- Henrik Nielsen
  - Section for Bioinformatics, Department of Health Technology, Technical University of Denmark, Anker Engelunds Vej 1, 2800 Kongens Lyngby, Denmark
- Frank Møller Aarestrup
  - Research Group for Genomic Epidemiology, DTU National Food Institute, Technical University of Denmark, Anker Engelunds Vej 1, 2800 Kongens Lyngby, Denmark
- Philip Thomas Lanken Conradsen Clausen
  - Research Group for Genomic Epidemiology, DTU National Food Institute, Technical University of Denmark, Anker Engelunds Vej 1, 2800 Kongens Lyngby, Denmark
13
Thagun C, Odahara M, Kodama Y, Numata K. Identification of a highly efficient chloroplast-targeting peptide for plastid engineering. PLoS Biol 2024; 22:e3002785. [PMID: 39298532 PMCID: PMC11444414 DOI: 10.1371/journal.pbio.3002785] [Received: 12/23/2023] [Revised: 10/01/2024] [Accepted: 08/03/2024] [Indexed: 09/22/2024] Open
Abstract
Plastids are pivotal target organelles for comprehensively enhancing photosynthetic and metabolic traits in plants via plastid engineering. Plastidial proteins predominantly originate in the nucleus and must traverse membrane-bound multiprotein translocons to access these organelles. This import process is meticulously regulated by chloroplast-targeting peptides (cTPs). Whereas many cTPs have been employed to guide recombinantly expressed functional proteins to chloroplasts, there is a critical need for more efficient cTPs. Here, we performed a comprehensive exploration and comparative assessment of an advanced suite of cTPs exhibiting superior targeting capabilities. We employed a multifaceted approach encompassing computational prediction, in planta expression, fluorescence tracking, and in vitro chloroplast import studies to identify and analyze 88 cTPs associated with Arabidopsis thaliana mutants with phenotypes linked to chloroplast function. These polypeptides exhibited distinct abilities to transport green fluorescent protein (GFP) to various compartments within leaf cells, particularly chloroplasts. A highly efficient cTP derived from Arabidopsis plastid ribosomal protein L35 (At2g24090) displayed remarkable effectiveness in chloroplast localization. This cTP facilitated the activities of chloroplast-targeted RNA-processing proteins and metabolic enzymes within plastids. This cTP could serve as an ideal transit peptide for precisely targeting biomolecules to plastids, leading to advancements in plastid engineering.
Affiliation(s)
- Chonprakun Thagun
  - Department of Material Chemistry, Graduate School of Engineering, Kyoto University, Kyoto-Daigaku-Katsura, Kyoto, Japan
  - Center for Bioscience Research and Education, Utsunomiya University, Tochigi, Japan
- Masaki Odahara
  - Biomacromolecules Research Team, RIKEN Center for Sustainable Resource Science, Saitama, Japan
- Yutaka Kodama
  - Center for Bioscience Research and Education, Utsunomiya University, Tochigi, Japan
  - Biomacromolecules Research Team, RIKEN Center for Sustainable Resource Science, Saitama, Japan
- Keiji Numata
  - Department of Material Chemistry, Graduate School of Engineering, Kyoto University, Kyoto-Daigaku-Katsura, Kyoto, Japan
  - Biomacromolecules Research Team, RIKEN Center for Sustainable Resource Science, Saitama, Japan
14
Schmirler R, Heinzinger M, Rost B. Fine-tuning protein language models boosts predictions across diverse tasks. Nat Commun 2024; 15:7407. [PMID: 39198457 PMCID: PMC11358375 DOI: 10.1038/s41467-024-51844-2] [Received: 01/25/2024] [Accepted: 08/15/2024] [Indexed: 09/01/2024] Open
Abstract
Prediction methods inputting embeddings from protein language models have reached or even surpassed state-of-the-art performance on many protein prediction tasks. In natural language processing, fine-tuning large language models has become the de facto standard. In contrast, most protein language model-based protein predictions do not back-propagate to the language model. Here, we compare the fine-tuning of three state-of-the-art models (ESM2, ProtT5, Ankh) on eight different tasks. Two results stand out. Firstly, task-specific supervised fine-tuning almost always improves downstream predictions. Secondly, parameter-efficient fine-tuning can reach similar improvements while consuming substantially fewer resources, at up to 4.5-fold acceleration of training over fine-tuning full models. Our results suggest always trying fine-tuning, in particular for problems with small datasets, such as fitness landscape predictions of a single protein. For ease of adaptability, we provide easy-to-use notebooks to fine-tune all models used during this work for per-protein (pooling) and per-residue prediction tasks.
Affiliation(s)
- Robert Schmirler
  - TUM (Technical University of Munich), School of Computation, Information and Technology (CIT), Faculty of Informatics, Chair of Bioinformatics & Computational Biology - i12, Garching/Munich, Germany.
  - TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Garching/Munich, Germany.
  - AbbVie Deutschland GmbH & Co. KG, Innovation Center, BTS IR LU, Ludwigshafen, Germany.
- Michael Heinzinger
  - TUM (Technical University of Munich), School of Computation, Information and Technology (CIT), Faculty of Informatics, Chair of Bioinformatics & Computational Biology - i12, Garching/Munich, Germany
- Burkhard Rost
  - TUM (Technical University of Munich), School of Computation, Information and Technology (CIT), Faculty of Informatics, Chair of Bioinformatics & Computational Biology - i12, Garching/Munich, Germany
  - Institute for Advanced Study (TUM-IAS), Garching/Munich, Germany
  - TUM School of Life Sciences Weihenstephan (WZW), Freising, Germany
15
Tan Y, Li M, Zhou B, Zhong B, Zheng L, Tan P, Zhou Z, Yu H, Fan G, Hong L. Simple, Efficient, and Scalable Structure-Aware Adapter Boosts Protein Language Models. J Chem Inf Model 2024; 64:6338-6349. [PMID: 39110130 DOI: 10.1021/acs.jcim.4c00689] [Indexed: 08/27/2024]
Abstract
Fine-tuning pretrained protein language models (PLMs) has emerged as a prominent strategy for enhancing downstream prediction tasks, often outperforming traditional supervised learning approaches. Parameter-efficient fine-tuning, a powerful technique widely applied in natural language processing, could potentially enhance the performance of PLMs. However, the direct transfer to life science tasks is nontrivial due to the different training strategies and data forms. To address this gap, we introduce SES-Adapter, a simple, efficient, and scalable adapter method for enhancing the representation learning of PLMs. SES-Adapter incorporates PLM embeddings with structural sequence embeddings to create structure-aware representations. We show that the proposed method is compatible with different PLM architectures and across diverse tasks. Extensive evaluations are conducted on 2 types of folding structures with notable quality differences, 9 state-of-the-art baselines, and 9 benchmark data sets across distinct downstream tasks. Results show that, compared to vanilla PLMs, SES-Adapter improves downstream task performance by a maximum of 11% and an average of 3%, and significantly accelerates convergence, by a maximum of 1034% and an average of 362%; training efficiency is also improved approximately 2-fold. Moreover, positive optimization is observed even with low-quality predicted structures. The source code for SES-Adapter is available at https://github.com/tyang816/SES-Adapter.
Affiliation(s)
- Yang Tan
  - School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
  - Shanghai Artificial Intelligence Laboratory, Shanghai 200232, China
  - Shanghai National Center for Applied Mathematics (SJTU Center), Shanghai 200240, China
- Mingchen Li
  - School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
  - Shanghai Artificial Intelligence Laboratory, Shanghai 200232, China
  - Shanghai National Center for Applied Mathematics (SJTU Center), Shanghai 200240, China
- Bingxin Zhou
  - Shanghai National Center for Applied Mathematics (SJTU Center), Shanghai 200240, China
  - Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai 200240, China
- Bozitao Zhong
  - Shanghai National Center for Applied Mathematics (SJTU Center), Shanghai 200240, China
  - Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai 200240, China
- Lirong Zheng
  - Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai 200240, China
  - Department of Cell and Developmental Biology & Michigan Neuroscience Institute, University of Michigan Medical School, Ann Arbor, Michigan 48104, United States
- Pan Tan
  - Shanghai Artificial Intelligence Laboratory, Shanghai 200232, China
  - Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai 200240, China
- Ziyi Zhou
  - Shanghai National Center for Applied Mathematics (SJTU Center), Shanghai 200240, China
  - Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai 200240, China
- Huiqun Yu
  - School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
- Guisheng Fan
  - School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
- Liang Hong
  - Shanghai Artificial Intelligence Laboratory, Shanghai 200232, China
  - Shanghai National Center for Applied Mathematics (SJTU Center), Shanghai 200240, China
  - Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai 200240, China
  - Zhangjiang Institute for Advanced Study, Shanghai Jiao Tong University, Shanghai 200240, China
16
Tan Y, Li M, Zhou Z, Tan P, Yu H, Fan G, Hong L. PETA: evaluating the impact of protein transfer learning with sub-word tokenization on downstream applications. J Cheminform 2024; 16:92. [PMID: 39095917 PMCID: PMC11297785 DOI: 10.1186/s13321-024-00884-3] [Received: 03/21/2024] [Accepted: 07/13/2024] [Indexed: 08/04/2024] Open
Abstract
Protein language models (PLMs) play a dominant role in protein representation learning. Most existing PLMs regard proteins as sequences of 20 natural amino acids. The problem with this representation method is that it simply divides the protein sequence into sequences of individual amino acids, ignoring the fact that certain residues often occur together. Therefore, it is inappropriate to view amino acids as isolated tokens. Instead, the PLMs should recognize the frequently occurring combinations of amino acids as a single token. In this study, we use the byte-pair-encoding algorithm and unigram to construct advanced residue vocabularies for protein sequence tokenization, and we have shown that PLMs pre-trained using these advanced vocabularies exhibit superior performance on downstream tasks when compared to those trained with simple vocabularies. Furthermore, we introduce PETA, a comprehensive benchmark for systematically evaluating PLMs. We find that vocabularies comprising 50 and 200 elements achieve optimal performance. Our code, model weights, and datasets are available at https://github.com/ginnm/ProteinPretraining. SCIENTIFIC CONTRIBUTION: This study introduces advanced protein sequence tokenization analysis, leveraging the byte-pair-encoding algorithm and unigram. By recognizing frequently occurring combinations of amino acids as single tokens, our proposed method enhances the performance of PLMs on downstream tasks. Additionally, we present PETA, a new comprehensive benchmark for the systematic evaluation of PLMs, demonstrating that vocabularies of 50 and 200 elements offer optimal performance.
Affiliation(s)
- Yang Tan
  - School of Information Science and Engineering, East China University of Science and Technology, Shanghai, 200237, China
  - Shanghai National Center for Applied Mathematics (SJTU Center), & Institute of Natural Science, Shanghai Jiao Tong University, Shanghai, 200240, China
  - Shanghai Artificial Intelligence Laboratory, Shanghai, 200240, China
  - Chongqing Artificial Intelligence Research Institute of Shanghai Jiao Tong University, Chongqing, 200240, China
- Mingchen Li
  - School of Information Science and Engineering, East China University of Science and Technology, Shanghai, 200237, China
  - Shanghai National Center for Applied Mathematics (SJTU Center), & Institute of Natural Science, Shanghai Jiao Tong University, Shanghai, 200240, China
  - Shanghai Artificial Intelligence Laboratory, Shanghai, 200240, China
  - Chongqing Artificial Intelligence Research Institute of Shanghai Jiao Tong University, Chongqing, 200240, China
- Ziyi Zhou
  - Shanghai National Center for Applied Mathematics (SJTU Center), & Institute of Natural Science, Shanghai Jiao Tong University, Shanghai, 200240, China
- Pan Tan
  - Shanghai National Center for Applied Mathematics (SJTU Center), & Institute of Natural Science, Shanghai Jiao Tong University, Shanghai, 200240, China
  - Shanghai Artificial Intelligence Laboratory, Shanghai, 200240, China
- Huiqun Yu
  - School of Information Science and Engineering, East China University of Science and Technology, Shanghai, 200237, China.
- Guisheng Fan
  - School of Information Science and Engineering, East China University of Science and Technology, Shanghai, 200237, China.
- Liang Hong
  - Shanghai National Center for Applied Mathematics (SJTU Center), & Institute of Natural Science, Shanghai Jiao Tong University, Shanghai, 200240, China.
  - Shanghai Artificial Intelligence Laboratory, Shanghai, 200240, China.
  - Chongqing Artificial Intelligence Research Institute of Shanghai Jiao Tong University, Chongqing, 200240, China.
17
Bhushan V, Nita-Lazar A. Recent Advancements in Subcellular Proteomics: Growing Impact of Organellar Protein Niches on the Understanding of Cell Biology. J Proteome Res 2024; 23:2700-2722. [PMID: 38451675 PMCID: PMC11296931 DOI: 10.1021/acs.jproteome.3c00839] [Indexed: 03/08/2024]
Abstract
The mammalian cell is a complex entity, with membrane-bound and membrane-less organelles playing vital roles in regulating cellular homeostasis. Organellar protein niches drive discrete biological processes and cell functions, thus maintaining cell equilibrium. Cellular processes such as signaling, growth, proliferation, motility, and programmed cell death require dynamic protein movements between cell compartments. Aberrant protein localization is associated with a wide range of diseases. Therefore, analyzing the subcellular proteome of the cell can provide a comprehensive overview of cellular biology. With recent advancements in mass spectrometry, imaging technology, computational tools, and deep machine learning algorithms, studies pertaining to subcellular protein localization and their dynamic distributions are gaining momentum. These studies reveal changing interaction networks because of "moonlighting proteins" and serve as a discovery tool for disease network mechanisms. Consequently, this review aims to provide a comprehensive repository for recent advancements in subcellular proteomics subcontexting methods, challenges, and future perspectives for method developers. In summary, subcellular proteomics is crucial to the understanding of the fundamental cellular mechanisms and the associated diseases.
Affiliation(s)
- Vanya Bhushan
  - Functional Cellular Networks Section, Laboratory of Immune System Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland 20892, United States
- Aleksandra Nita-Lazar
  - Functional Cellular Networks Section, Laboratory of Immune System Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland 20892, United States
18
Lim SJ, Noor NDM, Sabri S, Ali MSM, Salleh AB, Oslan SN. Extracellular BSA-degrading SAPs in the rare pathogen Meyerozyma guilliermondii strain SO as potential virulence factors in candidiasis. Microb Pathog 2024; 193:106773. [PMID: 38960213 DOI: 10.1016/j.micpath.2024.106773] [Received: 11/25/2023] [Revised: 05/08/2024] [Accepted: 06/30/2024] [Indexed: 07/05/2024]
Abstract
Meyerozyma guilliermondii (Candida guilliermondii) is one of the Candida species associated with invasive candidiasis. With the potential for expressing industrially important enzymes, M. guilliermondii strain SO possessed 99 % proteome similarity with the clinical ATCC 6260 isolate and showed pathogenicity towards zebrafish embryos. Recently, three secreted aspartyl proteinases (SAPs) were computationally identified as potential virulence factors in this strain without in vitro verification of SAP activity. The quantification of Candida SAP activity in liquid broth has also rarely been reported. Thus, this study aimed to characterize the ability of M. guilliermondii strain SO to produce SAPs (MgSAPs) under different conditions (morphology and medium), in addition to analyzing its growth profile. The capability of MgSAPs to cleave bovine serum albumin (BSA) was also determined in order to propose MgSAPs as potential virulence factors, in comparison with the avirulent Saccharomyces cerevisiae. M. guilliermondii strain SO produced more SAPs (higher activity) in yeast nitrogen base-BSA-dextrose broth than in yeast extract-BSA-dextrose broth, despite SAP activity not differing significantly between planktonic and biofilm cells. FeCl3 supplementation significantly increased the specific protein activity (∼40 %). BSA cleavage by MgSAPs at an acidic pH was demonstrated through semi-quantitative SDS-PAGE, with a profile similar to that of HIV-1 retropepsin. This work suggests that the presence of MgSAPs on the fungal cell wall and in the extracellular milieu during host infection can be related to their quantitative production in the different growth modes presented here, and it sheds light on the potential use of retropepsin inhibitors in treating candidiasis. Molecular and expression analyses of MgSAPs and their deletion should be further explored to attribute their respective virulence effects.
Affiliation(s)
- Si Jie Lim
  - Enzyme Technology and X-ray Crystallography Laboratory, VacBio 5, Institute of Bioscience, Universiti Putra Malaysia, 43400, UPM Serdang, Selangor, Malaysia; Enzyme and Microbial Technology (EMTech) Research Centre, Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, 43400, UPM Serdang, Selangor, Malaysia.
- Noor Dina Muhd Noor
  - Enzyme and Microbial Technology (EMTech) Research Centre, Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, 43400, UPM Serdang, Selangor, Malaysia; Department of Biochemistry, Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, 43400, UPM Serdang, Selangor, Malaysia.
- Suriana Sabri
  - Enzyme and Microbial Technology (EMTech) Research Centre, Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, 43400, UPM Serdang, Selangor, Malaysia; Department of Microbiology, Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, 43400, UPM Serdang, Selangor, Malaysia.
- Mohd Shukuri Mohamad Ali
  - Enzyme Technology and X-ray Crystallography Laboratory, VacBio 5, Institute of Bioscience, Universiti Putra Malaysia, 43400, UPM Serdang, Selangor, Malaysia; Enzyme and Microbial Technology (EMTech) Research Centre, Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, 43400, UPM Serdang, Selangor, Malaysia; Department of Biochemistry, Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, 43400, UPM Serdang, Selangor, Malaysia.
- Abu Bakar Salleh
  - Enzyme and Microbial Technology (EMTech) Research Centre, Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, 43400, UPM Serdang, Selangor, Malaysia.
- Siti Nurbaya Oslan
  - Enzyme Technology and X-ray Crystallography Laboratory, VacBio 5, Institute of Bioscience, Universiti Putra Malaysia, 43400, UPM Serdang, Selangor, Malaysia; Enzyme and Microbial Technology (EMTech) Research Centre, Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, 43400, UPM Serdang, Selangor, Malaysia; Department of Biochemistry, Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, 43400, UPM Serdang, Selangor, Malaysia.
19
Acharya S, Troell HA, Billingsley RL, Lawrence KS, McKirgan DS, Alkharouf NW, Klink VP. Glycine max polygalacturonase inhibiting protein 11 (GmPGIP11) functions in the root to suppress Heterodera glycines parasitism. Plant Physiology and Biochemistry: PPB 2024; 213:108755. [PMID: 38875777 DOI: 10.1016/j.plaphy.2024.108755] [Received: 03/08/2024] [Revised: 05/17/2024] [Accepted: 05/19/2024] [Indexed: 06/16/2024]
Abstract
Pathogen-secreted polygalacturonases (PGs) alter plant cell wall structure by cleaving the α-(1 → 4) linkages between D-galacturonic acid residues in homogalacturonan (HG), which macerates the cell wall and facilitates infection. Plant PG inhibiting proteins (PGIPs) disengage pathogen PGs, impairing infection. The soybean cyst nematode, Heterodera glycines, an obligate root parasite, produces secretions that generate a multinucleate nurse cell called a syncytium, a byproduct of the merged cytoplasm of 200-250 root cells that arises through cell wall maceration. The common cytoplasmic pool, surrounded by an intact plasma membrane, provides a source from which H. glycines derives nourishment without killing the parasitized cell during a susceptible reaction. The syncytium is also the site of a naturally occurring defense response that happens in specific G. max genotypes. Transcriptomic analyses of RNA isolated from the syncytium undergoing the process of defense have identified that one of the 11 G. max PGIPs, GmPGIP11, is expressed during defense. Functional transgenic analyses show that roots undergoing GmPGIP11 overexpression (OE) experience an increase in its relative transcript abundance (RTA) as compared to the ribosomal protein 21 (GmRPS21) control, leading to a decrease in H. glycines parasitism as compared to the overexpression control. Roots in which GmPGIP11 undergoes RNAi experience a decrease in its RTA as compared to the GmRPS21 control, with transgenic roots experiencing an increase in H. glycines parasitism as compared to the RNAi control. Pathogen associated molecular pattern (PAMP) triggered immunity (PTI) and effector triggered immunity (ETI) components are shown to influence GmPGIP11 expression, while numerous agricultural crops are shown to have homologs.
Affiliation(s)
- Sudha Acharya
  - Department of Computer and Information Sciences, Towson University, Towson, MD, 21252, USA; USDA-ARS-NEA-BARC Molecular Plant Pathology Laboratory, Building 004, Room 122, BARC-West, 10300 Baltimore Ave., Beltsville, MD, 20705, USA
- Hallie A Troell
  - Department of Biological Sciences, Mississippi State University, MS, 39762, USA
- Rebecca L Billingsley
  - Department of Biochemistry, Molecular Biology, Entomology and Plant Pathology, Mississippi State University, MS, 39762, USA
- Kathy S Lawrence
  - Department of Entomology and Plant Pathology, Auburn University, 209 Life Science Building, Auburn, AL, 36849, USA
- Daniel S McKirgan
  - Department of Computer and Information Sciences, Towson University, Towson, MD, 21252, USA
- Nadim W Alkharouf
  - Department of Computer and Information Sciences, Towson University, Towson, MD, 21252, USA
- Vincent P Klink
  - USDA-ARS-NEA-BARC Molecular Plant Pathology Laboratory, Building 004, Room 122, BARC-West, 10300 Baltimore Ave., Beltsville, MD, 20705, USA.
20
Azevedo LG, Sosa E, de Queiroz ATL, Barral A, Wheeler RJ, Nicolás MF, Farias LP, Do Porto DF, Ramos PIP. High-throughput prioritization of target proteins for development of new antileishmanial compounds. Int J Parasitol Drugs Drug Resist 2024; 25:100538. [PMID: 38669848 PMCID: PMC11068527 DOI: 10.1016/j.ijpddr.2024.100538] [Received: 10/18/2023] [Revised: 03/11/2024] [Accepted: 04/04/2024] [Indexed: 04/28/2024]
Abstract
Leishmaniasis, a vector-borne disease, is caused by infection with Leishmania spp., obligate intracellular protozoan parasites. Presently, human vaccines are unavailable, and the primary treatment relies heavily on systemic drugs, which often have suboptimal formulations and substantial toxicity. New drugs are therefore a high priority for the low- and middle-income countries (LMICs) burdened by the disease, but a low priority on the agenda of most pharmaceutical companies due to unattractive profit margins. New ways to accelerate the discovery of new drugs, or the repositioning of existing ones, are needed. To address this challenge, our study aimed to identify potential protein targets shared among clinically relevant Leishmania species. We employed a subtractive proteomics and comparative genomics approach, integrating high-throughput multi-omics data to classify these targets based on different druggability metrics. This effort resulted in the ranking of 6502 ortholog groups of protein targets across 14 pathogenic Leishmania species. Among the top 20 highly ranked groups, metabolic processes known to be attractive drug targets, including the ubiquitination pathway, aminoacyl-tRNA synthetases, and purine synthesis, were rediscovered. Additionally, we unveiled novel promising targets such as the nicotinate phosphoribosyltransferase enzyme and dihydrolipoamide succinyltransferases. These groups exhibited appealing druggability features, including less than 40% sequence identity to the human host proteome, predicted essentiality, structural classification as highly druggable or druggable, and expression levels above the 50th percentile in the amastigote form. The resources presented in this work also represent a comprehensive collection of integrated data regarding trypanosomatid biology.
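The druggability features listed at the end of this abstract can be read as a conjunction of four filters. The following is only a minimal sketch of that idea in Python, not the authors' pipeline: the `OrthologGroup` record, its field names, and the toy values are all hypothetical, and only the four thresholds are taken from the abstract.

```python
from dataclasses import dataclass


@dataclass
class OrthologGroup:
    # All field names are hypothetical; they are not taken from the paper.
    name: str
    max_human_identity: float          # % identity to the closest human protein
    predicted_essential: bool
    druggability: str                  # e.g. "highly druggable", "druggable"
    amastigote_expr_percentile: float  # expression percentile in amastigotes


def passes_criteria(group: OrthologGroup) -> bool:
    """Apply the four features named in the abstract as a strict AND filter."""
    return (
        group.max_human_identity < 40.0
        and group.predicted_essential
        and group.druggability in {"highly druggable", "druggable"}
        and group.amastigote_expr_percentile > 50.0
    )


# Toy records for illustration only (invented values).
groups = [
    OrthologGroup("OG_A", 25.0, True, "highly druggable", 80.0),
    OrthologGroup("OG_B", 62.0, True, "druggable", 90.0),   # too similar to human
    OrthologGroup("OG_C", 30.0, False, "druggable", 70.0),  # not predicted essential
    OrthologGroup("OG_D", 35.0, True, "druggable", 55.0),
]

selected = [g.name for g in groups if passes_criteria(g)]
print(selected)
```

A strict AND filter is the simplest reading; the abstract describes a ranking over several metrics, so a weighted score would be an equally plausible design.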
Affiliation(s)
- Lucas G Azevedo
  - Center for Data and Knowledge Integration for Health (CIDACS), Instituto Gonçalo Moniz, Fundação Oswaldo Cruz (Fiocruz Bahia), Salvador, Bahia, Brazil; Post-graduate Program in Biotechnology and Investigative Medicine, Instituto Gonçalo Moniz, Salvador, Bahia, Brazil.
- Ezequiel Sosa
  - Universidad de Buenos Aires, Buenos Aires, Argentina.
- Artur T L de Queiroz
  - Center for Data and Knowledge Integration for Health (CIDACS), Instituto Gonçalo Moniz, Fundação Oswaldo Cruz (Fiocruz Bahia), Salvador, Bahia, Brazil; Post-graduate Program in Biotechnology and Investigative Medicine, Instituto Gonçalo Moniz, Salvador, Bahia, Brazil.
- Aldina Barral
  - Laboratório de Medicina e Saúde Pública de Precisão (MeSP2), Instituto Gonçalo Moniz, Fundação Oswaldo Cruz (Fiocruz Bahia), Salvador, Bahia, Brazil.
- Richard J Wheeler
  - Peter Medawar Building for Pathogen Research, Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom.
- Marisa F Nicolás
  - Laboratório Nacional de Computação Científica, Petrópolis, Rio de Janeiro, Brazil.
- Leonardo P Farias
  - Post-graduate Program in Biotechnology and Investigative Medicine, Instituto Gonçalo Moniz, Salvador, Bahia, Brazil; Laboratório de Medicina e Saúde Pública de Precisão (MeSP2), Instituto Gonçalo Moniz, Fundação Oswaldo Cruz (Fiocruz Bahia), Salvador, Bahia, Brazil.
- Pablo Ivan P Ramos
  - Center for Data and Knowledge Integration for Health (CIDACS), Instituto Gonçalo Moniz, Fundação Oswaldo Cruz (Fiocruz Bahia), Salvador, Bahia, Brazil; Post-graduate Program in Biotechnology and Investigative Medicine, Instituto Gonçalo Moniz, Salvador, Bahia, Brazil.
|
21
|
Zhang X, Tseo Y, Bai Y, Chen F, Uhler C. Prediction of protein subcellular localization in single cells. bioRxiv: The Preprint Server for Biology 2024:2024.07.25.605178. [PMID: 39091825 PMCID: PMC11291118 DOI: 10.1101/2024.07.25.605178] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/04/2024]
Abstract
The subcellular localization of a protein is important for its function and interaction with other molecules, and its mislocalization is linked to numerous diseases. While atlas-scale efforts have been made to profile protein localization across various cell lines, existing datasets only contain limited pairs of proteins and cell lines which do not cover all human proteins. We present a method that uses both protein sequences and cellular landmark images to perform Predictions of Unseen Proteins' Subcellular localization (PUPS), which can generalize to both proteins and cell lines not used for model training. PUPS combines a protein language model and an image inpainting model to utilize both protein sequence and cellular images for protein localization prediction. The protein sequence input enables generalization to unseen proteins and the cellular image input enables cell type specific prediction that captures single-cell variability. PUPS' ability to generalize to unseen proteins and cell lines enables us to assess the variability in protein localization across cell lines as well as across single cells within a cell line and to identify the biological processes associated with the proteins that have variable localization. Experimental validation shows that PUPS can be used to predict protein localization in newly performed experiments outside of the Human Protein Atlas used for training. Collectively, PUPS utilizes both protein sequences and cellular images to predict protein localization in unseen proteins and cell lines with the ability to capture single-cell variability.
|
22
|
Singh K, Sharma P, Jaiswal S, Mishra P, Maurya R, Muthusamy SK, Saharan MS, Jasrotia RS, Kumar J, Mishra S, Sheoran S, Singh GP, Angadi UB, Rai A, Tiwari R, Iquebal MA, Kumar D. Genome and transcriptome based comparative analysis of Tilletia indica to decipher the causal genes for pathogenicity of Karnal bunt in wheat. BMC Plant Biology 2024; 24:676. [PMID: 39009989 PMCID: PMC11251232 DOI: 10.1186/s12870-024-04959-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2023] [Accepted: 03/28/2024] [Indexed: 07/17/2024]
Abstract
Tilletia indica Mitra causes Karnal bunt (KB) in wheat through its pathogenic dikaryophase. The present study is the first to provide the draft genomes of the dikaryon (PSWKBGD-3) and its two monosporidial lines (PSWKBGH-1 and 2) using Illumina and PacBio reads, their annotation and the comparative analyses among the three genomes by extracting polymorphic SSR markers. The transcriptomes from infected wheat grains of the susceptible wheat cultivar WL711 at 24 h, 48 h, and 7 d after inoculation of PSWKBGH-1, 2 and PSWKBGD-3 were also isolated. Further, two transcriptome analyses were performed, utilizing the T. indica transcriptome to extract dikaryon genes responsible for pathogenesis, and the wheat transcriptome to extract wheat genes affected by the dikaryon and involved in plant-pathogen interaction during progression of KB in wheat. A total of 54, 529, and 87 genes at 24hai, 48hai, and 7dai, respectively, were upregulated in the dikaryon stage, while 21, 35, and 134 genes of T. indica at 24hai, 48hai, and 7dai, respectively, were activated only in the dikaryon stage. In addition, a total of 23, 17, and 52 wheat genes at 24hai, 48hai, and 7dai, respectively, were upregulated due to the presence of the dikaryon stage only. The results obtained during this study have been compiled in a web resource called TiGeR (http://backlin.cabgrid.res.in/tiger/), which is the first genomic resource for T. indica, cataloguing genes, genomic and polymorphic SSRs of the three T. indica lines, wheat and T. indica DEGs, as well as wheat genes affected by the T. indica dikaryon, along with the pathogenicity-related proteins of the T. indica dikaryon during incidence of KB at different time points. The present study should help to clarify the role of the dikaryon in plant-pathogen interaction during progression of KB, which in turn should help to manage KB in wheat and to develop KB-resistant wheat varieties.
Affiliation(s)
- Kalpana Singh
  - Division of Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
  - Department of Bioinformatics, College of Animal Biotechnology, Guru Angad Dev Veterinary and Animal Sciences University, Ludhiana, India
- Pradeep Sharma
  - ICAR-Indian Institute of Wheat and Barley Research, Karnal, Haryana, India
- Sarika Jaiswal
  - Division of Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
- Pallavi Mishra
  - Division of Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
- Ranjeet Maurya
  - Division of Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
- Senthilkumar K Muthusamy
  - ICAR-Indian Institute of Wheat and Barley Research, Karnal, Haryana, India
  - ICAR-Central Tuber Crops Research Institute, Thiruvananthapuram, Kerala, India
- M S Saharan
  - ICAR-Indian Agricultural Research Institute, New Delhi, India
- Rahul Singh Jasrotia
  - Division of Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
- Jitender Kumar
  - ICAR-Indian Institute of Wheat and Barley Research, Karnal, Haryana, India
- Shefali Mishra
  - ICAR-Indian Institute of Wheat and Barley Research, Karnal, Haryana, India
- Sonia Sheoran
  - ICAR-Indian Institute of Wheat and Barley Research, Karnal, Haryana, India
- G P Singh
  - ICAR-Indian Institute of Wheat and Barley Research, Karnal, Haryana, India
- U B Angadi
  - Division of Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
- Anil Rai
  - Division of Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
- Ratan Tiwari
  - ICAR-Indian Institute of Wheat and Barley Research, Karnal, Haryana, India
- Mir Asif Iquebal
  - Division of Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
- Dinesh Kumar
  - Division of Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
|
23
|
Adedeji EO, Beder T, Damiani C, Cappelli A, Accoti A, Tapanelli S, Ogunlana OO, Fatumo S, Favia G, Koenig R, Adebiyi E. Combination of computational techniques and RNAi reveal targets in Anopheles gambiae for malaria vector control. PLoS One 2024; 19:e0305207. [PMID: 38968330 PMCID: PMC11226046 DOI: 10.1371/journal.pone.0305207] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2024] [Accepted: 05/25/2024] [Indexed: 07/07/2024] Open
Abstract
Increasing reports of insecticide resistance continue to hamper the gains of vector control strategies in curbing malaria transmission. This makes identifying new insecticide targets or alternative vector control strategies necessary. CLassifier of Essentiality AcRoss EukaRyote (CLEARER), a leave-one-organism-out cross-validation machine learning classifier for essential genes, was used to predict essential genes in Anopheles gambiae, and selected predicted genes were experimentally validated. The CLEARER algorithm was trained on six model organisms: Caenorhabditis elegans, Drosophila melanogaster, Homo sapiens, Mus musculus, Saccharomyces cerevisiae and Schizosaccharomyces pombe, and employed to identify essential genes in An. gambiae. Of the 10,426 genes in An. gambiae, 1,946 genes (18.7%) were predicted to be Cellular Essential Genes (CEGs), 1,716 (16.5%) to be Organism Essential Genes (OEGs), and 852 genes (8.2%) to be essential as both OEGs and CEGs. RNA interference (RNAi) was used to validate the top three highly expressed non-ribosomal predictions as probable vector control targets, by determining the effect of these genes on the survival of An. gambiae G3 mosquitoes. In addition, the effect of knocking down arginase (AGAP008783), an enzyme we computationally inferred earlier to be essential based on chokepoint analysis, on Plasmodium berghei infection in mosquitoes was evaluated. Arginase and the top three genes, AGAP007406 (Elongation factor 1-alpha, Elf1), AGAP002076 (Heat shock 70kDa protein 1/8, HSP), AGAP009441 (Elongation factor 2, Elf2), had knockdown efficiencies of 91%, 75%, 63%, and 61%, respectively. While knockdown of HSP or Elf2 significantly reduced longevity of the mosquitoes (p<0.0001) compared to control groups, Elf1 or arginase knockdown had no effect on survival. However, arginase knockdown significantly reduced P. berghei oocyst counts in the midgut of mosquitoes when compared to LacZ-injected controls. The study reveals HSP and Elf2 as important contributors to mosquito survival and arginase as important for parasite development, hence placing them as possible targets for vector control.
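Two quantitative points in this abstract can be made concrete: the leave-one-organism-out scheme over the six training organisms, and the percentages reported for the 10,426 An. gambiae genes. The Python sketch below only shows how such folds could be enumerated and rechecks the stated arithmetic. It does not implement the CLEARER classifier, and the function names are hypothetical.

```python
# The six training organisms named in the abstract.
ORGANISMS = [
    "Caenorhabditis elegans",
    "Drosophila melanogaster",
    "Homo sapiens",
    "Mus musculus",
    "Saccharomyces cerevisiae",
    "Schizosaccharomyces pombe",
]


def leave_one_organism_out(organisms):
    """Yield (training organisms, held-out organism) for each fold."""
    for held_out in organisms:
        training = [o for o in organisms if o != held_out]
        yield training, held_out


def percent(part, total):
    """Percentage rounded to one decimal place, as reported in the abstract."""
    return round(100.0 * part / total, 1)


folds = list(leave_one_organism_out(ORGANISMS))

TOTAL_GENES = 10426  # An. gambiae genes considered, from the abstract
shares = {
    "CEG": percent(1946, TOTAL_GENES),
    "OEG": percent(1716, TOTAL_GENES),
    "both": percent(852, TOTAL_GENES),
}
print(len(folds), shares)
```

Each fold trains on five organisms and evaluates on the sixth, which is a common way to estimate how well a model might transfer to a species absent from training.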
Affiliation(s)
- Eunice O. Adedeji
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
  - Department of Biochemistry, Covenant University, Ota, Ogun State, Nigeria
  - School of Biosciences & Veterinary Medicine, University of Camerino, Camerino, Italy
  - Department of Biology, University of York, York, United Kingdom
- Thomas Beder
  - Medical Department II, Hematology and Oncology, University Medical Center Schleswig-Holstein, Kiel, Germany
  - University Cancer Center Schleswig-Holstein, University Medical Center Schleswig-Holstein, Kiel and Lübeck, Germany
  - Institute for Infectious Diseases and Infection Control (IIMK, RG Systemsbiology), Jena University Hospital, Jena, Germany
- Claudia Damiani
  - School of Biosciences & Veterinary Medicine, University of Camerino, Camerino, Italy
- Alessia Cappelli
  - School of Biosciences & Veterinary Medicine, University of Camerino, Camerino, Italy
- Anastasia Accoti
  - School of Biosciences & Veterinary Medicine, University of Camerino, Camerino, Italy
- Sofia Tapanelli
  - Department of Life Sciences, Imperial College, London, United Kingdom
- Olubanke O. Ogunlana
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
  - Department of Biochemistry, Covenant University, Ota, Ogun State, Nigeria
  - African Center of Excellence in Bioinformatics & Data Intensive Science, Makerere University, Kampala, Uganda
- Segun Fatumo
  - Department of Non-Communicable Disease Epidemiology, London School of Hygiene & Tropical Medicine, London, United Kingdom
- Guido Favia
  - School of Biosciences & Veterinary Medicine, University of Camerino, Camerino, Italy
- Rainer Koenig
  - Institute for Infectious Diseases and Infection Control (IIMK, RG Systemsbiology), Jena University Hospital, Jena, Germany
- Ezekiel Adebiyi
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
  - African Center of Excellence in Bioinformatics & Data Intensive Science, Makerere University, Kampala, Uganda
  - Division of Applied Bioinformatics, German Cancer Research Center (DKFZ), Heidelberg, Germany
|
24
|
Yuan Q, Tian C, Song Y, Ou P, Zhu M, Zhao H, Yang Y. GPSFun: geometry-aware protein sequence function predictions with language models. Nucleic Acids Res 2024; 52:W248-W255. [PMID: 38738636 PMCID: PMC11223820 DOI: 10.1093/nar/gkae381] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2024] [Revised: 04/22/2024] [Accepted: 04/26/2024] [Indexed: 05/14/2024] Open
Abstract
Knowledge of protein function is essential for elucidating disease mechanisms and discovering new drug targets. However, there is a widening gap between the exponential growth of protein sequences and their limited function annotations. In our prior studies, we have developed a series of methods including GraphPPIS, GraphSite, LMetalSite and SPROF-GO for protein function annotations at residue or protein level. To further enhance their applicability and performance, we now present GPSFun, a versatile web server for Geometry-aware Protein Sequence Function annotations, which equips our previous tools with language models and geometric deep learning. Specifically, GPSFun employs large language models to efficiently predict 3D conformations of the input protein sequences and extract informative sequence embeddings. Subsequently, geometric graph neural networks are utilized to capture the sequence and structure patterns in the protein graphs, facilitating various downstream predictions including protein-ligand binding sites, gene ontologies, subcellular locations and protein solubility. Notably, GPSFun achieves superior performance to state-of-the-art methods across diverse tasks without requiring multiple sequence alignments or experimental protein structures. GPSFun is freely available to all users at https://bio-web1.nscc-gz.cn/app/GPSFun with user-friendly interfaces and rich visualizations.
Affiliation(s)
- Qianmu Yuan
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, Guangdong 510000, China
- Chong Tian
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, Guangdong 510000, China
- Yidong Song
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, Guangdong 510000, China
- Peihua Ou
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, Guangdong 510000, China
- Mingming Zhu
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, Guangdong 510000, China
- Huiying Zhao
  - Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, Guangdong 510000, China
- Yuedong Yang
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, Guangdong 510000, China
|
25
|
Ødum MT, Teufel F, Thumuluri V, Almagro Armenteros JJ, Johansen AR, Winther O, Nielsen H. DeepLoc 2.1: multi-label membrane protein type prediction using protein language models. Nucleic Acids Res 2024; 52:W215-W220. [PMID: 38587188 PMCID: PMC11223819 DOI: 10.1093/nar/gkae237] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Revised: 03/06/2024] [Accepted: 03/21/2024] [Indexed: 04/09/2024] Open
Abstract
DeepLoc 2.0 is a popular web server for the prediction of protein subcellular localization and sorting signals. Here, we introduce DeepLoc 2.1, which additionally classifies the input proteins into the membrane protein types Transmembrane, Peripheral, Lipid-anchored and Soluble. Leveraging pre-trained transformer-based protein language models, the server utilizes a three-stage architecture for sequence-based, multi-label predictions. Comparative evaluations with other established tools on a test set of 4933 eukaryotic protein sequences, constructed following stringent homology partitioning, demonstrate state-of-the-art performance. Notably, DeepLoc 2.1 outperforms existing models, with the larger ProtT5 model exhibiting a marginal advantage over the ESM-1B model. The web server is available at https://services.healthtech.dtu.dk/services/DeepLoc-2.1.
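"Multi-label" here means that a protein may be assigned more than one of the four membrane protein types at once. One generic way to obtain such predictions is to score each label independently and threshold the scores. The Python sketch below illustrates that general technique only; it is not a description of the DeepLoc 2.1 architecture, and the logits, the 0.5 threshold, and the function names are assumptions made for illustration.

```python
import math

# The four membrane protein types named in the abstract.
LABELS = ["Transmembrane", "Peripheral", "Lipid-anchored", "Soluble"]


def sigmoid(x: float) -> float:
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(logits, threshold=0.5):
    """Return every label whose independent probability reaches the threshold.

    The threshold of 0.5 is an illustrative assumption, not a value from the paper.
    """
    probabilities = [sigmoid(z) for z in logits]
    return [label for label, p in zip(LABELS, probabilities) if p >= threshold]


# Made-up logits for a single protein, one per label.
example = predict_labels([2.0, -1.5, 0.3, -3.0])
print(example)
```

Because each label is thresholded on its own, the output can contain zero, one, or several labels, unlike a softmax classifier that always picks exactly one.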
Affiliation(s)
- Marius Thrane Ødum
  - Section for Bioinformatics, Department of Health Technology, Technical University of Denmark, 2800 Kongens Lyngby, Denmark
- Felix Teufel
  - Bioinformatics Centre, Department of Biology, University of Copenhagen, 2200 Copenhagen, Denmark
  - Digital Science & Innovation, Novo Nordisk A/S, 2760 Måløv, Denmark
- José Juan Almagro Armenteros
  - Bristol Myers Squibb Company, Informatics and Predictive Sciences Research, Calle Isaac Newton 4, Sevilla 41092, Spain
- Ole Winther
  - Bioinformatics Centre, Department of Biology, University of Copenhagen, 2200 Copenhagen, Denmark
  - Department of Genomic Medicine, Rigshospitalet (Copenhagen University Hospital), 2100 Copenhagen, Denmark
  - Section for Cognitive Systems, Department of Applied Mathematics and Computer Science, Technical University of Denmark, 2800 Kongens Lyngby, Denmark
- Henrik Nielsen
  - Section for Bioinformatics, Department of Health Technology, Technical University of Denmark, 2800 Kongens Lyngby, Denmark
|
26
|
Scott CJR, McGregor NGS, Leadbeater DR, Oates NC, Hoßbach J, Abood A, Setchfield A, Dowle A, Overkleeft HS, Davies GJ, Bruce NC. Parascedosporium putredinis NO1 tailors its secretome for different lignocellulosic substrates. Microbiol Spectr 2024; 12:e0394323. [PMID: 38757984 PMCID: PMC11218486 DOI: 10.1128/spectrum.03943-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2023] [Accepted: 04/19/2024] [Indexed: 05/18/2024] Open
Abstract
Parascedosporium putredinis NO1 is a plant biomass-degrading ascomycete with a propensity to target the most recalcitrant components of lignocellulose. Here we applied proteomics and activity-based protein profiling (ABPP) to investigate the ability of P. putredinis NO1 to tailor its secretome for growth on different lignocellulosic substrates. Proteomic analysis of soluble and insoluble culture fractions following the growth of P. putredinis NO1 on six lignocellulosic substrates highlights the adaptability of the response of the P. putredinis NO1 secretome to different substrates. Differences in protein abundance profiles were maintained and observed across substrates after bioinformatic filtering of the data to remove intracellular protein contamination to identify the components of the secretome more accurately. These differences across substrates extended to carbohydrate-active enzymes (CAZymes) at both class and family levels. Investigation of abundant activities in the secretomes for each substrate revealed similar variation but also a high abundance of "unknown" proteins in all conditions investigated. Fluorescence-based and chemical proteomic ABPP of secreted cellulases, xylanases, and β-glucosidases applied to secretomes from multiple growth substrates for the first time confirmed highly adaptive time- and substrate-dependent glycoside hydrolase production by this fungus. P. putredinis NO1 is a promising new candidate for the identification of enzymes suited to the degradation of recalcitrant lignocellulosic feedstocks. The investigation of proteomes from the biomass bound and culture supernatant fractions provides a more complete picture of a fungal lignocellulose-degrading response. An in-depth understanding of this varied response will enhance efforts toward the development of tailored enzyme systems for use in biorefining.
IMPORTANCE: The ability of the lignocellulose-degrading fungus Parascedosporium putredinis NO1 to tailor its secreted enzymes to different sources of plant biomass was revealed here. Through a combination of proteomic, bioinformatic, and fluorescent labeling techniques, remarkable variation was demonstrated in the secreted enzyme response for this ascomycete when grown on multiple lignocellulosic substrates. The maintenance of this variation over time when exploring hydrolytic polysaccharide-active enzymes through fluorescent labeling suggests that this variation results from an actively tailored secretome response based on substrate. Understanding the tailored secretomes of wood-degrading fungi, especially from underexplored and poorly represented families, will be important for the development of effective substrate-tailored treatments for the conversion and valorization of lignocellulose.
Affiliation(s)
- Conor J R Scott
  - Centre for Novel Agricultural Products, Department of Biology, University of York, York, United Kingdom
- Nicholas G S McGregor
  - York Structural Biology Laboratory, Department of Chemistry, The University of York, York, United Kingdom
- Daniel R Leadbeater
  - Centre for Novel Agricultural Products, Department of Biology, University of York, York, United Kingdom
- Nicola C Oates
  - Centre for Novel Agricultural Products, Department of Biology, University of York, York, United Kingdom
- Janina Hoßbach
  - Centre for Novel Agricultural Products, Department of Biology, University of York, York, United Kingdom
- Amira Abood
  - Centre for Novel Agricultural Products, Department of Biology, University of York, York, United Kingdom
- Alexander Setchfield
  - Centre for Novel Agricultural Products, Department of Biology, University of York, York, United Kingdom
- Adam Dowle
  - Bioscience Technology Facility, Department of Biology, University of York, York, United Kingdom
- Gideon J Davies
  - York Structural Biology Laboratory, Department of Chemistry, The University of York, York, United Kingdom
- Neil C Bruce
  - Centre for Novel Agricultural Products, Department of Biology, University of York, York, United Kingdom
|
27
|
Zhao H, Xiong Y, Zhou Z, Xu Q, Zi Y, Zheng X, Chen S, Xiao X, Gong L, Xu H, Liu L, Lu H, Cui Y, Shao S, Zhang J, Ma J, Zhou Q, Ma D, Li X. A hidden proteome encoded by circRNAs in human placentas: Implications for uncovering preeclampsia pathogenesis. Clin Transl Med 2024; 14:e1759. [PMID: 38997803 PMCID: PMC11245404 DOI: 10.1002/ctm2.1759] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Revised: 06/21/2024] [Accepted: 06/25/2024] [Indexed: 07/14/2024] Open
Abstract
BACKGROUND: CircRNA-encoded proteins (CEPs) are emerging as new players in health and disease, and function as baits for the common partners of their cognate linear-spliced RNA encoded proteins (LEPs). However, their prevalence across human tissues and biological roles remain largely unexplored. The placenta is an ideal model for identifying CEPs due to its considerable protein diversity that is required to sustain fetal development during pregnancy. The aim of this study was to evaluate circRNA translation in the human placenta, and the potential roles of the CEPs in placental development and dysfunction.
METHODS: Multiomics approaches, including RNA sequencing, ribosome profiling, and LC-MS/MS analysis, were utilised to identify novel translational events of circRNAs in human placentas. Bioinformatics methods and the protein bait hypothesis were employed to evaluate the roles of these newly discovered CEPs in placentation and associated disorders. The pathogenic role of a recently identified CEP circPRKCB119aa in preeclampsia was investigated through qRT-PCR, Western blotting, immunofluorescence imaging and phenotypic analyses.
RESULTS: We found that 528 placental circRNAs bound to ribosomes with active translational elongation, and 139 were translated to proteins. The CEPs showed considerable structural homology with their cognate LEPs, but are more stable, hydrophobic and have a lower molecular weight than the latter, all of which are conducive to their function as baits. On this basis, CEPs are deduced to be closely involved in placental function. Furthermore, we focused on a novel CEP circPRKCB119aa, and illuminated its pathogenic role in preeclampsia; it enhanced trophoblast autophagy by acting as a bait to inhibit phosphorylation of the cognate linear isoform PKCβ.
CONCLUSIONS: We discovered a hidden circRNA-encoded proteome in the human placenta, which offers new insights into the mechanisms underlying placental development, as well as placental disorders such as preeclampsia.
Key points: A hidden circRNA-encoded proteome in the human placenta was extensively identified and systematically characterised. The circRNA-encoded proteins (CEPs) are potentially related to placental development and associated disorders. A novel conserved CEP circPRKCB119aa enhanced trophoblast autophagy by inhibiting phosphorylation of its cognate linear-spliced isoform protein kinase C (PKC) β in preeclampsia.
Collapse
Affiliation(s)
- Huanqiang Zhao
- The Shanghai Key Laboratory of Female Reproductive Endocrine-Related Diseases, Obstetrics and Gynecology Hospital, Fudan University, Shanghai, China
- Institute of Maternal and Child Medicine, Shenzhen Maternity and Child Healthcare Hospital, Shenzhen, Guangdong Province, China
| | - Yu Xiong
- The Shanghai Key Laboratory of Female Reproductive Endocrine-Related Diseases, Obstetrics and Gynecology Hospital, Fudan University, Shanghai, China
| | - Zixiang Zhou
- The Shanghai Key Laboratory of Female Reproductive Endocrine-Related Diseases, Obstetrics and Gynecology Hospital, Fudan University, Shanghai, China
| | - Qixin Xu
- Institute of Maternal and Child Medicine, Shenzhen Maternity and Child Healthcare Hospital, Shenzhen, Guangdong Province, China
| | - Yang Zi
- Institute of Maternal and Child Medicine, Shenzhen Maternity and Child Healthcare Hospital, Shenzhen, Guangdong Province, China
| | - Xiujie Zheng
- Institute of Maternal and Child Medicine, Shenzhen Maternity and Child Healthcare Hospital, Shenzhen, Guangdong Province, China
| | - Shiguo Chen
- Institute of Maternal and Child Medicine, Shenzhen Maternity and Child Healthcare Hospital, Shenzhen, Guangdong Province, China
| | - Xirong Xiao
- The Shanghai Key Laboratory of Female Reproductive Endocrine-Related Diseases, Obstetrics and Gynecology Hospital, Fudan University, Shanghai, China
| | - Lili Gong
- The Shanghai Key Laboratory of Female Reproductive Endocrine-Related Diseases, Obstetrics and Gynecology Hospital, Fudan University, Shanghai, China
| | - Huangfang Xu
- The Shanghai Key Laboratory of Female Reproductive Endocrine-Related Diseases, Obstetrics and Gynecology Hospital, Fudan University, Shanghai, China
| | - Lidong Liu
- The Shanghai Key Laboratory of Female Reproductive Endocrine-Related Diseases, Obstetrics and Gynecology Hospital, Fudan University, Shanghai, China
| | - Huiqing Lu
- The Shanghai Key Laboratory of Female Reproductive Endocrine-Related Diseases, Obstetrics and Gynecology Hospital, Fudan University, Shanghai, China
| | - Yutong Cui
- The Shanghai Key Laboratory of Female Reproductive Endocrine-Related Diseases, Obstetrics and Gynecology Hospital, Fudan University, Shanghai, China
| | - Shuyi Shao
- The Shanghai Key Laboratory of Female Reproductive Endocrine-Related Diseases, Obstetrics and Gynecology Hospital, Fudan University, Shanghai, China
| | - Jin Zhang
- Key Laboratory of Metabolism and Molecular Medicine, Ministry of Education, Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Fudan University, Shanghai, China
| | - Jing Ma
- Key Laboratory of Metabolism and Molecular Medicine, Ministry of Education, Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Fudan University, Shanghai, China
| | - Qiongjie Zhou
- The Shanghai Key Laboratory of Female Reproductive Endocrine-Related Diseases, Obstetrics and Gynecology Hospital, Fudan University, Shanghai, China
| | - Duan Ma
- Key Laboratory of Metabolism and Molecular Medicine, Ministry of Education, Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Fudan University, Shanghai, China
| | - Xiaotian Li
- The Shanghai Key Laboratory of Female Reproductive Endocrine-Related Diseases, Obstetrics and Gynecology Hospital, Fudan University, Shanghai, China
- Institute of Maternal and Child Medicine, Shenzhen Maternity and Child Healthcare Hospital, Shenzhen, Guangdong Province, China
|
28
|
Jiménez J, Mishra R, Wang X, Magee CM, Bonning BC. Composition and abundance of midgut plasma membrane proteins in two major hemipteran vectors of plant viruses, Bemisia tabaci and Myzus persicae. Archives of Insect Biochemistry and Physiology 2024; 116:e22133. [PMID: 39054788 DOI: 10.1002/arch.22133] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2024] [Revised: 06/13/2024] [Accepted: 06/29/2024] [Indexed: 07/27/2024]
Abstract
Multiple species within the order Hemiptera cause severe agricultural losses on a global scale. Aphids and whiteflies are of particular importance due to their role as vectors for hundreds of plant viruses, many of which enter the insect via the gut. To facilitate the identification of novel targets for disruption of plant virus transmission, we compared the relative abundance and composition of the gut plasma membrane proteomes of adult Bemisia tabaci (Hemiptera: Aleyrodidae) and Myzus persicae (Hemiptera: Aphididae), representing the first study comparing the gut plasma membrane proteomes of two different insect species. Brush border membrane vesicles were prepared from dissected guts, and proteins extracted, identified and quantified from triplicate samples via timsTOF mass spectrometry. A total of 1699 B. tabaci and 1175 M. persicae proteins were identified. Following bioinformatics analysis and manual curation, 151 B. tabaci and 115 M. persicae proteins were predicted to localize to the plasma membrane of the gut microvilli. These proteins were further categorized based on molecular function and biological process according to Gene Ontology terms. The most abundant gut plasma membrane proteins were identified. The ten plasma membrane proteins that differed in abundance between the two insect species were associated with the terms "protein binding" and "viral processes." In addition to providing insight into the gut physiology of hemipteran insects, these gut plasma membrane proteomes provide context for appropriate identification of plant virus receptors based on a combination of bioinformatic prediction and protein localization on the surface of the insect gut.
Affiliation(s)
- Jaime Jiménez
- Department of Entomology and Nematology, University of Florida, Gainesville, Florida, USA
| | - Ruchir Mishra
- Department of Entomology and Nematology, University of Florida, Gainesville, Florida, USA
| | - Xinyue Wang
- Department of Entomology and Nematology, University of Florida, Gainesville, Florida, USA
| | - Ciara M Magee
- Department of Entomology and Nematology, University of Florida, Gainesville, Florida, USA
| | - Bryony C Bonning
- Department of Entomology and Nematology, University of Florida, Gainesville, Florida, USA
|
29
|
Dayrit GB, Burigsay NPF, Vera Cruz EM, Santos MD. In silico characterization and homology modeling of Nile tilapia (Oreochromis niloticus) Hsp70cBi and Hsp70cBc proteins. Heliyon 2024; 10:e32748. [PMID: 39183877 PMCID: PMC11341309 DOI: 10.1016/j.heliyon.2024.e32748] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2024] [Revised: 06/04/2024] [Accepted: 06/07/2024] [Indexed: 08/27/2024] Open
Abstract
The molecular chaperone heat shock proteins 70 (Hsp70) play a pivotal role in preserving cellular integrity and managing stress. This study extensively examined two Hsp70 proteins, On-Hsp70cBi, inducible, and On-Hsp70cBc, constitutively expressed, in Nile tilapia (Oreochromis niloticus) utilizing in silico analysis, homology modeling, and functional annotation. Employing the SWISS-MODEL program for homology modeling, the proposed models underwent thorough reliability assessment via ProSA, Verify 3D, PROVE, ERRAT, and Ramachandran plot analyses. Key features of On-Hsp70cBi and On-Hsp70cBc included amino acid lengths (640 and 645) and molecular weights (70,233.48 and 70,773.17 Da). Moreover, theoretical isoelectric points (pI = 5.63 and 5.28) indicated their acidic nature. Counts of negatively and positively charged residues (95 and 86; 95 and 81) revealed neutrality, while instability index (II) values of 35.27 (On-Hsp70cBi) and 38.85 (On-Hsp70cBc) suggested stability. Aliphatic index (AI) values were notably high for both proteins (84.58 and 82.85), indicating stability across a broad temperature range. Domain architecture analysis showed both proteins to contain an MreB/Mbl domain. Protein-protein interaction analysis identified the co-chaperone Stip1 as a primary functional partner. Comparative modeling yielded highly reliable 3D models, showcasing structural similarity to known proteins and predicted binding sites. Additionally, both proteins are primarily localized in the cytoplasm. Functional analysis predicted an AMP-PNP binding site for On-Hsp70cBi and an ATP binding site for On-Hsp70cBc. These findings deepened our understanding of Hsp70cBc and Hsp70cBi in Nile tilapia, underscoring their significance in fish physiology and warranting further investigation, thus advancing our knowledge of these proteins' roles in cellular processes and stress responses, potentially impacting fish health and resilience.
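The physicochemical descriptors quoted in this abstract follow standard sequence-based definitions, and a small sketch may help readers interpret them. The code below is an illustration only, not the authors' workflow: it assumes the usual aliphatic index formula (mole percent of Ala + 2.9 × Val + 3.9 × (Ile + Leu)), counts Asp + Glu as negatively charged and Arg + Lys as positively charged, and treats an instability index below 40 as predicting a stable protein. The function names and the toy sequences are hypothetical.

```python
from collections import Counter


def aliphatic_index(seq: str) -> float:
    """Aliphatic index from mole percentages of Ala, Val, Ile and Leu."""
    counts = Counter(seq.upper())
    n = len(seq)

    def mole_pct(aa: str) -> float:
        return 100.0 * counts[aa] / n

    return mole_pct("A") + 2.9 * mole_pct("V") + 3.9 * (mole_pct("I") + mole_pct("L"))


def charged_residue_counts(seq: str) -> tuple[int, int]:
    """Return (negative, positive) counts: Asp + Glu and Arg + Lys."""
    counts = Counter(seq.upper())
    return counts["D"] + counts["E"], counts["R"] + counts["K"]


def is_predicted_stable(instability_index: float) -> bool:
    """Conventional cut-off: an instability index below 40 predicts stability."""
    return instability_index < 40.0


# Toy sequence (hypothetical, not taken from the paper).
print(aliphatic_index("AVIL"))
print(charged_residue_counts("DEKRA"))
# Instability index values reported in the abstract.
print(is_predicted_stable(35.27), is_predicted_stable(38.85))
```

Under these conventions, both reported instability index values (35.27 and 38.85) fall below the cut-off of 40, which is consistent with the abstract's statement that the proteins are predicted to be stable.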
Affiliation(s)
- Geraldine B. Dayrit
- The Graduate School, University of Santo Tomas, España Boulevard, Manila, 1015, Philippines
- College of Public Health, University of the Philippines Manila, Ermita, Manila, 1000, Philippines
- ONE ARM, Department of Medical Microbiology, College of Public Health, University of the Philippines Manila, Ermita, Manila, 1000, Philippines
| | - Normela Patricia F. Burigsay
- ONE ARM, Department of Medical Microbiology, College of Public Health, University of the Philippines Manila, Ermita, Manila, 1000, Philippines
| | - Emmanuel M. Vera Cruz
- Freshwater Aquaculture Center, Central Luzon State University, Science City of Muñoz, Nueva Ecija, Philippines
| | - Mudjekeewis D. Santos
- The Graduate School, University of Santo Tomas, España Boulevard, Manila, 1015, Philippines
- Freshwater Aquaculture Center, Central Luzon State University, Science City of Muñoz, Nueva Ecija, Philippines
- National Fisheries Research and Development Institute, Genetic Fingerprinting Laboratory, 101 Mother Ignacia Ave., South Triangle, Quezon City, 1101, Philippines
|
30
|
Campos TL, Korhonen PK, Young ND, Wang T, Song J, Marhoefer R, Chang BCH, Selzer PM, Gasser RB. Inference of Essential Genes of the Parasite Haemonchus contortus via Machine Learning. Int J Mol Sci 2024; 25:7015. [PMID: 39000124 PMCID: PMC11240989 DOI: 10.3390/ijms25137015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2024] [Revised: 06/19/2024] [Accepted: 06/21/2024] [Indexed: 07/16/2024] Open
Abstract
Over the years, comprehensive explorations of the model organisms Caenorhabditis elegans (elegant worm) and Drosophila melanogaster (vinegar fly) have contributed substantially to our understanding of complex biological processes and pathways in multicellular organisms generally. Extensive functional genomic-phenomic, genomic, transcriptomic, and proteomic data sets have enabled the discovery and characterisation of genes that are crucial for life, called 'essential genes'. Recently, we investigated the feasibility of inferring essential genes from such data sets using advanced bioinformatics and showed that a machine learning (ML)-based workflow could be used to extract or engineer features from DNA, RNA, protein, and/or cellular data/information to underpin the reliable prediction of essential genes both within and between C. elegans and D. melanogaster. As these are two distantly related species within the Ecdysozoa, we proposed that this ML approach would be particularly well suited for species that are within the same phylum or evolutionary clade. In the present study, we cross-predicted essential genes within the phylum Nematoda (evolutionary clade V)-between C. elegans and the pathogenic parasitic nematode H. contortus-and then ranked and prioritised H. contortus proteins encoded by these genes as intervention (e.g., drug) target candidates. Using strong, validated predictors, we inferred essential genes of H. contortus that are involved predominantly in crucial biological processes/pathways including ribosome biogenesis, translation, RNA binding/processing, and signalling and which are highly transcribed in the germline, somatic gonad precursors, sex myoblasts, vulva cell precursors, various nerve cells, glia, or hypodermis. The findings indicate that this in silico workflow provides a promising avenue to identify and prioritise panels/groups of drug target candidates in parasitic nematodes for experimental validation in vitro and/or in vivo.
Affiliation(s)
- Túlio L Campos
- Department of Biosciences, Melbourne Veterinary School, Faculty of Science, The University of Melbourne, Parkville, VIC 3010, Australia
- Bioinformatics Core Facility, Aggeu Magalhães Institute (Fiocruz), Recife 50740-465, PE, Brazil
| | - Pasi K Korhonen
- Department of Biosciences, Melbourne Veterinary School, Faculty of Science, The University of Melbourne, Parkville, VIC 3010, Australia
| | - Neil D Young
- Department of Biosciences, Melbourne Veterinary School, Faculty of Science, The University of Melbourne, Parkville, VIC 3010, Australia
| | - Tao Wang
- Department of Biosciences, Melbourne Veterinary School, Faculty of Science, The University of Melbourne, Parkville, VIC 3010, Australia
| | - Jiangning Song
- Department of Data Science and AI, Faculty of IT, Monash University, Melbourne, VIC 3800, Australia
- Biomedicine Discovery Institute, Department of Biochemistry and Molecular Biology, Monash University, Clayton, VIC 3800, Australia
- Monash Data Futures Institute, Monash University, Clayton, VIC 3800, Australia
| | - Richard Marhoefer
- Boehringer Ingelheim Animal Health, Binger Strasse 173, 55216 Ingelheim am Rhein, Germany
| | - Bill C H Chang
- Department of Biosciences, Melbourne Veterinary School, Faculty of Science, The University of Melbourne, Parkville, VIC 3010, Australia
| | - Paul M Selzer
- Boehringer Ingelheim Animal Health, Binger Strasse 173, 55216 Ingelheim am Rhein, Germany
| | - Robin B Gasser
- Department of Biosciences, Melbourne Veterinary School, Faculty of Science, The University of Melbourne, Parkville, VIC 3010, Australia
|
31
|
Veszelyi K, Czegle I, Varga V, Németh CE, Besztercei B, Margittai É. Subcellular Localization of Thioredoxin/Thioredoxin Reductase System-A Missing Link in Endoplasmic Reticulum Redox Balance. Int J Mol Sci 2024; 25:6647. [PMID: 38928353 PMCID: PMC11204020 DOI: 10.3390/ijms25126647] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2024] [Revised: 06/12/2024] [Accepted: 06/14/2024] [Indexed: 06/28/2024] Open
Abstract
The lumen of the endoplasmic reticulum (ER) is usually considered an oxidative environment; however, oxidized thiol-disulfides and reduced pyridine nucleotides occur there parallelly, indicating that the ER lumen lacks components which connect the two systems. Here, we investigated the luminal presence of the thioredoxin (Trx)/thioredoxin reductase (TrxR) proteins, capable of linking the protein thiol and pyridine nucleotide pools in different compartments. It was shown that specific activity of TrxR in the ER is undetectable, whereas higher activities were measured in the cytoplasm and mitochondria. None of the Trx/TrxR isoforms were expressed in the ER by Western blot analysis. Co-localization studies of various isoforms of Trx and TrxR with ER marker Grp94 by immunofluorescent analysis further confirmed their absence from the lumen. The probability of luminal localization of each isoform was also predicted to be very low by several in silico analysis tools. ER-targeted transient transfection of HeLa cells with Trx1 and TrxR1 significantly decreased cell viability and induced apoptotic cell death. In conclusion, the absence of this electron transfer chain may explain the uncoupling of the redox systems in the ER lumen, allowing parallel presence of a reduced pyridine nucleotide and a probably oxidized protein pool necessary for cellular viability.
Affiliation(s)
- Krisztina Veszelyi
- Institute of Translational Medicine, Semmelweis University, H-1085 Budapest, Hungary
| | - Ibolya Czegle
- Department of Internal Medicine and Haematology, Semmelweis University, H-1085 Budapest, Hungary
| | - Viola Varga
- Institute of Translational Medicine, Semmelweis University, H-1085 Budapest, Hungary
| | - Csilla Emese Németh
- Institute of Biochemistry and Molecular Biology, Department of Molecular Biology, Semmelweis University, H-1085 Budapest, Hungary
| | - Balázs Besztercei
- Institute of Translational Medicine, Semmelweis University, H-1085 Budapest, Hungary
| | - Éva Margittai
- Institute of Translational Medicine, Semmelweis University, H-1085 Budapest, Hungary
|
32
|
Nirdosh, Shukla H, Mishra S. An ApiAp2 Transcription Factor with a Dispensable Role in Plasmodium berghei Life Cycle. ACS Infect Dis 2024; 10:1904-1913. [PMID: 38752809 DOI: 10.1021/acsinfecdis.4c00240] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2024]
Abstract
Malaria parasites have a complex life cycle and undergo replication and population expansion within vertebrate hosts and mosquito vectors. These developmental transitions rely on changes in gene expression and chromatin reorganization that result in the activation and silencing of stage-specific genes. The ApiAp2 family of DNA-binding proteins plays an important role in regulating gene expression in malaria parasites. Here, we characterized the ApiAp2 protein in Plasmodium berghei, which we termed Ap2-D. In silico analysis revealed that Ap2-D has three beta-sheets followed by a helix at the C-terminus for DNA binding. Using gene tagging with 3XHA-mCherry, we found that Ap2-D is expressed in Plasmodium blood stages and is present in the parasite cytoplasm and nucleus. Surprisingly, our gene deletion study revealed a completely dispensable role for Ap2-D in the entirety of the P. berghei life cycle. Ap2-D KO parasites were found to grow in the blood successfully and progress through the mosquito midgut and salivary glands. Sporozoites isolated from mosquito salivary glands were infective for hepatocytes and achieved similar patency as WT in mice. We emphasize the importance of genetic validation of antimalarial drug targets before progressing them to drug discovery.
Affiliation(s)
- Nirdosh
- Division of Molecular Microbiology and Immunology, CSIR-Central Drug Research Institute, Lucknow 226031, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
| | - Himadri Shukla
- Division of Molecular Microbiology and Immunology, CSIR-Central Drug Research Institute, Lucknow 226031, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
| | - Satish Mishra
- Division of Molecular Microbiology and Immunology, CSIR-Central Drug Research Institute, Lucknow 226031, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
|
33
|
Ramos L. Dimorphic Regulation of the MafB Gene by Sex Steroids in Hamsters, Mesocricetus auratus. Animals (Basel) 2024; 14:1728. [PMID: 38929347 PMCID: PMC11200555 DOI: 10.3390/ani14121728] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2024] [Revised: 05/29/2024] [Accepted: 06/03/2024] [Indexed: 06/28/2024] Open
Abstract
MafB is a transcription factor that regulates macrophage differentiation. Macrophages are a traditional feature of the hamster Harderian gland (HG); however, studies pertaining to MafB expression in the HG are scant. Here, the full-length cDNA of the MafB gene in hamsters was cloned and sequenced. Molecular characterization revealed that MafB encodes a protein containing 323 amino acids with a DNA-binding domain, a transactivation domain, and a leucine zipper domain. qPCR assays indicated that MafB was expressed in different tissues of both sexes. The highest relative expression levels in endocrine tissues were identified in the pancreas. Gonadectomy in male hamsters was associated with significantly higher mRNA levels in the HG; replacement with dihydrotestosterone restored mRNA expression. The HG in male hamsters contained twofold more MafB mRNA than the HG of female hamsters. Adrenals revealed similar mRNA relative expression levels during the estrous cycle. The estrous phase was associated with higher mRNA levels in the ovary. A significantly up-regulated expression and sexual dimorphism of MafB was found in the pancreas. Therefore, MafB in the HG may play an active role in the macrophage differentiation required for phagocytosis activity and intraocular repair. Additionally, sex steroids appear to strongly influence the MafB expression in the HG and pancreas. These studies highlight the probable biological importance of MafB in immunological defense and pancreatic β cell regulation.
Affiliation(s)
- Luis Ramos
- Department of Reproductive Biology, Instituto Nacional de Ciencias Médicas y Nutrición Salvador Zubirán, México City 14080, Mexico
|
34
|
Busch A, Gerbracht JV, Davies K, Hoecker U, Hess S. Comparative transcriptomics elucidates the cellular responses of an aeroterrestrial zygnematophyte to UV radiation. Journal of Experimental Botany 2024; 75:3624-3642. [PMID: 38520340 PMCID: PMC11156808 DOI: 10.1093/jxb/erae131] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Accepted: 03/22/2024] [Indexed: 03/25/2024]
Abstract
The zygnematophytes are the closest relatives of land plants and comprise several lineages that adapted to a life on land. Species of the genus Serritaenia form colorful, mucilaginous capsules, which surround the cells and block harmful solar radiation, one of the major terrestrial stressors. In eukaryotic algae, this 'sunscreen mucilage' represents a unique photoprotective strategy, whose induction and chemical background are unknown. We generated a de novo transcriptome of Serritaenia testaceovaginata and studied its gene regulation under moderate UV radiation (UVR) that triggers sunscreen mucilage under experimental conditions. UVR induced the repair of DNA and the photosynthetic apparatus as well as the synthesis of aromatic specialized metabolites. Specifically, we observed pronounced expressional changes in the production of aromatic amino acids, phenylpropanoid biosynthesis genes, potential cross-membrane transporters of phenolics, and extracellular, oxidative enzymes. Interestingly, the most up-regulated enzyme was a secreted class III peroxidase, whose embryophyte homologs are involved in apoplastic lignin formation. Overall, our findings reveal a conserved, plant-like UVR perception system (UVR8 and downstream factors) in zygnematophyte algae and point to a polyphenolic origin of the sunscreen pigment of Serritaenia, whose synthesis might be extracellular and oxidative, resembling that of plant lignins.
Affiliation(s)
- Anna Busch
- Department of Biology, University of Cologne, Zülpicher Str. 47b, D-50674 Cologne, Germany
| | - Jennifer V Gerbracht
- Department of Biology, University of Cologne, Zülpicher Str. 47b, D-50674 Cologne, Germany
| | - Kevin Davies
- The New Zealand Institute for Plant and Food Research Limited, Private Bag 11600, Palmerston North 4442, New Zealand
| | - Ute Hoecker
- Institute for Plant Sciences and Cluster of Excellence on Plant Sciences (CEPLAS), Biocenter, University of Cologne, Zülpicher Strasse 47b, D-50674, Cologne, Germany
| | - Sebastian Hess
- Department of Biology, University of Cologne, Zülpicher Str. 47b, D-50674 Cologne, Germany
|
35
|
Zhang C, Li S, Wang Y, Long J, Li X, Ke L, Xu R, Wu Z, Pi Z. Vernalization promotes bolting in sugar beet by inhibiting the transcriptional repressors of BvGI. Plant Molecular Biology 2024; 114:67. [PMID: 38836995 DOI: 10.1007/s11103-024-01460-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2023] [Accepted: 04/26/2024] [Indexed: 06/06/2024]
Abstract
Sugar beet (Beta vulgaris L.), a biennial sugar crop, contributes about 16% of the world's sugar production. The transition from vegetative growth, during which sugar accumulates in the beet, to reproductive growth, during which that sugar is exhausted, is determined by vernalization and photoperiod. GIGANTEA (GI) is a key photoperiodic flowering gene that is induced by vernalization in sugar beet. To identify the upstream regulatory factors of BvGI, candidate transcription factors (TFs) that were co-expressed with BvGI and could bind to the BvGI promoter were screened based on weighted gene co-expression network analysis (WGCNA) and TF binding site prediction. Subsequently, their transcriptional regulatory role on BvGI was validated through subcellular localization, dual-luciferase assays and yeast transformation tests. A total of 7,586 differentially expressed genes were identified after vernalization and divided into 18 co-expression modules by WGCNA, of which one (MEcyan) and two (MEdarkorange2 and MEmidnightblue) modules were positively and negatively correlated with the expression of BvGI, respectively. TF binding site predictions using PlantTFDB enabled the screening of BvLHY, BvTCP4 and BvCRF4 as candidate TFs that negatively regulated the expression of BvGI by affecting its transcription. Subcellular localization showed that BvLHY, BvTCP4 and BvCRF4 were localized to the nucleus. The results of dual-luciferase assays and yeast transformation tests showed that the relative luciferase activity and expression of HIS3 were reduced in the BvLHY, BvTCP4 and BvCRF4 transformants, which suggested that the three TFs inhibited the BvGI promoter. In addition, real-time quantitative reverse transcription PCR showed that BvLHY and BvTCP4 exhibited rhythmic expression characteristics similar to those of BvGI, while BvCRF4 did not. Our results revealed that vernalization crosstalked with the photoperiod pathway to initiate bolting in sugar beet by inhibiting the transcriptional repressors of BvGI.
Affiliation(s)
- Chunxue Zhang
- Academy of Modern Agriculture and Ecological Environment, Heilongjiang University, 150080, Harbin, China
| | - Shengnan Li
- Academy of Modern Agriculture and Ecological Environment, Heilongjiang University, 150080, Harbin, China
| | - Yuguang Wang
- Academy of Modern Agriculture and Ecological Environment, Heilongjiang University, 150080, Harbin, China
| | - Jiali Long
- College of Life Sciences, Heilongjiang University, 150080, Harbin, China
| | - Xinru Li
- Academy of Modern Agriculture and Ecological Environment, Heilongjiang University, 150080, Harbin, China
| | - Lixun Ke
- Academy of Modern Agriculture and Ecological Environment, Heilongjiang University, 150080, Harbin, China
| | - Rui Xu
- Academy of Modern Agriculture and Ecological Environment, Heilongjiang University, 150080, Harbin, China
| | - Zedong Wu
- Academy of Modern Agriculture and Ecological Environment, Heilongjiang University, 150080, Harbin, China.
| | - Zhi Pi
- Academy of Modern Agriculture and Ecological Environment, Heilongjiang University, 150080, Harbin, China.
|
36
|
Gabr A, Stephens TG, Reinfelder JR, Liau P, Calatrava V, Grossman AR, Bhattacharya D. Evidence of a putative CO2 delivery system to the chromatophore in the photosynthetic amoeba Paulinella. Environmental Microbiology Reports 2024; 16:e13304. [PMID: 38923306 PMCID: PMC11194058 DOI: 10.1111/1758-2229.13304] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2024] [Accepted: 05/22/2024] [Indexed: 06/28/2024]
Abstract
The photosynthetic amoeba Paulinella provides a recent (ca. 120 Mya) example of primary plastid endosymbiosis. Given the extensive data demonstrating host lineage-driven endosymbiont integration, we analysed nuclear genome and transcriptome data to investigate mechanisms that may have evolved in Paulinella micropora KR01 (hereinafter, KR01) to maintain photosynthetic function in the novel organelle, the chromatophore. The chromatophore is of α-cyanobacterial provenance and has undergone massive gene loss due to Muller's ratchet, but still retains genes that encode the ancestral α-carboxysome and the shell carbonic anhydrase, two critical components of the biophysical CO2 concentrating mechanism (CCM) in cyanobacteria. We identified KR01 nuclear genes potentially involved in the CCM that arose via duplication and divergence and are upregulated in response to high light and downregulated under elevated CO2. We speculate that these genes may comprise a novel CO2 delivery system (i.e., a biochemical CCM) to promote the turnover of the RuBisCO carboxylation reaction and counteract photorespiration. We posit that KR01 has an inefficient photorespiratory system that cannot fully recycle the C2 product of RuBisCO oxygenation back to the Calvin-Benson cycle. Nonetheless, both these systems appear to be sufficient to allow Paulinella to persist in environments dominated by faster-growing phototrophs.
Affiliation(s)
- Arwa Gabr
- Graduate Program in Molecular Bioscience and Program in Microbiology and Molecular Genetics, Rutgers University, New Brunswick, New Jersey, USA
| | - Timothy G. Stephens
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, New Jersey, USA
| | - John R. Reinfelder
- Department of Environmental Sciences, Rutgers University, New Brunswick, New Jersey, USA
| | - Pinky Liau
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, New Jersey, USA
| | - Victoria Calatrava
- Department of Plant Biology, The Carnegie Institution for Science, Stanford, California, USA
| | - Arthur R. Grossman
- Department of Plant Biology, The Carnegie Institution for Science, Stanford, California, USA
|
37
|
Yu H, Luo X. ThermoFinder: A sequence-based thermophilic proteins prediction framework. Int J Biol Macromol 2024; 270:132469. [PMID: 38761901 DOI: 10.1016/j.ijbiomac.2024.132469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2024] [Revised: 05/14/2024] [Accepted: 05/15/2024] [Indexed: 05/20/2024]
Abstract
Thermophilic proteins are important for academic research and industrial processes, and various computational methods have been developed to identify and screen them. However, their performance has been limited due to the lack of high-quality labeled data and efficient models for representing proteins. Here, we proposed a novel sequence-based thermophilic proteins prediction framework, called ThermoFinder. The results demonstrated that ThermoFinder outperforms previous state-of-the-art tools on two benchmark datasets, and feature ablation experiments confirmed the effectiveness of our approach. Additionally, ThermoFinder exhibited exceptional performance and consistency across two newly constructed datasets, one of which was specifically constructed for the regression-based prediction of temperature optimum values directly derived from protein sequences. The feature importance analysis, using Shapley additive explanations, further validated the advantages of ThermoFinder. We believe that ThermoFinder will be a valuable and comprehensive framework for predicting thermophilic proteins, and we have made our model open source and available on GitHub at https://github.com/Luo-SynBioLab/ThermoFinder.
Affiliation(s)
- Han Yu
- Shenzhen Key Laboratory for the Intelligent Microbial Manufacturing of Medicines, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; University of Chinese Academy of Sciences, Beijing 100049, China; CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; Center for Synthetic Biochemistry, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
| | - Xiaozhou Luo
- Shenzhen Key Laboratory for the Intelligent Microbial Manufacturing of Medicines, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; University of Chinese Academy of Sciences, Beijing 100049, China; CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; Center for Synthetic Biochemistry, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China.
|
38
|
Wilkinson P, Jackson B, Fermor H, Davies R. A new mRNA structure prediction based approach to identifying improved signal peptides for bone morphogenetic protein 2. BMC Biotechnol 2024; 24:34. [PMID: 38783306 PMCID: PMC11112908 DOI: 10.1186/s12896-024-00858-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2024] [Accepted: 05/06/2024] [Indexed: 05/25/2024] Open
Abstract
BACKGROUND Signal peptide (SP) engineering has proven able to improve production of many proteins yet is a laborious process that still relies on trial and error. mRNA structure around the translational start site is important in translation initiation and has rarely been considered in this context, with recent improvements in in silico mRNA structure prediction potentially rendering it a useful predictive tool for SP selection. Here we attempt to create a method to systematically screen candidate signal peptide sequences in silico based on both their nucleotide and amino acid sequences. Several recently released computational tools were used to predict signal peptide activity (SignalP), localization target (DeepLoc) and mRNA structure (MXFold2). The method was tested with Bone Morphogenetic Protein 2 (BMP2), an osteogenic growth factor used clinically for bone regeneration. It was hoped more effective BMP2 SPs could improve BMP2-based gene therapies and reduce the cost of recombinant BMP2 production.
RESULTS Amino acid sequence analysis indicated 2,611 SPs from the TGF-β superfamily were predicted to function when attached to BMP2. mRNA structure prediction indicated structures at the translational start site were likely highly variable. The five sequences with the most accessible translational start sites, a codon optimized BMP2 SP variant and the well-established hIL2 SP sequence were taken forward to in vitro testing. The top five candidates showed non-significant improvements in BMP2 secretion in HEK293T cells. All showed reductions in secretion versus the native sequence in C2C12 cells, with several showing large and significant decreases. None of the tested sequences were able to increase alkaline phosphatase activity above background in C2C12s. The codon optimized control sequence and hIL2 SP showed reasonable activity in HEK293T but very poor activity in C2C12.
CONCLUSIONS These results support the use of peptide sequence based in silico tools for basic predictions around signal peptide activity in a synthetic biology context. However, mRNA structure prediction requires improvement before it can produce reliable predictions for this application. The poor activity of the codon optimized BMP2 SP variant in C2C12 emphasizes the importance of codon choice, mRNA structure, and cellular context for SP activity.
Affiliation(s)
- Piers Wilkinson
  - Department of Mechanical Engineering, Institute of Medical and Biological Engineering, University of Leeds, Leeds, UK
- Brian Jackson
  - Faculty of Biological Sciences, University of Leeds, Leeds, UK
- Hazel Fermor
  - School of Biomedical Sciences, Faculty of Biological Sciences, University of Leeds, Leeds, UK
- Robert Davies
  - Oral Biology, Faculty of Medicine and Health, University of Leeds, Leeds, UK
|
39
|
Gillani M, Pollastri G. SCLpred-ECL: Subcellular Localization Prediction by Deep N-to-1 Convolutional Neural Networks. Int J Mol Sci 2024; 25:5440. [PMID: 38791479 PMCID: PMC11121631 DOI: 10.3390/ijms25105440] [Received: 04/05/2024] [Revised: 05/09/2024] [Accepted: 05/11/2024] [Indexed: 05/26/2024] Open
Abstract
The subcellular location of a protein provides valuable insights to bioinformaticians for drug design and discovery, genomics, and various other aspects of medical research. Experimental methods for determining protein subcellular localization are time-consuming and expensive, whereas computational methods, if accurate, would represent a much more efficient alternative. This article introduces an ab initio protein subcellular localization predictor based on an ensemble of Deep N-to-1 Convolutional Neural Networks. Our predictor is trained and tested on strictly redundancy-reduced datasets and achieves 63% accuracy across a diverse set of classes. This predictor is a step towards bridging the gap between a protein sequence and the protein's function. It can potentially provide information about protein-protein interaction to facilitate drug design and processes like vaccine production that are essential to disease prevention.
Affiliation(s)
- Maryam Gillani
  - School of Computer Science, University College Dublin (UCD), D04 V1W8 Dublin, Ireland
|
40
|
Rocha AL, Pai V, Perkins G, Chang T, Ma J, De Souza EV, Chu Q, Vaughan JM, Diedrich JK, Ellisman MH, Saghatelian A. An Inner Mitochondrial Membrane Microprotein from the SLC35A4 Upstream ORF Regulates Cellular Metabolism. J Mol Biol 2024; 436:168559. [PMID: 38580077 PMCID: PMC11292582 DOI: 10.1016/j.jmb.2024.168559] [Received: 01/21/2024] [Revised: 03/29/2024] [Accepted: 03/31/2024] [Indexed: 04/07/2024]
Abstract
Upstream open reading frames (uORFs) are cis-acting elements that can dynamically regulate the translation of downstream ORFs by suppressing downstream translation under basal conditions and, in some cases, increasing downstream translation under stress conditions. Computational and empirical methods have identified uORFs in the 5'-UTRs of approximately half of all mouse and human transcripts, making uORFs one of the largest regulatory elements known. Because the prevailing dogma was that eukaryotic mRNAs produce a single functional protein, the peptides and small proteins, or microproteins, encoded by uORFs were rarely studied. We hypothesized that a uORF in the SLC35A4 mRNA is producing a functional microprotein (SLC35A4-MP) because of its conserved amino acid sequence. Through a series of biochemical and cellular experiments, we find that the 103-amino acid SLC35A4-MP is a single-pass transmembrane inner mitochondrial membrane (IMM) microprotein. The IMM contains the protein machinery crucial for cellular respiration and ATP generation, and loss of function studies with SLC35A4-MP significantly diminish maximal cellular respiration, indicating a vital role for this microprotein in cellular metabolism. The findings add SLC35A4-MP to the growing list of functional microproteins and, more generally, indicate that uORFs that encode conserved microproteins are an untapped reservoir of functional microproteins.
Affiliation(s)
- Andréa L Rocha
  - Clayton Foundation Laboratories for Peptide Biology, Salk Institute for Biological Studies, La Jolla, CA, USA
- Victor Pai
  - Clayton Foundation Laboratories for Peptide Biology, Salk Institute for Biological Studies, La Jolla, CA, USA
- Guy Perkins
  - National Center for Microscopy and Imaging Research, Center for Research in Biological Systems, Department of Neurosciences, School of Medicine, University of California San Diego, La Jolla, CA, USA
- Tina Chang
  - Clayton Foundation Laboratories for Peptide Biology, Salk Institute for Biological Studies, La Jolla, CA, USA
- Jiao Ma
  - Clayton Foundation Laboratories for Peptide Biology, Salk Institute for Biological Studies, La Jolla, CA, USA
- Eduardo V De Souza
  - Clayton Foundation Laboratories for Peptide Biology, Salk Institute for Biological Studies, La Jolla, CA, USA
- Qian Chu
  - Clayton Foundation Laboratories for Peptide Biology, Salk Institute for Biological Studies, La Jolla, CA, USA
- Joan M Vaughan
  - Clayton Foundation Laboratories for Peptide Biology, Salk Institute for Biological Studies, La Jolla, CA, USA
- Jolene K Diedrich
  - Mass Spectrometry Core for Proteomics and Metabolomics, The Salk Institute for Biological Studies, 10010 North Torrey Pines Road, La Jolla, CA, USA
- Mark H Ellisman
  - National Center for Microscopy and Imaging Research, Center for Research in Biological Systems, Department of Neurosciences, School of Medicine, University of California San Diego, La Jolla, CA, USA
- Alan Saghatelian
  - Clayton Foundation Laboratories for Peptide Biology, Salk Institute for Biological Studies, La Jolla, CA, USA
|
41
|
Zhou B, Zheng L, Wu B, Tan Y, Lv O, Yi K, Fan G, Hong L. Protein Engineering with Lightweight Graph Denoising Neural Networks. J Chem Inf Model 2024; 64:3650-3661. [PMID: 38630581 DOI: 10.1021/acs.jcim.4c00036] [Indexed: 04/19/2024]
Abstract
Protein engineering faces challenges in finding optimal mutants from a massive pool of candidate mutants. In this study, we introduce a deep-learning-based data-efficient fitness prediction tool to steer protein engineering. Our methodology establishes a lightweight graph neural network scheme for protein structures, which efficiently analyzes the microenvironment of amino acids in wild-type proteins and reconstructs the distribution of the amino acid sequences that are more likely to pass natural selection. This distribution serves as a general guidance for scoring proteins toward arbitrary properties on any order of mutations. Our proposed solution undergoes extensive wet-lab experimental validation spanning diverse physicochemical properties of various proteins, including fluorescence intensity, antigen-antibody affinity, thermostability, and DNA cleavage activity. More than 40% of ProtLGN-designed single-site mutants outperform their wild-type counterparts across all studied proteins and targeted properties. More importantly, our model can bypass the negative epistatic effect to combine single mutation sites and form deep mutants with up to seven mutation sites in a single round, whose physicochemical properties are significantly improved. This observation provides compelling evidence of the structure-based model's potential to guide deep mutations in protein engineering. Overall, our approach emerges as a versatile tool for protein engineering, benefiting both the computational and bioengineering communities.
Affiliation(s)
- Bingxin Zhou
  - Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai 200240, China
  - Shanghai National Center for Applied Mathematics (SJTU Center), Shanghai 200240, China
- Lirong Zheng
  - Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai 200240, China
- Banghao Wu
  - Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai 200240, China
  - School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Yang Tan
  - Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai 200240, China
  - School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
  - Shanghai Artificial Intelligence Laboratory, Shanghai 200232, China
- Outongyi Lv
  - Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai 200240, China
- Kai Yi
  - School of Mathematics and Statistics, University of New South Wales, Sydney 2052, Australia
- Guisheng Fan
  - School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
- Liang Hong
  - Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai 200240, China
  - Shanghai National Center for Applied Mathematics (SJTU Center), Shanghai 200240, China
  - Shanghai Artificial Intelligence Laboratory, Shanghai 200232, China
  - Zhangjiang Institute for Advanced Study, Shanghai Jiao Tong University, Shanghai 201203, China
|
42
|
El Faqer A, Rabeh K, Alami M, Filali-Maltouf A, Belkadi B. In Silico Identification and Characterization of Fatty Acid Desaturase (FAD) Genes in Argania spinosa L. Skeels: Implications for Oil Quality and Abiotic Stress. Bioinform Biol Insights 2024; 18:11779322241248908. [PMID: 38711943 PMCID: PMC11072076 DOI: 10.1177/11779322241248908] [Received: 09/26/2023] [Accepted: 04/04/2024] [Indexed: 05/08/2024] Open
Abstract
Fatty acid desaturase (FAD) is the key enzyme that leads to the formation of unsaturated fatty acids by introducing double bonds into hydrocarbon chains, and it plays a critical role in plant lipid metabolism. However, no data are available on enzyme-associated genes in argan trees. Here, a candidate gene approach was adopted to identify and characterize the gene sequences of interest that are potentially involved in oil quality and abiotic stress. Based on phylogenetic analyses, 18 putative FAD genes of Argania spinosa L. (AsFAD) were identified and assigned to three subfamilies: stearoyl-ACP desaturase (SAD), Δ-12 desaturase (FAD2/FAD6), and Δ-15 desaturase (FAD3/FAD7). Furthermore, gene structure and motif analyses revealed a conserved exon-intron organization among FAD members belonging to the various oil crops studied, and they exhibited conserved motifs within each subfamily. In addition, the gene structure shows a wide variation in intron numbers, ranging from 0 to 8, with two highly conserved intron phases (0 and 1). The AsFAD and AsSAD subfamilies contain three (H(X)2-4H, H(X)2-3HH, and H/Q(X)2-3HH) and two (EEN(K)RHG and DEKRHE) conserved histidine boxes, respectively. A set of primer pairs was designed for each FAD gene and tested on DNA extracted from argan leaves; all produced amplicons of the expected size. These candidate genes in A. spinosa L. will provide valuable knowledge that further enhances our understanding of the potential roles of FAD genes in oil quality and abiotic stress in the argan tree.
Affiliation(s)
- Abdelmoiz El Faqer
  - Team of Microbiology and Molecular Biology, Plant and Microbial Biotechnology, Biodiversity and Environment Research Center, Faculty of Sciences, Mohammed V University, Rabat, Morocco
- Karim Rabeh
  - Team of Microbiology and Molecular Biology, Plant and Microbial Biotechnology, Biodiversity and Environment Research Center, Faculty of Sciences, Mohammed V University, Rabat, Morocco
- Mohammed Alami
  - Team of Microbiology and Molecular Biology, Plant and Microbial Biotechnology, Biodiversity and Environment Research Center, Faculty of Sciences, Mohammed V University, Rabat, Morocco
- Abdelkarim Filali-Maltouf
  - Team of Microbiology and Molecular Biology, Plant and Microbial Biotechnology, Biodiversity and Environment Research Center, Faculty of Sciences, Mohammed V University, Rabat, Morocco
- Bouchra Belkadi
  - Team of Microbiology and Molecular Biology, Plant and Microbial Biotechnology, Biodiversity and Environment Research Center, Faculty of Sciences, Mohammed V University, Rabat, Morocco
|
43
|
Wang Y, Zou B, Zhang Y, Zhang J, Li S, Yu B, An Z, Li L, Cui S, Zhang Y, Yao J, Shi X, Liu J. Comprehensive Long-Read Sequencing Analysis Discloses the Transcriptome Features of Papillary Thyroid Microcarcinoma. J Clin Endocrinol Metab 2024; 109:1263-1274. [PMID: 38038628 DOI: 10.1210/clinem/dgad695] [Received: 08/24/2023] [Revised: 11/14/2023] [Accepted: 11/27/2023] [Indexed: 12/02/2023]
Abstract
CONTEXT Papillary thyroid microcarcinoma (PTMC) is the most common type of thyroid cancer. It has been shown that lymph node metastasis is associated with poor prognosis in patients with PTMC. OBJECTIVE We aim to characterize the PTMC transcriptome landscape and identify the candidate transcripts that are associated with lateral neck lymph node metastasis of PTMC. METHODS We performed full-length transcriptome sequencing in 64 PTMC samples. Standard bioinformatic pipelines were applied to characterize and annotate the full-length expression profiles of 2 PTMC subtypes. Functional open reading frame (ORF) annotations of the known and novel transcripts were predicted with the HMMER, DeepLoc, and DeepTMHMM tools. Candidate transcripts associated with the pN1b subtype were identified after transcript quantification and differential gene expression analyses. RESULTS We found that skipped exons accounted for more than 27.82% of the alternative splicing events. At least 42.56% of the discovered transcripts were novel isoforms of annotated genes. A total of 39 193 ORFs in novel transcripts and 18 596 ORFs in known transcripts were identified. Distribution patterns of the characterized transcripts in functional domain, subcellular localization, and transmembrane structure were predicted. In total, 1033 and 1204 differentially expressed genes were identified in the pN0 and pN1b groups, respectively. Moreover, novel isoforms of FRMD3, NOD1, and SHROOM4 were highlighted for their association with the pN1b subtype. CONCLUSION Our data provide the global transcriptome landscape of PTMC and also reveal novel isoforms that are associated with PTMC aggressiveness.
Affiliation(s)
- Yanqiang Wang
  - Key Laboratory of Cellular Physiology of the Ministry of Education (Shanxi Medical University), Translational Medicine Research Center, Department of Pathology, Shanxi Medical University, Taiyuan, Shanxi 030001, China
- Binbin Zou
  - Key Laboratory of Cellular Physiology of the Ministry of Education (Shanxi Medical University), Translational Medicine Research Center, Department of Pathology, Shanxi Medical University, Taiyuan, Shanxi 030001, China
- Yanyan Zhang
  - Department of Thyroid Surgery, First Hospital of Shanxi Medical University, Taiyuan, Shanxi 030001, China
- Jin Zhang
  - Department of Thyroid Surgery, First Hospital of Shanxi Medical University, Taiyuan, Shanxi 030001, China
- Shujing Li
  - Department of Thyroid Surgery, First Hospital of Shanxi Medical University, Taiyuan, Shanxi 030001, China
- Bo Yu
  - Department of Thyroid Surgery, First Hospital of Shanxi Medical University, Taiyuan, Shanxi 030001, China
- Zhekun An
  - Key Laboratory of Cellular Physiology of the Ministry of Education (Shanxi Medical University), Translational Medicine Research Center, Department of Pathology, Shanxi Medical University, Taiyuan, Shanxi 030001, China
- Lei Li
  - Department of Thyroid Surgery, First Hospital of Shanxi Medical University, Taiyuan, Shanxi 030001, China
- Siqian Cui
  - Department of Thyroid Surgery, First Hospital of Shanxi Medical University, Taiyuan, Shanxi 030001, China
- Yutong Zhang
  - Key Laboratory of Cellular Physiology of the Ministry of Education (Shanxi Medical University), Translational Medicine Research Center, Department of Pathology, Shanxi Medical University, Taiyuan, Shanxi 030001, China
- Jiali Yao
  - Key Laboratory of Cellular Physiology of the Ministry of Education (Shanxi Medical University), Translational Medicine Research Center, Department of Pathology, Shanxi Medical University, Taiyuan, Shanxi 030001, China
- Xiuzhi Shi
  - Key Laboratory of Cellular Physiology of the Ministry of Education (Shanxi Medical University), Translational Medicine Research Center, Department of Pathology, Shanxi Medical University, Taiyuan, Shanxi 030001, China
- Jing Liu
  - Department of Thyroid Surgery, First Hospital of Shanxi Medical University, Taiyuan, Shanxi 030001, China
|
44
|
Higdon AL, Won NH, Brar GA. Truncated protein isoforms generate diversity of protein localization and function in yeast. Cell Syst 2024; 15:388-408.e4. [PMID: 38636458 PMCID: PMC11075746 DOI: 10.1016/j.cels.2024.03.005] [Received: 06/23/2023] [Revised: 01/21/2024] [Accepted: 03/20/2024] [Indexed: 04/20/2024]
Abstract
Genome-wide measurement of ribosome occupancy on mRNAs has enabled empirical identification of translated regions, but high-confidence detection of coding regions that overlap annotated coding regions has remained challenging. Here, we report a sensitive and robust algorithm that revealed the translation of 388 N-terminally truncated proteins in budding yeast, more than 30-fold more than previously known. We extensively experimentally validated them and defined two classes. The first class lacks large portions of the annotated protein and tends to be produced from a truncated transcript. We show that two such cases, Yap5truncation and Pus1truncation, have condition-specific regulation and distinct functions from their respective annotated isoforms. The second class of truncated protein isoforms lacks only a small region of the annotated protein and is less likely to be produced from an alternative transcript isoform. Many display different subcellular localizations than their annotated counterparts, representing a common strategy for dual localization of otherwise functionally identical proteins. A record of this paper's transparent peer review process is included in the supplemental information.
Affiliation(s)
- Andrea L Higdon
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720, USA; Center for Computational Biology, University of California, Berkeley, Berkeley, CA 94720, USA
- Nathan H Won
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720, USA
- Gloria A Brar
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720, USA; Center for Computational Biology, University of California, Berkeley, Berkeley, CA 94720, USA
|
45
|
Casuso A, Benavente BP, Leal Y, Carrera-Naipil C, Valenzuela-Muñoz V, Gallardo-Escárate C. Sex-Biased Transcription Expression of Vitellogenins Reveals Fusion Gene and MicroRNA Regulation in the Sea Louse Caligus rogercresseyi. Mar Biotechnol (NY) 2024; 26:243-260. [PMID: 38294574 DOI: 10.1007/s10126-024-10291-2] [Received: 10/17/2023] [Accepted: 01/17/2024] [Indexed: 02/01/2024]
Abstract
The caligid ectoparasite, Caligus rogercresseyi, is one of the main concerns in the Chilean salmon industry. The molecular mechanisms displayed by the parasite during the reproductive process represent an opportunity for developing novel control strategies. Vitellogenin is a multifunctional protein recognized as a critical player in several biological processes in crustaceans, including reproduction, embryonic development, and immune response. This study aimed to characterize the C. rogercresseyi vitellogenins, including discovering novel transcripts and regulatory mechanisms associated with microRNAs. Herein, vitellogenin genes were identified by homology analysis using the reference sea louse genome, a transcriptome database, and an arthropod vitellogenin protein database. Expressed transcripts were validated by RNA nanopore sequencing. Moreover, fusion gene profiling and miRNA target analysis were performed, with functional validation by luciferase assay. Six putative vitellogenin genes were identified in the C. rogercresseyi genome with high homology to vitellogenins of other copepods. Furthermore, miR-996 showed a putative role in regulating the Cr_Vitellogenin1 gene, which is highly expressed in females. Moreover, vitellogenin-fusion genes were identified in adult stages and highly regulated in males, demonstrating sex-related expression patterns. In females, the identified fusion genes merged with several non-vitellogenin genes involved in the biological processes of ribosome assembly, the BMP signaling pathway, and biosynthetic processes. This study reports the genome array of vitellogenins in C. rogercresseyi for the first time, revealing the putative role of fusion genes and miRNA regulation in sea lice biology.
Affiliation(s)
- Antonio Casuso
  - Interdisciplinary Center for Aquaculture Research (INCAR), Universidad de Concepción, Concepción, Chile
  - Laboratory of Biotechnology and Aquatic Genomics, Department of Oceanography, Universidad de Concepción, Concepción, Chile
- Bárbara P Benavente
  - Interdisciplinary Center for Aquaculture Research (INCAR), Universidad de Concepción, Concepción, Chile
  - Laboratory of Biotechnology and Aquatic Genomics, Department of Oceanography, Universidad de Concepción, Concepción, Chile
- Yeny Leal
  - Interdisciplinary Center for Aquaculture Research (INCAR), Universidad de Concepción, Concepción, Chile
  - Laboratory of Biotechnology and Aquatic Genomics, Department of Oceanography, Universidad de Concepción, Concepción, Chile
- Crisleri Carrera-Naipil
  - Interdisciplinary Center for Aquaculture Research (INCAR), Universidad de Concepción, Concepción, Chile
- Valentina Valenzuela-Muñoz
  - Interdisciplinary Center for Aquaculture Research (INCAR), Universidad de Concepción, Concepción, Chile
  - Laboratory of Biotechnology and Aquatic Genomics, Department of Oceanography, Universidad de Concepción, Concepción, Chile
- Cristian Gallardo-Escárate
  - Interdisciplinary Center for Aquaculture Research (INCAR), Universidad de Concepción, Concepción, Chile
  - Laboratory of Biotechnology and Aquatic Genomics, Department of Oceanography, Universidad de Concepción, Concepción, Chile
|
46
|
Yan Y, Li W, Wang S, Huang T. Seq-RBPPred: Predicting RNA-Binding Proteins from Sequence. ACS Omega 2024; 9:12734-12742. [PMID: 38524500 PMCID: PMC10955590 DOI: 10.1021/acsomega.3c08381] [Received: 10/26/2023] [Revised: 12/18/2023] [Accepted: 12/28/2023] [Indexed: 03/26/2024]
Abstract
RNA-binding proteins (RBPs) can interact with RNAs to regulate RNA translation, modification, splicing, and other important biological processes. The accurate identification of RBPs is of paramount importance for gaining insights into the intricate mechanisms underlying organismal life activities. Traditional experimental methods to identify RBPs require a lot of time and money, so it is important to develop computational methods to predict RBPs. However, the existing approaches for RBP prediction still require further improvement because RBPs remain unidentified in many species. In this study, we present Seq-RBPPred (predicting RBPs from sequence), a novel method that utilizes a comprehensive feature representation encompassing both biophysical properties and hidden-state features derived from protein sequences. Comprehensive performance evaluations of Seq-RBPPred demonstrate its superiority compared with state-of-the-art methods, with an overall accuracy of 0.922, a sensitivity of 0.926, a specificity of 0.903, and a Matthews correlation coefficient (MCC) of 0.757 on the testing set. The data and code of Seq-RBPPred are available at https://github.com/yaoyao-11/Seq-RBPPred.
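The four metrics quoted in this abstract (accuracy, sensitivity, specificity, MCC) are all derived from a binary confusion matrix. The following is a minimal sketch of those standard formulas, not code from the Seq-RBPPred repository; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity, specificity and MCC for binary counts.

    tp/fp/tn/fn are the confusion-matrix counts (hypothetical inputs here).
    Ratios with a zero denominator are reported as 0.0 by convention.
    """
    def safe_div(num, den):
        return num / den if den else 0.0

    accuracy = safe_div(tp + tn, tp + fp + tn + fn)
    sensitivity = safe_div(tp, tp + fn)  # true positive rate
    specificity = safe_div(tn, tn + fp)  # true negative rate
    # Matthews correlation coefficient: balanced even for skewed classes.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = safe_div(tp * tn - fp * fn, mcc_den)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Illustrative counts only (not the paper's test set).
print(binary_metrics(tp=40, fp=10, tn=40, fn=10))
```

Because MCC uses all four cells of the confusion matrix, it can be noticeably lower than accuracy when the classes are imbalanced, which is one reason it is commonly reported alongside the other three values.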
Affiliation(s)
- Yuyao Yan
  - CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai 200021, China
- Wenran Li
  - CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai 200021, China
- Sijia Wang
  - CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai 200021, China
- Tao Huang
  - CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai 200021, China
|
47
|
León-García F, García-Laynes F, Estrada-Tapia G, Monforte-González M, Martínez-Estevez M, Echevarría-Machado I. In Silico Analysis of Glutamate Receptors in Capsicum chinense: Structure, Evolution, and Molecular Interactions. Plants (Basel) 2024; 13:812. [PMID: 38592787 PMCID: PMC10975470 DOI: 10.3390/plants13060812] [Received: 01/20/2024] [Revised: 02/27/2024] [Accepted: 03/06/2024] [Indexed: 04/11/2024]
Abstract
Plant glutamate receptors (GLRs) are integral membrane proteins that function as non-selective cation channels and are involved in the regulation of crucial developmental events in plants. Knowledge of these proteins is restricted to a few species, and their true agonists in plants are still unknown. Using tomato SlGLRs, a search was performed in the pepper database to identify GLR sequences in habanero pepper (Capsicum chinense Jacq.). Structural, phylogenetic, and orthology analyses of the CcGLRs, as well as molecular docking and protein interaction network analyses, were conducted. Seventeen CcGLRs were identified, which contained the characteristic domains of GLRs. The variation of conserved residues in the M2 transmembrane domain between members suggests a difference in ion selectivity and/or conduction. Also, new conserved motifs in the ligand-binding regions are reported. Duplication events seem to drive the expansion of the species, and these were placed in evolutionary context by using orthologs. Molecular docking analysis allowed us to identify differences in the agonist binding pocket between CcGLRs, which suggest the existence of different affinities for amino acids. The possible interaction of some CcGLRs with other proteins suggests specific functions for them within the plant. These results offer important functional clues for CcGLRs, which can probably be extrapolated to other Solanaceae.
Affiliation(s)
- Ileana Echevarría-Machado
  - Unidad de Biología Integrativa, Centro de Investigación Científica de Yucatán, Calle 43, #130, x 32 and 34, Mérida 97205, Yucatán, Mexico; (F.L.-G.); (M.M.-G.); (M.M.-E.)
|
48
|
Ektefaie Y, Shen A, Bykova D, Marin M, Zitnik M, Farhat M. Evaluating generalizability of artificial intelligence models for molecular datasets. bioRxiv 2024:2024.02.25.581982. [PMID: 38464295 PMCID: PMC10925170 DOI: 10.1101/2024.02.25.581982] [Indexed: 03/12/2024]
Abstract
Deep learning has made rapid advances in modeling molecular sequencing data. Despite achieving high performance on benchmarks, it remains unclear to what extent deep learning models learn general principles and generalize to previously unseen sequences. Benchmarks traditionally interrogate model generalizability by generating metadata based (MB) or sequence-similarity based (SB) train and test splits of input data before assessing model performance. Here, we show that this approach mischaracterizes model generalizability by failing to consider the full spectrum of cross-split overlap, i.e., similarity between train and test splits. We introduce Spectra, a spectral framework for comprehensive model evaluation. For a given model and input data, Spectra plots model performance as a function of decreasing cross-split overlap and reports the area under this curve as a measure of generalizability. We apply Spectra to 18 sequencing datasets with associated phenotypes ranging from antibiotic resistance in tuberculosis to protein-ligand binding to evaluate the generalizability of 19 state-of-the-art deep learning models, including large language models, graph neural networks, diffusion models, and convolutional neural networks. We show that SB and MB splits provide an incomplete assessment of model generalizability. With Spectra, we find as cross-split overlap decreases, deep learning models consistently exhibit a reduction in performance in a task- and model-dependent manner. Although no model consistently achieved the highest performance across all tasks, we show that deep learning models can generalize to previously unseen sequences on specific tasks. Spectra paves the way toward a better understanding of how foundation models generalize in biology.
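The summary statistic described in this abstract, the area under a curve of model performance versus cross-split overlap, can be illustrated with a simple trapezoidal integration. This is only a sketch under stated assumptions: the function name `area_under_performance_curve` and the example values are hypothetical, and this is not the Spectra implementation, which also has to construct the train and test splits at each overlap level.

```python
def area_under_performance_curve(overlaps, scores):
    """Trapezoidal area under a performance-vs-overlap curve.

    overlaps: cross-split overlap values (for example in [0, 1]), any order.
    scores:   model performance measured at each overlap value.
    Both inputs are hypothetical; how overlap is defined is up to the user.
    """
    if len(overlaps) != len(scores):
        raise ValueError("overlaps and scores must have the same length")
    if len(overlaps) < 2:
        raise ValueError("at least two points are needed to integrate")

    # Sort by overlap so the integration runs left to right.
    points = sorted(zip(overlaps, scores))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Illustrative values only: performance drops as overlap decreases.
overlaps = [1.0, 0.5, 0.0]
scores = [0.9, 0.7, 0.5]
print(area_under_performance_curve(overlaps, scores))
```

A model whose performance holds up as overlap shrinks yields a larger area than one that only performs well when train and test data are similar, which is the intuition behind using the area as a generalizability measure.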
Affiliation(s)
- Yasha Ektefaie
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Andrew Shen
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Department of Computer Science, Northwestern University, Evanston, IL, USA
- Daria Bykova
  - Department of Biological Sciences, Columbia University, New York, NY, USA
- Maximillian Marin
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Marinka Zitnik
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Harvard Data Science Initiative, Cambridge, MA, USA
- Maha Farhat
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Division of Pulmonary and Critical Care, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
|
49
Chen JY, Sang H, Chilvers MI, Wu CH, Chang HX. Characterization of soybean chitinase genes induced by rhizobacteria involved in the defense against Fusarium oxysporum. Front Plant Sci 2024; 15:1341181. [PMID: 38405589 PMCID: PMC10884886 DOI: 10.3389/fpls.2024.1341181] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Accepted: 01/08/2024] [Indexed: 02/27/2024]
Abstract
Rhizobacteria are capable of inducing defense responses via the expression of pathogenesis-related proteins (PR-proteins) such as chitinases, and many studies have validated the functions of plant chitinases in defense responses. Soybean (Glycine max) is an economically important crop worldwide, but the functional validation of soybean chitinase in defense responses remains limited. In this study, genome-wide characterization of soybean chitinases was conducted, and the defense contribution of three chitinases (GmChi01, GmChi02, or GmChi16) was validated in Arabidopsis transgenic lines against the soil-borne pathogen Fusarium oxysporum. Compared to the Arabidopsis Col-0 and empty vector controls, the transgenic lines with GmChi02 or GmChi16 exhibited fewer chlorosis symptoms and wilting. While GmChi02 and GmChi16 enhanced defense to F. oxysporum, GmChi02 was the only one significantly induced by Burkholderia ambifaria. The observation indicated that plant chitinases may be induced by different rhizobacteria for defense responses. The survey of 37 soybean chitinase gene expressions in response to six rhizobacteria observed diverse inducibility, where only 10 genes were significantly upregulated by at least one rhizobacterium and 9 genes did not respond to any of the rhizobacteria. Motif analysis on soybean promoters further identified not only consensus but also rhizobacterium-specific transcription factor-binding sites for the inducible chitinase genes. Collectively, these results confirmed the involvement of GmChi02 and GmChi16 in defense enhancement and highlighted the diverse inducibility of 37 soybean chitinases encountering F. oxysporum and six rhizobacteria.
Affiliation(s)
- Jheng-Yan Chen
  - Department of Plant Pathology and Microbiology, National Taiwan University, Taipei, Taiwan
- Hyunkyu Sang
  - Department of Integrative Food, Bioscience and Biotechnology, Chonnam National University, Gwangju, Republic of Korea
- Martin I. Chilvers
  - Department of Plant, Soil and Microbial Sciences, Michigan State University, East Lansing, MI, United States
- Chih-Hang Wu
  - Institute of Plant and Microbial Biology, Academia Sinica, Taipei, Taiwan
- Hao-Xun Chang
  - Department of Plant Pathology and Microbiology, National Taiwan University, Taipei, Taiwan
  - Master Program of Plant Medicine, National Taiwan University, Taipei, Taiwan
  - Center of Biotechnology, National Taiwan University, Taipei, Taiwan
50
Harada R, Hirakawa Y, Yabuki A, Kim E, Yazaki E, Kamikawa R, Nakano K, Eliáš M, Inagaki Y. Encyclopedia of Family A DNA Polymerases Localized in Organelles: Evolutionary Contribution of Bacteria Including the Proto-Mitochondrion. Mol Biol Evol 2024; 41:msae014. [PMID: 38271287 PMCID: PMC10877234 DOI: 10.1093/molbev/msae014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Revised: 01/12/2024] [Accepted: 01/19/2024] [Indexed: 01/27/2024] Open
Abstract
DNA polymerases synthesize DNA from deoxyribonucleotides in a semiconservative manner and serve as the core of DNA replication and repair machinery. In eukaryotic cells, there are 2 genome-containing organelles, mitochondria, and plastids, which were derived from an alphaproteobacterium and a cyanobacterium, respectively. Except for rare cases of genome-lacking mitochondria and plastids, both organelles must be served by nucleus-encoded DNA polymerases that localize and work in them to maintain their genomes. The evolution of organellar DNA polymerases has yet to be fully understood because of 2 unsettled issues. First, the diversity of organellar DNA polymerases has not been elucidated in the full spectrum of eukaryotes. Second, it is unclear when the DNA polymerases that were used originally in the endosymbiotic bacteria giving rise to mitochondria and plastids were discarded, as the organellar DNA polymerases known to date show no phylogenetic affinity to those of the extant alphaproteobacteria or cyanobacteria. In this study, we identified from diverse eukaryotes 134 family A DNA polymerase sequences, which were classified into 10 novel types, and explored their evolutionary origins. The subcellular localizations of selected DNA polymerases were further examined experimentally. The results presented here suggest that the diversity of organellar DNA polymerases has been shaped by multiple transfers of the PolI gene from phylogenetically broad bacteria, and their occurrence in eukaryotes was additionally impacted by secondary plastid endosymbioses. Finally, we propose that the last eukaryotic common ancestor may have possessed 2 mitochondrial DNA polymerases, POP, and a candidate of the direct descendant of the proto-mitochondrial DNA polymerase I, rdxPolA, identified in this study.
Affiliation(s)
- Ryo Harada
  - Graduate School of Life and Environmental Sciences, University of Tsukuba, Tsukuba, Japan
- Yoshihisa Hirakawa
  - Faculty of Life and Environmental Sciences, University of Tsukuba, Tsukuba, Japan
- Akinori Yabuki
  - Deep-Sea Biodiversity Research Group, Research Institute for Global Change (RIGC), Japan Agency for Marine-Earth Science and Technology (JAMSTEC), Yokosuka, Japan
- Eunsoo Kim
  - Division of EcoScience, Ewha Womans University, Seoul, South Korea
  - Division of Invertebrate Zoology, American Museum of Natural History, New York, NY, USA
- Euki Yazaki
  - Research Center for Advanced Analysis, National Agriculture and Food Research Organization, Tsukuba, Japan
  - Interdisciplinary Theoretical and Mathematical Sciences program (iTHEMS), RIKEN, Wako, Saitama, Japan
- Ryoma Kamikawa
  - Graduate School of Agriculture, Kyoto University, Kyoto, Japan
- Kentaro Nakano
  - Graduate School of Life and Environmental Sciences, University of Tsukuba, Tsukuba, Japan
- Marek Eliáš
  - Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czech Republic
- Yuji Inagaki
  - Graduate School of Life and Environmental Sciences, University of Tsukuba, Tsukuba, Japan
  - Center for Computational Sciences, University of Tsukuba, Tsukuba, Japan