101
Yan Y, Zheng J, Zhang X, Yin Y. dbAPIS: a database of anti-prokaryotic immune system genes. Nucleic Acids Res 2024; 52:D419-D425. [PMID: 37889074 PMCID: PMC10767833 DOI: 10.1093/nar/gkad932] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 08/11/2023] [Revised: 09/20/2023] [Accepted: 10/10/2023] [Indexed: 10/28/2023] Open
Abstract
Anti-prokaryotic immune system (APIS) proteins, typically encoded by phages, prophages, and plasmids, inhibit prokaryotic immune systems (e.g. restriction modification, toxin-antitoxin, CRISPR-Cas). A growing number of APIS genes have been characterized, but they remain dispersed across the literature. Here we developed dbAPIS (https://bcb.unl.edu/dbAPIS) as the first literature-curated data repository for experimentally verified APIS genes and their associated protein families. The key features of dbAPIS include: (i) experimentally verified APIS genes with their protein sequences, functional annotation, PDB or AlphaFold predicted structures, genomic context, and sequence and structural homologs from different microbiome/virome databases; (ii) classification of APIS proteins into sequence-based families and construction of hidden Markov models (HMMs); (iii) a user-friendly web interface for data browsing by inhibited immune system type or by host, with functions for searching and batch downloading of pre-computed data; (iv) inclusion of all types of APIS proteins (except for anti-CRISPRs) that inhibit a variety of prokaryotic defense systems (e.g. RM, TA, CBASS, Thoeris, Gabija). The current release of dbAPIS contains 41 verified APIS proteins and ∼4400 sequence homologs of 92 families and 38 clans. dbAPIS will facilitate the discovery of novel anti-defense genes and genomic islands in phages by providing a user-friendly data repository and a web resource for easy homology searches against known APIS proteins.
Affiliation(s)
- Yuchen Yan
  - Nebraska Food for Health Center, Department of Food Science and Technology, University of Nebraska - Lincoln, Lincoln, NE 68588, USA
- Xinpeng Zhang
  - Nebraska Food for Health Center, Department of Food Science and Technology, University of Nebraska - Lincoln, Lincoln, NE 68588, USA
- Yanbin Yin
  - Nebraska Food for Health Center, Department of Food Science and Technology, University of Nebraska - Lincoln, Lincoln, NE 68588, USA
102
Billman ZP, Kovacs SB, Wei B, Kang K, Cissé OH, Miao EA. Caspase-1 activates gasdermin A in non-mammals. bioRxiv 2024:2023.09.28.559989. [PMID: 37987010 PMCID: PMC10659411 DOI: 10.1101/2023.09.28.559989] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2023]
Abstract
Gasdermins oligomerize to form pores in the cell membrane, causing regulated lytic cell death called pyroptosis. Mammals encode five gasdermins that can trigger pyroptosis: GSDMA, B, C, D, and E. Caspase and granzyme proteases cleave the linker regions of GSDMB, C, D, and E and thereby activate them, but no endogenous activation pathways are yet known for GSDMA. Here, we perform a comprehensive evolutionary analysis of the gasdermin family. A gene duplication of GSDMA in the common ancestor of caecilian amphibians, reptiles and birds gave rise to GSDMA-D in mammals. Uniquely in our tree, amphibian, reptile and bird GSDMA group in a separate clade from mammal GSDMA. Remarkably, GSDMA in numerous bird species contains caspase-1 cleavage sites like YVAD or FASD in the linker. We show that GSDMA from birds, amphibians, and reptiles are all cleaved by caspase-1. Thus, GSDMA was originally cleaved by the host-encoded protease caspase-1. In mammals the caspase-1 cleavage site in GSDMA is disrupted; instead, a new protein, GSDMD, is the target of caspase-1. Mammal caspase-1 uses exosite interactions with the GSDMD C-terminal domain to confer the specificity of this interaction, whereas we show that bird caspase-1 uses a stereotypical tetrapeptide sequence to confer specificity for bird GSDMA. Our results reveal an evolutionarily stable association between caspase-1 and the gasdermin family, albeit a shifting one. Caspase-1 repeatedly changes its target gasdermin over evolutionary time at speciation junctures, initially cleaving GSDME in fish, then GSDMA in amphibians/reptiles/birds, and finally GSDMD in mammals.
Affiliation(s)
- Zachary P Billman
  - Duke University School of Medicine
  - National Institutes of Health University of North Carolina at Chapel Hill
  - Departments of: Integrative Immunobiology; Molecular Genetics and Microbiology; Cell Biology; Pathology; Durham, NC, USA
  - Department of Microbiology and Immunology; Chapel Hill, NC, USA
- Stephen B Kovacs
  - Duke University School of Medicine
  - National Institutes of Health University of North Carolina at Chapel Hill
  - Departments of: Integrative Immunobiology; Molecular Genetics and Microbiology; Cell Biology; Pathology; Durham, NC, USA
  - Department of Microbiology and Immunology; Chapel Hill, NC, USA
- Bo Wei
  - Duke University School of Medicine
  - Departments of: Integrative Immunobiology; Molecular Genetics and Microbiology; Cell Biology; Pathology; Durham, NC, USA
- Kidong Kang
  - Duke University School of Medicine
  - Departments of: Integrative Immunobiology; Molecular Genetics and Microbiology; Cell Biology; Pathology; Durham, NC, USA
- Ousmane H Cissé
  - National Institutes of Health
  - Critical Care Medicine Department; Bethesda, MD, USA
- Edward A Miao
  - Duke University School of Medicine
  - National Institutes of Health University of North Carolina at Chapel Hill
  - Departments of: Integrative Immunobiology; Molecular Genetics and Microbiology; Cell Biology; Pathology; Durham, NC, USA
103
Pang Y, Liu B. DisoFLAG: accurate prediction of protein intrinsic disorder and its functions using graph-based interaction protein language model. BMC Biol 2024; 22:3. [PMID: 38166858 PMCID: PMC10762911 DOI: 10.1186/s12915-023-01803-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2023] [Accepted: 12/15/2023] [Indexed: 01/05/2024] Open
Abstract
Intrinsically disordered proteins and regions (IDPs/IDRs) are functionally important proteins and regions that exist as highly dynamic conformations under natural physiological conditions. IDPs/IDRs exhibit a broad range of molecular functions, which involve binding interactions with partners and maintaining native structural flexibility. The rapid increase in the number of proteins in sequence databases and the diversity of disordered functions challenge existing computational methods for predicting protein intrinsic disorder and disordered functions. A disordered region interacts with different partners to perform multiple functions, and these disordered functions exhibit different dependencies and correlations. In this study, we introduce DisoFLAG, a computational method that leverages a graph-based interaction protein language model (GiPLM) for jointly predicting disorder and its multiple potential functions. GiPLM integrates protein semantic information based on pre-trained protein language models into graph-based interaction units to enhance the correlation of the semantic representation of multiple disordered functions. The DisoFLAG predictor takes amino acid sequences as the only inputs and provides predictions of intrinsic disorder and six disordered functions for proteins, including protein-binding, DNA-binding, RNA-binding, ion-binding, lipid-binding, and flexible linker. We evaluated the predictive performance of DisoFLAG following the Critical Assessment of protein Intrinsic Disorder (CAID) experiments, and the results demonstrated that DisoFLAG offers accurate and comprehensive predictions of disordered functions, extending the current coverage of computationally predicted disordered function categories. The standalone package and web server of DisoFLAG have been established to provide accurate prediction tools for intrinsic disorder and its associated functions.
Affiliation(s)
- Yihe Pang
  - School of Computer Science and Technology, Beijing Institute of Technology, No. 5, South Zhongguancun Street, Beijing, Haidian District, 100081, China
- Bin Liu
  - School of Computer Science and Technology, Beijing Institute of Technology, No. 5, South Zhongguancun Street, Beijing, Haidian District, 100081, China
  - Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, No. 5, South Zhongguancun Street, Beijing, Haidian District, 100081, China
104
Hussain A, Brooks III CL. Guiding discovery of protein sequence-structure-function modeling. Bioinformatics 2024; 40:btae002. [PMID: 38195719 PMCID: PMC10789314 DOI: 10.1093/bioinformatics/btae002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/22/2023] [Revised: 12/05/2023] [Accepted: 01/08/2024] [Indexed: 01/11/2024] Open
Abstract
MOTIVATION: Protein engineering techniques are key in designing novel catalysts for a wide range of reactions. Although approaches vary in their exploration of the sequence-structure-function paradigm, they are often hampered by the labor-intensive steps of protein expression and screening. In this work, we describe the development and testing of a high-throughput in silico sequence-structure-function pipeline using AlphaFold2 and fast Fourier transform docking that is benchmarked with enantioselectivity and reactivity predictions for an ancestral sequence library of fungal flavin-dependent monooxygenases. RESULTS: The predicted enantioselectivities and reactivities correlate well with previously described screens of an experimentally available subset of these proteins and capture known changes in enantioselectivity across the phylogenetic tree representing ancestral proteins from this family. With this pipeline established as our functional screen, we apply ensemble decision tree models and explainable AI techniques to build sequence-function models and extract critical residues within the binding site and the second-sphere residues around this site. We demonstrate that the top-identified key residues in the control of enantioselectivity and reactivity correspond to experimentally verified residues. The in silico sequence-to-function pipeline serves as an accelerated framework to inform protein engineering efforts from vast informative sequence landscapes contained in protein families, ancestral resurrects, and directed evolution campaigns. AVAILABILITY: Jupyter notebooks detailing the sequence-structure-function pipeline are available at https://github.com/BrooksResearchGroup-UM/seq_struct_func.
Affiliation(s)
- Azam Hussain
  - Department of Macromolecular Science and Engineering Program, University of Michigan, Ann Arbor, MI 48109-1055, United States
- Charles L Brooks III
  - Department of Chemistry, University of Michigan, Ann Arbor, MI 48109-1055, United States
105
Pantolini L, Studer G, Pereira J, Durairaj J, Tauriello G, Schwede T. Embedding-based alignment: combining protein language models with dynamic programming alignment to detect structural similarities in the twilight-zone. Bioinformatics 2024; 40:btad786. [PMID: 38175775 PMCID: PMC10792726 DOI: 10.1093/bioinformatics/btad786] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 07/10/2023] [Revised: 10/27/2023] [Accepted: 12/29/2023] [Indexed: 01/06/2024] Open
Abstract
MOTIVATION: Language models are routinely used for text classification and generative tasks. Recently, the same architectures were applied to protein sequences, unlocking powerful new approaches in the bioinformatics field. Protein language models (pLMs) generate high-dimensional embeddings on a per-residue level and encode a "semantic meaning" of each individual amino acid in the context of the full protein sequence. These representations have been used as a starting point for downstream learning tasks and, more recently, for identifying distant homologous relationships between proteins. RESULTS: In this work, we introduce a new method that generates embedding-based protein sequence alignments (EBA) and show how these capture structural similarities even in the twilight zone, outperforming both classical methods as well as other approaches based on pLMs. The method shows excellent accuracy despite the absence of training and parameter optimization. We demonstrate that the combination of pLMs with alignment methods is a valuable approach for the detection of relationships between proteins in the twilight zone. AVAILABILITY AND IMPLEMENTATION: The code to run EBA and reproduce the analysis described in this article is available at: https://git.scicore.unibas.ch/schwede/EBA and https://git.scicore.unibas.ch/schwede/eba_benchmark.
Affiliation(s)
- Lorenzo Pantolini
  - Biozentrum, University of Basel, Basel 4056, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
- Gabriel Studer
  - Biozentrum, University of Basel, Basel 4056, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
- Joana Pereira
  - Biozentrum, University of Basel, Basel 4056, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
- Janani Durairaj
  - Biozentrum, University of Basel, Basel 4056, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
- Gerardo Tauriello
  - Biozentrum, University of Basel, Basel 4056, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
- Torsten Schwede
  - Biozentrum, University of Basel, Basel 4056, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
106
Shen J, Yu Q, Chen S, Tan Q, Li J, Li Y. Unbiased organism-agnostic and highly sensitive signal peptide predictor with deep protein language model. Nat Comput Sci 2024; 4:29-42. [PMID: 38177492 DOI: 10.1038/s43588-023-00576-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2023] [Accepted: 11/22/2023] [Indexed: 01/06/2024]
Abstract
Signal peptides (SPs) are essential to target and transfer transmembrane and secreted proteins to the correct positions. Many existing computational tools for predicting SPs disregard the extreme data imbalance problem and rely on additional group information of proteins. Here we introduce Unbiased Organism-agnostic Signal Peptide Network (USPNet), an SP classification and cleavage-site prediction deep learning method. Extensive experimental results show that USPNet substantially outperforms previous methods on classification performance by 10%. An SP-discovering pipeline with USPNet is designed to explore unprecedented SPs from metagenomic data. It reveals 347 SP candidates, with the lowest sequence identity between our candidates and the closest SP in the training dataset at only 13%. In addition, the template modeling scores between candidates and SPs in the training set are mostly above 0.8. The results showcase that USPNet has learnt the SP structure with raw amino acid sequences and the large protein language model, thereby enabling the discovery of unknown SPs.
Affiliation(s)
- Junbo Shen
  - Department of Computer Science and Engineering, CUHK, Hong Kong SAR, China
  - Department of Computer Science and Engineering, Washington University, St. Louis, MO, US
- Qinze Yu
  - Department of Computer Science and Engineering, CUHK, Hong Kong SAR, China
- Shenyang Chen
  - Department of Computer Science and Engineering, CUHK, Hong Kong SAR, China
  - The CUHK Shenzhen Research Institute, Shenzhen, China
  - Georgia Institute of Technology, Atlanta, GA, US
- Qingxiong Tan
  - Department of Computer Science and Engineering, CUHK, Hong Kong SAR, China
- Jingchen Li
  - Department of Computer Science and Engineering, CUHK, Hong Kong SAR, China
- Yu Li
  - Department of Computer Science and Engineering, CUHK, Hong Kong SAR, China
  - The CUHK Shenzhen Research Institute, Shenzhen, China
  - Shanghai Artificial Intelligence Laboratory, Shanghai, China
  - Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
107
Duan C, Liu Y, Liu Y, Liu L, Cai M, Zhang R, Zeng Q, Koonin EV, Krupovic M, Li M. Diversity of Bathyarchaeia viruses in metagenomes and virus-encoded CRISPR system components. ISME Commun 2024; 4:ycad011. [PMID: 38328448 PMCID: PMC10848311 DOI: 10.1093/ismeco/ycad011] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/03/2023] [Revised: 12/14/2023] [Accepted: 12/18/2023] [Indexed: 02/09/2024]
Abstract
Bathyarchaeia represent a class of archaea common and abundant in sedimentary ecosystems. Here we report 56 metagenome-assembled genomes of Bathyarchaeia viruses identified in metagenomes from different environments. Gene sharing network and phylogenomic analyses led to the proposal of four virus families, including viruses of the realms Duplodnaviria and Adnaviria, and archaea-specific spindle-shaped viruses. Genomic analyses uncovered diverse CRISPR elements in these viruses. Viruses of the proposed family "Fuxiviridae" harbor an atypical Type IV-B CRISPR-Cas system and a Cas4 protein that might interfere with host immunity. Viruses of the family "Chiyouviridae" encode a Cas2-like endonuclease and two mini-CRISPR arrays, one with a repeat identical to that in the host CRISPR array, potentially allowing the virus to recruit the host CRISPR adaptation machinery to acquire spacers that could contribute to competition with other mobile genetic elements or to inhibit host defenses. These findings present an outline of the Bathyarchaeia virome and offer a glimpse into their counter-defense mechanisms.
Affiliation(s)
- Changhai Duan
  - SZU-HKUST Joint PhD Program in Marine Environmental Science, Shenzhen University, Shenzhen 518060, China
  - Archaeal Biology Center, Institute for Advanced Study, Shenzhen University, Shenzhen 518060, China
  - Shenzhen Key Laboratory of Marine Microbiome Engineering, Institute for Advanced Study, Shenzhen University, Shenzhen 518060, China
  - Department of Ocean Science, The Hong Kong University of Science and Technology, Hong Kong 999077, China
- Yang Liu
  - Archaeal Biology Center, Institute for Advanced Study, Shenzhen University, Shenzhen 518060, China
  - Shenzhen Key Laboratory of Marine Microbiome Engineering, Institute for Advanced Study, Shenzhen University, Shenzhen 518060, China
- Ying Liu
  - Institut Pasteur, Université Paris Cité, Archaeal Virology Unit, Paris 75015, France
- Lirui Liu
  - Archaeal Biology Center, Institute for Advanced Study, Shenzhen University, Shenzhen 518060, China
  - Shenzhen Key Laboratory of Marine Microbiome Engineering, Institute for Advanced Study, Shenzhen University, Shenzhen 518060, China
- Mingwei Cai
  - Archaeal Biology Center, Institute for Advanced Study, Shenzhen University, Shenzhen 518060, China
  - Shenzhen Key Laboratory of Marine Microbiome Engineering, Institute for Advanced Study, Shenzhen University, Shenzhen 518060, China
- Rui Zhang
  - Archaeal Biology Center, Institute for Advanced Study, Shenzhen University, Shenzhen 518060, China
  - Shenzhen Key Laboratory of Marine Microbiome Engineering, Institute for Advanced Study, Shenzhen University, Shenzhen 518060, China
- Qinglu Zeng
  - Department of Ocean Science, The Hong Kong University of Science and Technology, Hong Kong 999077, China
- Eugene V Koonin
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Mart Krupovic
  - Institut Pasteur, Université Paris Cité, Archaeal Virology Unit, Paris 75015, France
- Meng Li
  - SZU-HKUST Joint PhD Program in Marine Environmental Science, Shenzhen University, Shenzhen 518060, China
  - Archaeal Biology Center, Institute for Advanced Study, Shenzhen University, Shenzhen 518060, China
  - Shenzhen Key Laboratory of Marine Microbiome Engineering, Institute for Advanced Study, Shenzhen University, Shenzhen 518060, China
108
Zhou Y, Wang Y, Prangishvili D, Krupovic M. Exploring the Archaeal Virosphere by Metagenomics. Methods Mol Biol 2024; 2732:1-22. [PMID: 38060114 DOI: 10.1007/978-1-0716-3515-5_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/08/2023]
Abstract
During the past decade, environmental research has demonstrated that archaea are abundant and widespread in nature and play important ecological roles at a global scale. Currently, however, the majority of archaeal lineages cannot be cultivated under laboratory conditions and are known exclusively or nearly exclusively through metagenomics. A similar trend extends to the archaeal virosphere, where isolated representatives are available for a handful of model archaeal virus-host systems. Viral metagenomics provides an alternative way to circumvent the limitations of culture-based virus discovery and offers insight into the diversity, distribution, and environmental impact of uncultured archaeal viruses. Presently, metagenomics approaches have been successfully applied to explore the viromes associated with various lineages of extremophilic and mesophilic archaea, including Asgard archaea (Asgardarchaeota), ANME-1 archaea (Methanophagales), thaumarchaea (Nitrososphaeria), altiarchaea (Altiarchaeota), and marine group II archaea (Poseidoniales). Here, we provide an overview of methods widely used in archaeal virus metagenomics, covering metavirome preparation, genome annotation, phylogenetic and phylogenomic analyses, and archaeal host assignment. We hope that this summary will contribute to further exploration and characterization of the enigmatic archaeal virome lurking in diverse environments.
Affiliation(s)
- Yifan Zhou
  - Institut Pasteur, Université Paris Cité, Archaeal Virology Unit, Paris, France
  - Sorbonne Université, Collège Doctoral, Paris, France
- Yongjie Wang
  - College of Food Science and Technology, Shanghai Ocean University, Shanghai, China
  - Laboratory for Marine Biology and Biotechnology, Qingdao National Laboratory for Marine Science and Technology, Qingdao, China
  - Laboratory of Quality and Safety Risk Assessment for Aquatic Products on Storage and Preservation (Shanghai), Ministry of Agriculture, Shanghai, China
- David Prangishvili
  - Institut Pasteur, Université Paris Cité, Archaeal Virology Unit, Paris, France
  - Ivane Javakhishvili Tbilisi State University, Tbilisi, Georgia
- Mart Krupovic
  - Institut Pasteur, Université Paris Cité, Archaeal Virology Unit, Paris, France
109
Kinch LN, Schaeffer RD, Zhang J, Cong Q, Orth K, Grishin N. Insights into virulence: structure classification of the Vibrio parahaemolyticus RIMD mobilome. mSystems 2023; 8:e0079623. [PMID: 38014954 PMCID: PMC10734457 DOI: 10.1128/msystems.00796-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2023] [Accepted: 10/17/2023] [Indexed: 11/29/2023] Open
Abstract
IMPORTANCE: The pandemic Vibrio parahaemolyticus (Vpar) strain RIMD causes seafood-borne illness worldwide. Previous comparative genomic studies have revealed pathogenicity islands in RIMD that contribute to the success of the strain in infection. However, not all virulence determinants have been identified, and many of the proteins encoded in known pathogenicity islands are of unknown function. Based on the ECOD database, we used evolution-based classification of structure models for the RIMD proteome to improve our functional understanding of virulence determinants acquired by the pandemic strain. We further identify and classify previously unknown mobile protein domains as well as fast-evolving residue positions in structure models that contribute to virulence and adaptation with respect to a pre-pandemic strain. Our work highlights key contributions of phage in mediating seafood-borne illness, suggesting this strain balances its avoidance of phage predators with its successful colonization of human hosts.
Affiliation(s)
- Lisa N. Kinch
  - Department of Molecular Biology, University of Texas Southwestern Medical Center, Dallas, Texas, USA
  - Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas, USA
- R. Dustin Schaeffer
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, USA
- Jing Zhang
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, USA
  - Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, Texas, USA
- Qian Cong
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, USA
  - Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, Texas, USA
  - Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, Texas, USA
- Kim Orth
  - Department of Molecular Biology, University of Texas Southwestern Medical Center, Dallas, Texas, USA
  - Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas, USA
  - Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas, USA
- Nick Grishin
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, USA
  - Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas, USA
110
Huang C, Luo H, Zeng B, Feng C, Chen J, Yuan H, Huang S, Yang B, Zou Y, Liu Y. Identification of two novel and one rare mutation in DYRK1A and prenatal diagnoses in three Chinese families with intellectual Disability-7. Front Genet 2023; 14:1290949. [PMID: 38179410 PMCID: PMC10765505 DOI: 10.3389/fgene.2023.1290949] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2023] [Accepted: 12/07/2023] [Indexed: 01/06/2024] Open
Abstract
Background and purpose: Intellectual disability-7 (MRD7) is a subtype of intellectual disability (MRD) characterized by early-onset feeding difficulties, hypoactivity, and febrile seizures, followed by progressive deterioration of intellectual and physical development. We aimed to identify the underlying genetic cause in three probands, one from each of three Chinese families, who presented with symptoms of intellectual disability and facial dysmorphic features. We also provided prenatal diagnosis and genetic counseling for the three families to help prevent this disease. Methods: We retrospectively collected clinical diagnostic evidence for the three probands, including magnetic resonance imaging (MRI), computerized tomography (CT), electroencephalogram (EEG), and intelligence tests. Genetic investigation of the probands and their next of kin was performed by Trio-whole exome sequencing (WES). Sanger sequencing or quantitative PCR was then used to verify the variants identified by Trio-WES in the three families. Moreover, we performed amniocentesis at an appropriate gestational period to determine, by prenatal molecular genetic diagnosis, the status of the three pathogenic variants in the fetuses. Results: The three probands and one fetus were clinically diagnosed with microcephaly and exhibited intellectual developmental disability, postnatal feeding difficulties, and facial dysmorphic features. Combined with the probands' clinical manifestations, Trio-WES uncovered three heterozygous variants in DYRK1A (NM_001396.5), one in each family: a novel variant exon3_exon4del p.(Gly4_Asn109del), a novel variant c.1159C>T p.(Gln387*), and a previously reported but rare pathogenic variant c.1309C>T p.(Arg437*). According to the updated American College of Medical Genetics and Genomics (ACMG) criteria, exon3_exon4del and c.1159C>T were both classified as likely pathogenic (PVS1+PM6), while c.1309C>T was classified as pathogenic (PVS1+PS2_Moderate+PM2). Considering the clinical features and molecular evidence, the three probands were diagnosed with MRD7, and the three variants were considered the causal mutations. Prenatal diagnosis detected the heterozygous dominant variant c.1159C>T p.(Gln387*) in one of the fetuses, indicating a significant probability of MRD7; the pregnancy was subsequently intervened upon by obstetric professionals at the parents' decision. In contrast, prenatal molecular genetic testing revealed wild-type alleles in the other two fetuses, and both sets of parents decided to continue the pregnancy. Conclusion: We identified two novel and one rare mutation in DYRK1A, broadening the mutation spectrum of DYRK1A and providing molecular-level evidence for the diagnosis of MRD7. In addition, Trio-WES allowed the three families with MRD7 to determine the causative genetic factors efficiently and to receive concise genetic counseling.
Affiliation(s)
- Bicheng Yang
  - Department of Medical Genetics, Jiangxi Key Laboratory of Birth Defect Prevention and Control, Jiangxi Maternal and Child Health Hospital, Nanchang, Jiangxi, China
- Yongyi Zou
  - Department of Medical Genetics, Jiangxi Key Laboratory of Birth Defect Prevention and Control, Jiangxi Maternal and Child Health Hospital, Nanchang, Jiangxi, China
- Yanqiu Liu
  - Department of Medical Genetics, Jiangxi Key Laboratory of Birth Defect Prevention and Control, Jiangxi Maternal and Child Health Hospital, Nanchang, Jiangxi, China
111
Muzyukina P, Shkaruta A, Guzman NM, Andreani J, Borges AL, Bondy-Denomy J, Maikova A, Semenova E, Severinov K, Soutourina O. Identification of an anti-CRISPR protein that inhibits the CRISPR-Cas type I-B system in Clostridioides difficile. mSphere 2023; 8:e0040123. [PMID: 38009936 PMCID: PMC10732046 DOI: 10.1128/msphere.00401-23] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/18/2023] [Accepted: 10/10/2023] [Indexed: 11/29/2023] Open
Abstract
IMPORTANCE Clostridioides difficile is the widespread anaerobic spore-forming bacterium that is a major cause of potentially lethal nosocomial infections associated with antibiotic therapy worldwide. Due to the increase in severe forms associated with a strong inflammatory response and higher recurrence rates, a current imperative is to develop synergistic and alternative treatments for C. difficile infections. In particular, phage therapy is regarded as a potential substitute for existing antimicrobial treatments. However, it faces challenges because C. difficile has highly active CRISPR-Cas immunity, which may be a specific adaptation to phage-rich and highly crowded gut environment. To overcome this defense, C. difficile phages must employ anti-CRISPR mechanisms. Here, we present the first anti-CRISPR protein that inhibits the CRISPR-Cas defense system in this pathogen. Our work offers insights into the interactions between C. difficile and its phages, paving the way for future CRISPR-based applications and development of effective phage therapy strategies combined with the engineering of virulent C. difficile infecting phages.
Affiliation(s)
- Polina Muzyukina
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), Gif-sur-Yvette, France
- Center for Life Sciences, Skolkovo Institute of Science and Technology, Moscow, Russia
| | - Anton Shkaruta
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), Gif-sur-Yvette, France
- Center for Life Sciences, Skolkovo Institute of Science and Technology, Moscow, Russia
| | - Noemi M. Guzman
- Center for Life Sciences, Skolkovo Institute of Science and Technology, Moscow, Russia
- Departamento de Fisiología, Genética y Microbiología, Universidad de Alicante, Alicante, Spain
| | - Jessica Andreani
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), Gif-sur-Yvette, France
| | - Adair L. Borges
- Department of Microbiology and Immunology, University of California, San Francisco, California, USA
| | - Joseph Bondy-Denomy
- Department of Microbiology and Immunology, University of California, San Francisco, California, USA
| | - Anna Maikova
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), Gif-sur-Yvette, France
- Center for Life Sciences, Skolkovo Institute of Science and Technology, Moscow, Russia
| | - Ekaterina Semenova
- Waksman Institute, Rutgers, State University of New Jersey, Piscataway, New Jersey, USA
| | - Konstantin Severinov
- Waksman Institute, Rutgers, State University of New Jersey, Piscataway, New Jersey, USA
- Institute of Molecular Genetics, Kurchatov National Research Center, Moscow, Russia
| | - Olga Soutourina
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), Gif-sur-Yvette, France
- Institut Universitaire de France (IUF), Paris, France
| |
|
112
|
Lau AM, Kandathil SM, Jones DT. Merizo: a rapid and accurate protein domain segmentation method using invariant point attention. Nat Commun 2023; 14:8445. [PMID: 38114456 PMCID: PMC10730818 DOI: 10.1038/s41467-023-43934-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Accepted: 11/24/2023] [Indexed: 12/21/2023] Open
Abstract
The AlphaFold Protein Structure Database, containing predictions for over 200 million proteins, has been met with enthusiasm over its potential in enriching structural biological research and beyond. Currently, access to the database is precluded by an urgent need for tools that allow the efficient traversal, discovery, and documentation of its contents. Identifying domain regions in the database is a non-trivial endeavour and doing so will aid our understanding of protein structure and function, while facilitating drug discovery and comparative genomics. Here, we describe a deep learning method for domain segmentation called Merizo, which learns to cluster residues into domains in a bottom-up manner. Merizo is trained on CATH domains and fine-tuned on AlphaFold2 models via self-distillation, enabling it to be applied to both experimental and AlphaFold2 models. As proof of concept, we apply Merizo to the human proteome, identifying 40,818 putative domains that can be matched to CATH representative domains.
Affiliation(s)
- Andy M Lau
- Department of Computer Science, University College London, London, WC1E 6BT, UK
| | - Shaun M Kandathil
- Department of Computer Science, University College London, London, WC1E 6BT, UK
| | - David T Jones
- Department of Computer Science, University College London, London, WC1E 6BT, UK
| |
|
113
|
Cen LP, Ng TK, Ji J, Lin JW, Yao Y, Yang R, Dong G, Cao Y, Chen C, Yao SQ, Wang WY, Huang Z, Qiu K, Pang CP, Liu Q, Zhang M. Artificial Intelligence-based database for prediction of protein structure and their alterations in ocular diseases. Database (Oxford) 2023; 2023:baad083. [PMID: 38109881 PMCID: PMC10727695 DOI: 10.1093/database/baad083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2022] [Revised: 07/17/2023] [Accepted: 12/15/2023] [Indexed: 12/20/2023]
Abstract
The aim of the study is to establish an online database for predicting protein structures altered in ocular diseases by Alphafold2 and RoseTTAFold algorithms. In total, 726 genes of multiple ocular diseases were collected for protein structure prediction. Both Alphafold2 and RoseTTAFold algorithms were built locally using the open-source codebases. A dataset with 48 protein structures from Protein Data Bank (PDB) was adopted for algorithm set-up validation. A website was built to match ocular genes with the corresponding predicted tertiary protein structures for each amino acid sequence. The predicted local distance difference test-Cα (pLDDT) and template modeling (TM) scores of the validation protein structure and the selected ocular genes were evaluated. Molecular dynamics and molecular docking simulations were performed to demonstrate the applications of the predicted structures. For the validation dataset, 70.8% of the predicted protein structures showed pLDDT greater than 90. Compared to the PDB structures, 100% of the AlphaFold2-predicted structures and 97.9% of the RoseTTAFold-predicted structures showed a TM score greater than 0.5. In total, 1329 amino acid sequences of 430 ocular disease-related genes have been predicted, of which 75.9% showed pLDDT greater than 70 for the wildtype sequences and 76.1% for the variant sequences. Small molecule docking and molecular dynamics simulations revealed that the predicted protein structures with higher confidence scores showed similar molecular characteristics with the structures from PDB. We have developed an ocular protein structure database (EyeProdb) for ocular disease, which is released for the public and will facilitate the biological investigations and structure-based drug development for ocular diseases. Database URL: http://eyeprodb.jsiec.org.
Affiliation(s)
| | - Tsz Kin Ng
- Joint Shantou International Eye Centre of Shantou University and The Chinese University of Hong Kong, North Dongxia Road (Guangxia New Town), Shantou, Guangdong 515041, China
- Shantou University Medical College, 22 Xinling Road, Shantou, Guangdong 515041, China
- Department of Ophthalmology and Visual Sciences, The Chinese University of Hong Kong, 147K Argyle Street, KLN, Hong Kong
| | - Jie Ji
- Network & Information Centre, Shantou University, 243 Daxue Road, Shantou, Guangdong 515063, China
| | - Jian-Wei Lin
- Joint Shantou International Eye Centre of Shantou University and The Chinese University of Hong Kong, North Dongxia Road (Guangxia New Town), Shantou, Guangdong 515041, China
| | - Yao Yao
- Joint Shantou International Eye Centre of Shantou University and The Chinese University of Hong Kong, North Dongxia Road (Guangxia New Town), Shantou, Guangdong 515041, China
- Shantou University Medical College, 22 Xinling Road, Shantou, Guangdong 515041, China
| | - Rucui Yang
- Joint Shantou International Eye Centre of Shantou University and The Chinese University of Hong Kong, North Dongxia Road (Guangxia New Town), Shantou, Guangdong 515041, China
- Shantou University Medical College, 22 Xinling Road, Shantou, Guangdong 515041, China
| | - Geng Dong
- Shantou University Medical College, 22 Xinling Road, Shantou, Guangdong 515041, China
- Guangdong Provincial Key Laboratory of Infectious Diseases and Molecular Immunopathology, Shantou University Medical College, 22 Xinling Road, Shantou, Guangdong 515041, China
| | - Yingjie Cao
- Joint Shantou International Eye Centre of Shantou University and The Chinese University of Hong Kong, North Dongxia Road (Guangxia New Town), Shantou, Guangdong 515041, China
| | - Chongbo Chen
- Joint Shantou International Eye Centre of Shantou University and The Chinese University of Hong Kong, North Dongxia Road (Guangxia New Town), Shantou, Guangdong 515041, China
| | - Shi-Qi Yao
- Joint Shantou International Eye Centre of Shantou University and The Chinese University of Hong Kong, North Dongxia Road (Guangxia New Town), Shantou, Guangdong 515041, China
- Shantou University Medical College, 22 Xinling Road, Shantou, Guangdong 515041, China
| | - Wen-Ying Wang
- Joint Shantou International Eye Centre of Shantou University and The Chinese University of Hong Kong, North Dongxia Road (Guangxia New Town), Shantou, Guangdong 515041, China
- Shantou University Medical College, 22 Xinling Road, Shantou, Guangdong 515041, China
| | - Zijing Huang
- Joint Shantou International Eye Centre of Shantou University and The Chinese University of Hong Kong, North Dongxia Road (Guangxia New Town), Shantou, Guangdong 515041, China
| | - Kunliang Qiu
- Joint Shantou International Eye Centre of Shantou University and The Chinese University of Hong Kong, North Dongxia Road (Guangxia New Town), Shantou, Guangdong 515041, China
| | - Chi Pui Pang
- Joint Shantou International Eye Centre of Shantou University and The Chinese University of Hong Kong, North Dongxia Road (Guangxia New Town), Shantou, Guangdong 515041, China
- Department of Ophthalmology and Visual Sciences, The Chinese University of Hong Kong, 147K Argyle Street, KLN, Hong Kong
| | - Qingping Liu
- Joint Shantou International Eye Centre of Shantou University and The Chinese University of Hong Kong, North Dongxia Road (Guangxia New Town), Shantou, Guangdong 515041, China
- Shantou University Medical College, 22 Xinling Road, Shantou, Guangdong 515041, China
| | - Mingzhi Zhang
- Joint Shantou International Eye Centre of Shantou University and The Chinese University of Hong Kong, North Dongxia Road (Guangxia New Town), Shantou, Guangdong 515041, China
| |
|
114
|
Knyazev DG, Winter L, Vogt A, Posch S, Öztürk Y, Siligan C, Goessweiner-Mohr N, Hagleitner-Ertugrul N, Koch HG, Pohl P. YidC from Escherichia coli Forms an Ion-Conducting Pore upon Activation by Ribosomes. Biomolecules 2023; 13:1774. [PMID: 38136645 PMCID: PMC10741985 DOI: 10.3390/biom13121774] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2023] [Revised: 12/01/2023] [Accepted: 12/06/2023] [Indexed: 12/24/2023] Open
Abstract
The universally conserved protein YidC aids in the insertion and folding of transmembrane polypeptides. Supposedly, a charged arginine faces its hydrophobic lipid core, facilitating polypeptide sliding along YidC's surface. How the membrane barrier to other molecules may be maintained is unclear. Here, we show that the purified and reconstituted E. coli YidC forms an ion-conducting transmembrane pore upon ribosome or ribosome-nascent chain complex (RNC) binding. In contrast to monomeric YidC structures, an AlphaFold parallel YidC dimer model harbors a pore. Experimental evidence for a dimeric assembly comes from our BN-PAGE analysis of native vesicles, fluorescence correlation spectroscopy studies, single-molecule fluorescence photobleaching observations, and crosslinking experiments. In the dimeric model, the conserved arginine and other residues interacting with nascent chains point into the putative pore. This result suggests the possibility of a YidC-assisted insertion mode alternative to the insertase mechanism.
Affiliation(s)
- Denis G. Knyazev
- Institute of Biophysics, Johannes Kepler University Linz, Gruberstrasse 40, A-4020 Linz, Austria
| | - Lukas Winter
- Institute of Biophysics, Johannes Kepler University Linz, Gruberstrasse 40, A-4020 Linz, Austria
| | - Andreas Vogt
- Institute of Biochemistry and Molecular Biology, ZBMZ, Faculty of Medicine, Albert Ludwig University of Freiburg, 79104 Freiburg, Germany
- Spemann-Graduate School of Biology and Medicine (SGBM), Albert Ludwig University of Freiburg, 79104 Freiburg, Germany
- Faculty of Biology, Albert Ludwig University of Freiburg, 79104 Freiburg, Germany
| | - Sandra Posch
- Institute of Biophysics, Johannes Kepler University Linz, Gruberstrasse 40, A-4020 Linz, Austria
| | - Yavuz Öztürk
- Institute of Biochemistry and Molecular Biology, ZBMZ, Faculty of Medicine, Albert Ludwig University of Freiburg, 79104 Freiburg, Germany
| | - Christine Siligan
- Institute of Biophysics, Johannes Kepler University Linz, Gruberstrasse 40, A-4020 Linz, Austria
| | - Nikolaus Goessweiner-Mohr
- Institute of Biophysics, Johannes Kepler University Linz, Gruberstrasse 40, A-4020 Linz, Austria
| | - Nora Hagleitner-Ertugrul
- Institute of Biophysics, Johannes Kepler University Linz, Gruberstrasse 40, A-4020 Linz, Austria
| | - Hans-Georg Koch
- Institute of Biochemistry and Molecular Biology, ZBMZ, Faculty of Medicine, Albert Ludwig University of Freiburg, 79104 Freiburg, Germany
- Spemann-Graduate School of Biology and Medicine (SGBM), Albert Ludwig University of Freiburg, 79104 Freiburg, Germany
| | - Peter Pohl
- Institute of Biophysics, Johannes Kepler University Linz, Gruberstrasse 40, A-4020 Linz, Austria
| |
|
115
|
Rehman S, Antonovic AK, McIntire IE, Zheng H, Cleaver L, Adams CO, Portlock T, Richardson K, Shaw R, Oregioni A, Mastroianni G, Whittaker SBM, Kelly G, Fornili A, Cianciotto NP, Garnett JA. The Legionella collagen-like protein employs a unique binding mechanism for the recognition of host glycosaminoglycans. bioRxiv 2023:2023.12.10.570962. [PMID: 38106198 PMCID: PMC10723406 DOI: 10.1101/2023.12.10.570962] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/19/2023]
Abstract
Bacterial adhesion is a fundamental process which enables colonisation of niche environments and is key for infection. However, in Legionella pneumophila, the causative agent of Legionnaires' disease, these processes are not well understood. The Legionella collagen-like protein (Lcl) is an extracellular peripheral membrane protein that recognises sulphated glycosaminoglycans (GAGs) on the surface of eukaryotic cells, but also stimulates bacterial aggregation in response to divalent cations. Here we report the crystal structure of the Lcl C-terminal domain (Lcl-CTD) and present a model for intact Lcl. Our data reveal that Lcl-CTD forms an unusual dynamic trimer arrangement with a positively charged external surface and a negatively charged solvent exposed internal cavity. Through Molecular Dynamics (MD) simulations, we show how the GAG chondroitin-4-sulphate associates with the Lcl-CTD surface via unique binding modes. Our findings show that Lcl homologs are present across both the Pseudomonadota and Fibrobacterota-Chlorobiota-Bacteroidota phyla and suggest that Lcl may represent a versatile carbohydrate binding mechanism.
Affiliation(s)
- Saima Rehman
- Centre for Host-Microbiome Interactions, Faculty of Dental, Oral & Craniofacial Sciences, King’s College London, London, UK
| | - Anna K. Antonovic
- School of Physical and Chemical Sciences, Queen Mary University of London, London, UK
| | - Ian E. McIntire
- Department of Microbiology and Immunology, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
| | - Huaixin Zheng
- Department of Microbiology and Immunology, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
| | - Leanne Cleaver
- Centre for Host-Microbiome Interactions, Faculty of Dental, Oral & Craniofacial Sciences, King’s College London, London, UK
| | - Carlton O. Adams
- Department of Microbiology and Immunology, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
| | - Theo Portlock
- Centre for Host-Microbiome Interactions, Faculty of Dental, Oral & Craniofacial Sciences, King’s College London, London, UK
- School of Biological and Behavioural Sciences, Queen Mary University of London, London, UK
| | - Katherine Richardson
- School of Biological and Behavioural Sciences, Queen Mary University of London, London, UK
| | - Rosie Shaw
- School of Biological and Behavioural Sciences, Queen Mary University of London, London, UK
| | - Alain Oregioni
- The Medical Research Council Biomedical NMR Centre, the Francis Crick Institute, 1 Midland Road, London NW1 1AT, UK
| | - Giulia Mastroianni
- School of Physical and Chemical Sciences, Queen Mary University of London, London, UK
| | - Sara B-M. Whittaker
- School of Cancer Sciences, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
| | - Geoff Kelly
- The Medical Research Council Biomedical NMR Centre, the Francis Crick Institute, 1 Midland Road, London NW1 1AT, UK
| | - Arianna Fornili
- School of Physical and Chemical Sciences, Queen Mary University of London, London, UK
| | - Nicholas P. Cianciotto
- Department of Microbiology and Immunology, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
| | - James A. Garnett
- Centre for Host-Microbiome Interactions, Faculty of Dental, Oral & Craniofacial Sciences, King’s College London, London, UK
| |
|
116
|
Côco LZ, Aires R, Carvalho GR, Belisário EDS, Yap MKK, Amorim FG, Conde-Aranda J, Nogueira BV, Vasquez EC, Pereira TDMC, Campagnaro BP. Unravelling the Gastroprotective Potential of Kefir: Exploring Antioxidant Effects in Preventing Gastric Ulcers. Cells 2023; 12:2799. [PMID: 38132119 PMCID: PMC10742242 DOI: 10.3390/cells12242799] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Revised: 11/28/2023] [Accepted: 12/05/2023] [Indexed: 12/23/2023] Open
Abstract
The present study was conducted to evaluate the protective effect of milk kefir against NSAID-induced gastric ulcers. Male Swiss mice were divided into three groups: control (Vehicle; UHT milk at a dose of 0.3 mL/100 g), proton pump inhibitor (PPI; lansoprazole 30 mg/kg), and 4% milk kefir (Kefir; 0.3 mL/100 g). After 14 days of treatment, gastric ulcer was induced by oral administration of indomethacin (40 mg/kg). Reactive oxygen species (ROS), nitric oxide (NO), DNA content, cellular apoptosis, IL-10 and TNF-α levels, and myeloperoxidase (MPO) enzyme activity were determined. The interaction networks between NADPH oxidase 2 and kefir peptides 1-35 were determined using the Residue Interaction Network Generator (RING) webserver. Pretreatment with kefir for 14 days prevented gastric lesions. In addition, kefir administration reduced ROS production, DNA fragmentation, apoptosis, and TNF-α systemic levels. Simultaneously, kefir increased NO bioavailability in gastric cells and IL-10 systemic levels. A total of 35 kefir peptides showed affinity with NADPH oxidase 2. These findings suggest that the gastroprotective effect of kefir is due to its antioxidant and anti-inflammatory properties. Kefir could be a promising natural therapy for gastric ulcers, opening new perspectives for future research.
Affiliation(s)
- Larissa Zambom Côco
- Laboratory of Translational Physiology and Pharmacology, Pharmaceutical Sciences Graduate Program, Vila Velha University (UVV), Vila Velha 29102-920, ES, Brazil
| | - Rafaela Aires
- Laboratory of Translational Physiology and Pharmacology, Pharmaceutical Sciences Graduate Program, Vila Velha University (UVV), Vila Velha 29102-920, ES, Brazil
| | - Glaucimeire Rocha Carvalho
- Laboratory of Translational Physiology and Pharmacology, Pharmaceutical Sciences Graduate Program, Vila Velha University (UVV), Vila Velha 29102-920, ES, Brazil
| | - Eduarda de Souza Belisário
- Laboratory of Translational Physiology and Pharmacology, Pharmaceutical Sciences Graduate Program, Vila Velha University (UVV), Vila Velha 29102-920, ES, Brazil
| | | | - Fernanda Gobbi Amorim
- Laboratory of Mass Spectrometry, Department of Chemistry, University of Liège, 4000 Liège, Belgium
| | - Javier Conde-Aranda
- Molecular and Cellular Gastroenterology, Health Research Institute of Santiago de Compostela (IDIS), 15706 Santiago de Compostela, Spain
| | - Breno Valentim Nogueira
- Department of Morphology, Health Sciences Center, Federal University of Espírito Santo (UFES), Vitoria 29047-105, ES, Brazil
| | - Elisardo Corral Vasquez
- Laboratory of Translational Physiology and Pharmacology, Pharmaceutical Sciences Graduate Program, Vila Velha University (UVV), Vila Velha 29102-920, ES, Brazil
| | - Thiago de Melo Costa Pereira
- Laboratory of Translational Physiology and Pharmacology, Pharmaceutical Sciences Graduate Program, Vila Velha University (UVV), Vila Velha 29102-920, ES, Brazil
| | - Bianca Prandi Campagnaro
- Laboratory of Translational Physiology and Pharmacology, Pharmaceutical Sciences Graduate Program, Vila Velha University (UVV), Vila Velha 29102-920, ES, Brazil
| |
|
117
|
Notin P, Marks DS, Weitzman R, Gal Y. ProteinNPT: Improving Protein Property Prediction and Design with Non-Parametric Transformers. bioRxiv 2023:2023.12.06.570473. [PMID: 38106034 PMCID: PMC10723423 DOI: 10.1101/2023.12.06.570473] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 12/19/2023]
Abstract
Protein design holds immense potential for optimizing naturally occurring proteins, with broad applications in drug discovery, material design, and sustainability. However, computational methods for protein engineering are confronted with significant challenges, such as an expansive design space, sparse functional regions, and a scarcity of available labels. These issues are further exacerbated in practice by the fact most real-life design scenarios necessitate the simultaneous optimization of multiple properties. In this work, we introduce ProteinNPT, a non-parametric transformer variant tailored to protein sequences and particularly suited to label-scarce and multi-task learning settings. We first focus on the supervised fitness prediction setting and develop several cross-validation schemes which support robust performance assessment. We subsequently reimplement prior top-performing baselines, introduce several extensions of these baselines by integrating diverse branches of the protein engineering literature, and demonstrate that ProteinNPT consistently outperforms all of them across a diverse set of protein property prediction tasks. Finally, we demonstrate the value of our approach for iterative protein design across extensive in silico Bayesian optimization and conditional sampling experiments.
Affiliation(s)
| | | | | | - Yarin Gal
- Computer Science, University of Oxford
| |
|
118
|
Robin X, Studer G, Durairaj J, Eberhardt J, Schwede T, Walters WP. Assessment of protein-ligand complexes in CASP15. Proteins 2023; 91:1811-1821. [PMID: 37795762 DOI: 10.1002/prot.26601] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 04/11/2023] [Revised: 09/14/2023] [Accepted: 09/19/2023] [Indexed: 10/06/2023]
Abstract
CASP15 introduced a new category, ligand prediction, where participants were provided with a protein or nucleic acid sequence, SMILES line notation, and stoichiometry for ligands and tasked with generating computational models for the three-dimensional structure of the corresponding protein-ligand complex. These models were subsequently compared with experimental structures determined by x-ray crystallography or cryoEM. To assess these predictions, two novel scores were developed. The Binding-Site Superposed, Symmetry-Corrected Pose Root Mean Square Deviation (BiSyRMSD) evaluated the absolute deviations of the models from the experimental structures. At the same time, the Local Distance Difference Test for Protein-Ligand Interactions (lDDT-PLI) assessed the ability of models to reproduce the protein-ligand interactions in the experimental structures. The ligands evaluated in this challenge range from single-atom ions to large flexible organic molecules. More than 1800 submissions were evaluated for their ability to predict 23 different protein-ligand complexes. Overall, the best models could faithfully reproduce the geometries of more than half of the prediction targets. The ligands' size and flexibility were the primary factors influencing the predictions' quality. Small ions and organic molecules with limited flexibility were predicted with high fidelity, while reproducing the binding poses of larger, flexible ligands proved more challenging.
Affiliation(s)
- Xavier Robin
- Biozentrum, University of Basel, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Gabriel Studer
- Biozentrum, University of Basel, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Janani Durairaj
- Biozentrum, University of Basel, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Jerome Eberhardt
- Biozentrum, University of Basel, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Torsten Schwede
- Biozentrum, University of Basel, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | | |
|
119
|
Conte AD, Mehdiabadi M, Bouhraoua A, Miguel Monzon A, Tosatto SCE, Piovesan D. Critical assessment of protein intrinsic disorder prediction (CAID) - Results of round 2. Proteins 2023; 91:1925-1934. [PMID: 37621223 DOI: 10.1002/prot.26582] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 04/21/2023] [Revised: 06/22/2023] [Accepted: 08/08/2023] [Indexed: 08/26/2023]
Abstract
Protein intrinsic disorder (ID) is a complex and context-dependent phenomenon that covers a continuum between fully disordered states and folded states with long dynamic regions. The lack of a ground truth that fits all ID flavors and the potential for order-to-disorder transitions depending on specific conditions makes ID prediction challenging. The CAID2 challenge aimed to evaluate the performance of different prediction methods across different benchmarks, leveraging the annotation provided by the DisProt database, which stores the coordinates of ID regions when there is experimental evidence in the literature. The CAID2 challenge demonstrated varying performance of different prediction methods across different benchmarks, highlighting the need for continued development of more versatile and efficient prediction software. Depending on the application, researchers may need to balance performance with execution time when selecting a predictor. Methods based on AlphaFold2 seem to be good ID predictors but they are better at detecting absence of order rather than ID regions as defined in DisProt. The CAID2 predictors can be freely used through the CAID Prediction Portal, and CAID has been integrated into OpenEBench, which will become the official platform for running future CAID challenges.
Affiliation(s)
- Alessio Del Conte
- Department of Biomedical Sciences, University of Padova, Padova, Italy
| | - Mahta Mehdiabadi
- Department of Biomedical Sciences, University of Padova, Padova, Italy
| | - Adel Bouhraoua
- Department of Biomedical Sciences, University of Padova, Padova, Italy
| | | | | | - Damiano Piovesan
- Department of Biomedical Sciences, University of Padova, Padova, Italy
| |
|
120
|
Fossa SL, Anton BP, Kneller DW, Petralia LMC, Ganatra MB, Boisvert ML, Vainauskas S, Chan SH, Hokke CH, Foster JM, Taron CH. A novel family of sugar-specific phosphodiesterases that remove zwitterionic modifications of GlcNAc. J Biol Chem 2023; 299:105437. [PMID: 37944617 PMCID: PMC10704324 DOI: 10.1016/j.jbc.2023.105437] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2023] [Revised: 10/27/2023] [Accepted: 10/31/2023] [Indexed: 11/12/2023] Open
Abstract
The zwitterions phosphorylcholine (PC) and phosphoethanolamine (PE) are often found esterified to certain sugars in polysaccharides and glycoconjugates in a wide range of biological species. One such modification involves PC attachment to the 6-carbon of N-acetylglucosamine (GlcNAc-6-PC) in N-glycans and glycosphingolipids (GSLs) of parasitic nematodes, a modification that helps the parasite evade host immunity. Knowledge of enzymes involved in the synthesis and degradation of PC and PE modifications is limited. More detailed studies on such enzymes would contribute to a better understanding of the function of PC modifications and have potential application in the structural analysis of zwitterion-modified glycans. In this study, we used functional metagenomic screening to identify phosphodiesterases encoded in a human fecal DNA fosmid library that remove PC from GlcNAc-6-PC. A novel bacterial phosphodiesterase was identified and biochemically characterized. This enzyme (termed GlcNAc-PDase) shows remarkable substrate preference for GlcNAc-6-PC and GlcNAc-6-PE, with little or no activity on other zwitterion-modified hexoses. The identified GlcNAc-PDase protein sequence is a member of the large endonuclease/exonuclease/phosphatase superfamily where it defines a distinct subfamily of related sequences of previously unknown function, mostly from Clostridium bacteria species. Finally, we demonstrate use of GlcNAc-PDase to confirm the presence of GlcNAc-6-PC in N-glycans and GSLs of the parasitic nematode Brugia malayi in a glycoanalytical workflow.
Affiliation(s)
- Samantha L Fossa
- Research Department, New England Biolabs, Ipswich, Massachusetts, USA
| | - Brian P Anton
- Research Department, New England Biolabs, Ipswich, Massachusetts, USA
| | - Daniel W Kneller
- Research Department, New England Biolabs, Ipswich, Massachusetts, USA
| | - Laudine M C Petralia
- Research Department, New England Biolabs, Ipswich, Massachusetts, USA; Department of Parasitology, Leiden University - Center of Infectious Diseases, Leiden University Medical Center, Leiden, The Netherlands
| | - Mehul B Ganatra
- Research Department, New England Biolabs, Ipswich, Massachusetts, USA
| | | | | | - Siu-Hong Chan
- Research Department, New England Biolabs, Ipswich, Massachusetts, USA
| | - Cornelis H Hokke
- Department of Parasitology, Leiden University - Center of Infectious Diseases, Leiden University Medical Center, Leiden, The Netherlands
| | - Jeremy M Foster
- Research Department, New England Biolabs, Ipswich, Massachusetts, USA
|
121
|
Nestor BJ, Bayer PE, Fernandez CGT, Edwards D, Finnegan PM. Approaches to increase the validity of gene family identification using manual homology search tools. Genetica 2023; 151:325-338. [PMID: 37817002 PMCID: PMC10692271 DOI: 10.1007/s10709-023-00196-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/07/2023] [Accepted: 10/01/2023] [Indexed: 10/12/2023]
Abstract
Identifying homologs is an important process in the analysis of genetic patterns underlying traits and evolutionary relationships among species. Analysis of gene families is often used to form and support hypotheses on genetic patterns such as gene presence, absence, or functional divergence which underlie traits examined in functional studies. These analyses often require precise identification of all members in a targeted gene family. Manual pipelines where homology search and orthology assignment tools are used separately are the most common approach for identifying small gene families where accurate identification of all members is important. The ability to curate sequences between steps in manual pipelines allows for simple and precise identification of all possible gene family members. However, the validity of such manual pipeline analyses is often decreased by inappropriate approaches to homology searches including too relaxed or stringent statistical thresholds, inappropriate query sequences, homology classification based on sequence similarity alone, and low-quality proteome or genome sequences. In this article, we propose several approaches to mitigate these issues and allow for precise identification of gene family members and support for hypotheses linking genetic patterns to functional traits.
Affiliation(s)
- Benjamin J Nestor
- School of Biological Sciences, University of Western Australia, Perth, WA, 6009, Australia.
- Centre for Applied Bioinformatics, University of Western Australia, Perth, WA, 6009, Australia.
| | - Philipp E Bayer
- School of Biological Sciences, University of Western Australia, Perth, WA, 6009, Australia
- Centre for Applied Bioinformatics, University of Western Australia, Perth, WA, 6009, Australia
| | - Cassandria G Tay Fernandez
- School of Biological Sciences, University of Western Australia, Perth, WA, 6009, Australia
- Centre for Applied Bioinformatics, University of Western Australia, Perth, WA, 6009, Australia
| | - David Edwards
- School of Biological Sciences, University of Western Australia, Perth, WA, 6009, Australia
- Centre for Applied Bioinformatics, University of Western Australia, Perth, WA, 6009, Australia
| | - Patrick M Finnegan
- School of Biological Sciences, University of Western Australia, Perth, WA, 6009, Australia
- Centre for Applied Bioinformatics, University of Western Australia, Perth, WA, 6009, Australia
|
122
|
Ibrahim ZY, Uzairu A, Shallangwa GA, Abechi SE, Isyaku S. Homology modeling, docking, and ADMET studies of benzoheterocyclic 4-aminoquinolines analogs as inhibitors of Plasmodium falciparum. J Taibah Univ Med Sci 2023; 18:1200-1216. [PMID: 37250808 PMCID: PMC10209460 DOI: 10.1016/j.jtumed.2023.04.005] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/03/2023] [Revised: 02/18/2023] [Accepted: 04/20/2023] [Indexed: 05/31/2023] Open
Abstract
Objectives: The ongoing fight against endemic diseases is necessary due to the growing resistance of malarial parasites to widely accessible medications. Thus, there has been an ongoing search for antimalarial medications with improved efficacy. The goal of this study was to develop derivatives of benzoheterocyclic 4-aminoquinolines with enhanced activities and better binding affinities than the original compounds. Methods: Thirty-four derivatives of benzoheterocyclic 4-aminoquinolines were docked (using a model of dihydrofolate reductase-thymidylate synthase [DRTS] protein) with Molegro software to identify the compound with the minimum docking score as a design template. The generated quantitative structure-activity model was employed to estimate the activity of the designed derivatives. The derivatives were also docked to determine the most stable derivatives. Furthermore, the designed derivatives were tested for their drug-likeness and pharmacokinetic properties using SwissADME software and pkCSM web application, respectively. Results: Compound H-014, (N-(7-chloroquinolin-4-yl)-2-(4-methylpiperazin-1-yl)-1,3-benzoxazol-5-amine), with the lowest re-rank score of -115.423, was employed as the design template. Then 10 derivatives were further designed by substituting -OH, -OCH3, -CHO, -F, and -Cl groups at various positions of the template. We found that the designed derivatives had improved activities compared to the template. The docking scores of the designed derivatives were lower than those of the original derivatives. Derivative h-06 (7-methoxy-4-((2-(4-methylpiperazin-1-yl)benzo[d]oxazol-5-yl)amino)quinolin-6-ol) with four hydrogen bonds was identified as the most stable due to its lowest re-rank score (-163.607). While all of the designed derivatives satisfied both the Lipinski and Veber rules, some derivatives showed poor absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties: h-10 (cytochrome P450 1A2 [CYP1A2]); h-05, h-08, h-09, and h-10 (CYP2C19); and h-03, h-07, h-08, and h-10 (renal organic cation transporter 2 substrate). Conclusion: Ten derivatives of benzoheterocyclic 4-aminoquinolines were designed with improved efficacies. Derivatives that follow the Lipinski and Veber rules and are mostly non-toxic and non-sensitizing to the skin can be utilized in the development of effective antimalarial medications.
Affiliation(s)
- Zakari Y. Ibrahim
- Corresponding address: Department of Chemistry, Faculty of Physical Sciences, Ahmadu Bello University, Zaria, Nigeria.
|
123
|
Lizana P, Godoy R, Martínez F, Wicher D, Kaltofen S, Guzmán L, Ramírez O, Cifuentes D, Mutis A, Venthur H. A highly conserved plant volatile odorant receptor detects a sex pheromone component of the greater wax moth, Galleria mellonella (Lepidoptera: Pyralidae). INSECT BIOCHEMISTRY AND MOLECULAR BIOLOGY 2023; 163:104031. [PMID: 37918449 DOI: 10.1016/j.ibmb.2023.104031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/28/2023] [Revised: 10/26/2023] [Accepted: 10/28/2023] [Indexed: 11/04/2023]
Abstract
Odorant receptors (ORs) are key specialized units for mate and host finding in moths of the Ditrysia clade, to which 98% of the lepidopteran species belong. Moth ORs have evolved to respond to long unsaturated acetates, alcohols, or aldehydes (Type I sex pheromones), falling into conserved clades of pheromone receptors (PRs). These PRs might have evolved from old lineages of non-Ditrysian moths that use plant volatile-like pheromones. However, a Ditrysian moth called the greater wax moth, Galleria mellonella (a worldwide-distributed pest of beehives), uses C9-C11 saturated aldehydes as the main sex pheromone components (i.e., nonanal and undecanal). Thus, these aldehydes represent unusual components compared with the majority of moth species that use, for instance, Type I sex pheromones. Current evidence shows no consensus on the number of ORs in G. mellonella, although reports agree that the moth does not have conserved PRs. Using genomic data, 62 OR candidates were identified, 16 being new genes. Phylogeny showed no presence of ORs in conserved PR clades. However, an OR with the highest transcript abundance, GmelOR4, appeared in a conserved plant volatile-detecting clade. Functional findings from the HEK system showed the OR as sensitive to nonanal and 2-phenylacetaldehyde, but not to undecanal. It is believed that to date GmelOR4 represents the first, but likely not unique, OR with a stable function in detecting aldehydes that help maintain the life cycle of G. mellonella around honey bee colonies.
Affiliation(s)
- Paula Lizana
- Programa de Doctorado en Ciencias de Recursos Naturales, Universidad de La Frontera, Temuco, Chile; Laboratorio de Química Ecológica, Departamento de Ciencias Químicas y Recursos Naturales, Facultad de Ingeniería y Ciencias, Universidad de La Frontera, Temuco, Chile
| | - Ricardo Godoy
- Laboratorio de Química Ecológica, Departamento de Ciencias Químicas y Recursos Naturales, Facultad de Ingeniería y Ciencias, Universidad de La Frontera, Temuco, Chile
| | - Francheska Martínez
- Laboratorio de Química Ecológica, Departamento de Ciencias Químicas y Recursos Naturales, Facultad de Ingeniería y Ciencias, Universidad de La Frontera, Temuco, Chile; Carrera de Bioquímica, Departamento de Ciencias Químicas y Recursos Naturales, Facultad de Ingeniería y Ciencias, Universidad de La Frontera, Temuco, Chile
| | - Dieter Wicher
- Max Planck Institute for Chemical Ecology, Department of Evolutionary Neuroethology, 07745, Jena, Germany
| | - Sabine Kaltofen
- Max Planck Institute for Chemical Ecology, Department of Evolutionary Neuroethology, 07745, Jena, Germany
| | - Leonardo Guzmán
- Departamento de Fisiología, Facultad de Ciencias Biológicas, Universidad de Concepción, Concepción, Chile
| | - Oscar Ramírez
- Departamento de Fisiología, Facultad de Ciencias Biológicas, Universidad de Concepción, Concepción, Chile
| | - Diego Cifuentes
- Departamento de Fisiología, Facultad de Ciencias Biológicas, Universidad de Concepción, Concepción, Chile
| | - Ana Mutis
- Centro de Investigación Biotecnológica Aplicada al Medio Ambiente (CIBAMA), Universidad de La Frontera, Temuco, Chile; Laboratorio de Química Ecológica, Departamento de Ciencias Químicas y Recursos Naturales, Facultad de Ingeniería y Ciencias, Universidad de La Frontera, Temuco, Chile
| | - Herbert Venthur
- Centro de Investigación Biotecnológica Aplicada al Medio Ambiente (CIBAMA), Universidad de La Frontera, Temuco, Chile; Laboratorio de Química Ecológica, Departamento de Ciencias Químicas y Recursos Naturales, Facultad de Ingeniería y Ciencias, Universidad de La Frontera, Temuco, Chile.
|
124
|
Mogila I, Tamulaitiene G, Keda K, Timinskas A, Ruksenaite A, Sasnauskas G, Venclovas Č, Siksnys V, Tamulaitis G. Ribosomal stalk-captured CARF-RelE ribonuclease inhibits translation following CRISPR signaling. Science 2023; 382:1036-1041. [PMID: 38033086 DOI: 10.1126/science.adj2107] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/21/2023] [Accepted: 10/31/2023] [Indexed: 12/02/2023]
Abstract
Prokaryotic type III CRISPR-Cas antiviral systems employ cyclic oligoadenylate (cAn) signaling to activate a diverse range of auxiliary proteins that reinforce the CRISPR-Cas defense. Here we characterize a class of cAn-dependent effector proteins named CRISPR-Cas-associated messenger RNA (mRNA) interferase 1 (Cami1) consisting of a CRISPR-associated Rossmann fold sensor domain fused to winged helix-turn-helix and a RelE-family mRNA interferase domain. Upon activation by cyclic tetra-adenylate (cA4), Cami1 cleaves mRNA exposed at the ribosomal A-site thereby depleting mRNA and leading to cell growth arrest. The structures of apo-Cami1 and the ribosome-bound Cami1-cA4 complex delineate the conformational changes that lead to Cami1 activation and the mechanism of Cami1 binding to a bacterial ribosome, revealing unexpected parallels with eukaryotic ribosome-inactivating proteins.
Affiliation(s)
- Irmantas Mogila
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio av. 7, LT-10257 Vilnius, Lithuania
| | - Giedre Tamulaitiene
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio av. 7, LT-10257 Vilnius, Lithuania
| | - Konstanty Keda
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio av. 7, LT-10257 Vilnius, Lithuania
| | - Albertas Timinskas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio av. 7, LT-10257 Vilnius, Lithuania
| | - Audrone Ruksenaite
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio av. 7, LT-10257 Vilnius, Lithuania
| | - Giedrius Sasnauskas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio av. 7, LT-10257 Vilnius, Lithuania
| | - Česlovas Venclovas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio av. 7, LT-10257 Vilnius, Lithuania
| | - Virginijus Siksnys
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio av. 7, LT-10257 Vilnius, Lithuania
| | - Gintautas Tamulaitis
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio av. 7, LT-10257 Vilnius, Lithuania
|
125
|
Xia Y, Zhao K, Liu D, Zhou X, Zhang G. Multi-domain and complex protein structure prediction using inter-domain interactions from deep learning. Commun Biol 2023; 6:1221. [PMID: 38040847 PMCID: PMC10692239 DOI: 10.1038/s42003-023-05610-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/19/2023] [Accepted: 11/20/2023] [Indexed: 12/03/2023] Open
Abstract
Accurately capturing domain-domain interactions is key to understanding protein function and designing structure-based drugs. Although AlphaFold2 has made a breakthrough in single-domain modeling, structure modeling for multi-domain proteins and complexes remains a challenge. In this study, we developed a multi-domain and complex structure assembly protocol, named DeepAssembly, based on domain segmentation and single domain modeling algorithms. Firstly, DeepAssembly uses a population-based evolutionary algorithm to assemble multi-domain proteins by inter-domain interactions inferred from a developed deep learning network. Secondly, protein complexes are assembled by means of domains rather than chains using DeepAssembly. Experimental results show that on 219 multi-domain proteins, the average inter-domain distance precision by DeepAssembly is 22.7% higher than that of AlphaFold2. Moreover, DeepAssembly improves accuracy by 13.1% for 164 multi-domain structures with low confidence deposited in AlphaFold database. We apply DeepAssembly for the prediction of 247 heterodimers. We find that DeepAssembly successfully predicts the interface (DockQ ≥ 0.23) for 32.4% of the dimers, suggesting a lighter way to assemble complex structures by treating domains as assembly units and using inter-domain interactions learned from monomer structures.
Affiliation(s)
- Yuhao Xia
- College of Information Engineering, Zhejiang University of Technology, HangZhou, 310023, China
| | - Kailong Zhao
- College of Information Engineering, Zhejiang University of Technology, HangZhou, 310023, China
| | - Dong Liu
- College of Information Engineering, Zhejiang University of Technology, HangZhou, 310023, China
| | - Xiaogen Zhou
- College of Information Engineering, Zhejiang University of Technology, HangZhou, 310023, China
| | - Guijun Zhang
- College of Information Engineering, Zhejiang University of Technology, HangZhou, 310023, China.
|
126
|
Kryshtafovych A, Rigden DJ. To split or not to split: CASP15 targets and their processing into tertiary structure evaluation units. Proteins 2023; 91:1558-1570. [PMID: 37254889 PMCID: PMC10687315 DOI: 10.1002/prot.26533] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/10/2023] [Revised: 05/02/2023] [Accepted: 05/18/2023] [Indexed: 06/01/2023]
Abstract
Processing of CASP15 targets into evaluation units (EUs) and assigning them to evolutionary-based prediction classes is presented in this study. The targets were first split into structural domains based on compactness and similarity to other proteins. Models were then evaluated against these domains and their combinations. The domains were joined into larger EUs if predictors' performance on the combined units was similar to that on individual domains. Alternatively, if most predictors performed better on the individual domains, then they were retained as EUs. As a result, 112 evaluation units were created from 77 tertiary structure prediction targets. The EUs were assigned to four prediction classes roughly corresponding to target difficulty categories in previous CASPs: TBM (template-based modeling, easy or hard), FM (free modeling), and the TBM/FM overlap category. More than a third of CASP15 EUs were attributed to the historically most challenging FM class, where homology or structural analogy to proteins of known fold cannot be detected.
Affiliation(s)
| | - Daniel J. Rigden
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, England
|
127
|
Dougherty PE, Nielsen TK, Riber L, Lading HH, Forero-Junco LM, Kot W, Raaijmakers JM, Hansen LH. Widespread and largely unknown prophage activity, diversity, and function in two genera of wheat phyllosphere bacteria. THE ISME JOURNAL 2023; 17:2415-2425. [PMID: 37919394 PMCID: PMC10689766 DOI: 10.1038/s41396-023-01547-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/28/2023] [Revised: 10/11/2023] [Accepted: 10/16/2023] [Indexed: 11/04/2023]
Abstract
Environmental bacteria host an enormous number of prophages, but their diversity and natural functions remain largely elusive. Here, we investigate prophage activity and diversity in 63 Erwinia and Pseudomonas strains isolated from flag leaves of wheat grown in a single field. Introducing and validating Virion Induction Profiling Sequencing (VIP-Seq), we identify and quantify the activity of 120 spontaneously induced prophages, discovering that some phyllosphere bacteria produce more than 10⁸ virions/mL in overnight cultures, with significant induction also observed in planta. Sequence analyses and plaque assays reveal E. aphidicola prophages contribute a majority of intraspecies genetic diversity and divide their bacterial hosts into antagonistic factions engaged in widespread microbial warfare, revealing the importance of prophage-mediated microdiversity. When comparing spontaneously active prophages with predicted prophages we also find insertion sequences are strongly correlated with non-active prophages. In conclusion, we discover widespread and largely unknown prophage diversity and function in phyllosphere bacteria.
Affiliation(s)
- Peter Erdmann Dougherty
- Department of Plant and Environmental Sciences, University of Copenhagen, Frederiksberg, Denmark
| | - Tue Kjærgaard Nielsen
- Department of Plant and Environmental Sciences, University of Copenhagen, Frederiksberg, Denmark
| | - Leise Riber
- Department of Plant and Environmental Sciences, University of Copenhagen, Frederiksberg, Denmark
| | - Helen Helgå Lading
- Department of Plant and Environmental Sciences, University of Copenhagen, Frederiksberg, Denmark
| | | | - Witold Kot
- Department of Plant and Environmental Sciences, University of Copenhagen, Frederiksberg, Denmark
| | - Jos M Raaijmakers
- Department of Microbial Ecology, Netherlands Institute of Ecology (NIOO-KNAW), Wageningen, The Netherlands
| | - Lars Hestbjerg Hansen
- Department of Plant and Environmental Sciences, University of Copenhagen, Frederiksberg, Denmark.
|
128
|
Prokopchuk G, Butenko A, Dacks JB, Speijer D, Field MC, Lukeš J. Lessons from the deep: mechanisms behind diversification of eukaryotic protein complexes. Biol Rev Camb Philos Soc 2023; 98:1910-1927. [PMID: 37336550 PMCID: PMC10952624 DOI: 10.1111/brv.12988] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/10/2021] [Revised: 05/30/2023] [Accepted: 06/05/2023] [Indexed: 06/21/2023]
Abstract
Genetic variation is the major mechanism behind adaptation and evolutionary change. As most proteins operate through interactions with other proteins, changes in protein complex composition and subunit sequence provide potentially new functions. Comparative genomics can reveal expansions, losses and sequence divergence within protein-coding genes, but in silico analysis cannot detect subunit substitutions or replacements of entire protein complexes. Insights into these fundamental evolutionary processes require broad and extensive comparative analyses, from both in silico and experimental evidence. Here, we combine data from both approaches and consider the gamut of possible protein complex compositional changes that arise during evolution, citing examples of complete conservation to partial and total replacement by functional analogues. We focus in part on complexes in trypanosomes as they represent one of the better studied non-animal/non-fungal lineages, but extend insights across the eukaryotes by extensive comparative genomic analysis. We argue that gene loss plays an important role in diversification of protein complexes and hence enhancement of eukaryotic diversity.
Affiliation(s)
- Galina Prokopchuk
- Institute of Parasitology, Biology Centre, Czech Academy of Sciences, Branišovská 1160/31, České Budějovice, 37005, Czech Republic
- Faculty of Science, University of South Bohemia, Branišovská 1160/31, České Budějovice, 37005, Czech Republic
| | - Anzhelika Butenko
- Institute of Parasitology, Biology Centre, Czech Academy of Sciences, Branišovská 1160/31, České Budějovice, 37005, Czech Republic
- Faculty of Science, University of South Bohemia, Branišovská 1160/31, České Budějovice, 37005, Czech Republic
- Life Science Research Centre, Faculty of Science, University of Ostrava, Chittussiho 983/10, Ostrava, 71000, Czech Republic
| | - Joel B. Dacks
- Institute of Parasitology, Biology Centre, Czech Academy of Sciences, Branišovská 1160/31, České Budějovice, 37005, Czech Republic
- Division of Infectious Diseases, Department of Medicine, University of Alberta, 1‐124 Clinical Sciences Building, 11350‐83 Avenue, Edmonton, T6G 2R3, Alberta, Canada
- Centre for Life's Origins and Evolution, Department of Genetics, Evolution and the Environment, University College London, Darwin Building, Gower Street, London, WC1E 6BT, UK
| | - Dave Speijer
- Medical Biochemistry, Amsterdam UMC, University of Amsterdam, Meibergdreef 15, Amsterdam, 1105 AZ, The Netherlands
| | - Mark C. Field
- Institute of Parasitology, Biology Centre, Czech Academy of Sciences, Branišovská 1160/31, České Budějovice, 37005, Czech Republic
- School of Life Sciences, University of Dundee, Dow Street, Dundee, DD1 5EH, Scotland, UK
| | - Julius Lukeš
- Institute of Parasitology, Biology Centre, Czech Academy of Sciences, Branišovská 1160/31, České Budějovice, 37005, Czech Republic
- Faculty of Science, University of South Bohemia, Branišovská 1160/31, České Budějovice, 37005, Czech Republic
|
129
|
Altae-Tran H, Shmakov SA, Makarova KS, Wolf YI, Kannan S, Zhang F, Koonin EV. Diversity, evolution, and classification of the RNA-guided nucleases TnpB and Cas12. Proc Natl Acad Sci U S A 2023; 120:e2308224120. [PMID: 37983496 PMCID: PMC10691335 DOI: 10.1073/pnas.2308224120] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/24/2023] [Accepted: 09/19/2023] [Indexed: 11/22/2023] Open
Abstract
The TnpB proteins are transposon-associated RNA-guided nucleases that are among the most abundant proteins encoded in bacterial and archaeal genomes, but whose functions in the transposon life cycle remain unknown. TnpB appears to be the evolutionary ancestor of Cas12, the effector nuclease of type V CRISPR-Cas systems. We performed a comprehensive census of TnpBs in archaeal and bacterial genomes and constructed a phylogenetic tree on which we mapped various features of these proteins. In multiple branches of the tree, the catalytic site of the TnpB nuclease is rearranged, demonstrating structural and probably biochemical malleability of this enzyme. We identified numerous cases of apparent recruitment of TnpB for other functions of which the most common is the evolution of type V CRISPR-Cas effectors on about 50 independent occasions. In many other cases of more radical exaptation, the catalytic site of the TnpB nuclease is apparently inactivated, suggesting a regulatory function, whereas in others, the activity appears to be retained, indicating that the recruited TnpB functions as a nuclease, for example, as a toxin. These findings demonstrate remarkable evolutionary malleability of the TnpB scaffold and provide extensive opportunities for further exploration of RNA-guided biological systems as well as multiple applications.
Affiliation(s)
- Han Altae-Tran
- HHMI, Cambridge, MA 02139
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139
- Department of Brain and Cognitive Science, Massachusetts Institute of Technology, Cambridge, MA 02139
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139
| | - Sergey A. Shmakov
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20894
| | - Kira S. Makarova
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20894
| | - Yuri I. Wolf
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20894
| | - Soumya Kannan
- HHMI, Cambridge, MA 02139
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139
- Department of Brain and Cognitive Science, Massachusetts Institute of Technology, Cambridge, MA 02139
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139
| | - Feng Zhang
- HHMI, Cambridge, MA 02139
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139
- Department of Brain and Cognitive Science, Massachusetts Institute of Technology, Cambridge, MA 02139
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139
| | - Eugene V. Koonin
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20894
|
130
|
Smug BJ, Szczepaniak K, Rocha EPC, Dunin-Horkawicz S, Mostowy RJ. Ongoing shuffling of protein fragments diversifies core viral functions linked to interactions with bacterial hosts. Nat Commun 2023; 14:7460. [PMID: 38016962 PMCID: PMC10684548 DOI: 10.1038/s41467-023-43236-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/12/2023] [Accepted: 11/03/2023] [Indexed: 11/30/2023] Open
Abstract
Biological modularity enhances evolutionary adaptability. This principle is vividly exemplified by bacterial viruses (phages), which display extensive genomic modularity. Phage genomes are composed of independent functional modules that evolve separately and recombine in various configurations. While genomic modularity in phages has been extensively studied, less attention has been paid to protein modularity, that is, proteins consisting of distinct building blocks that can evolve and recombine, enhancing functional and genetic diversity. Here, we use a set of 133,574 representative phage proteins and highly sensitive homology detection to capture instances of domain mosaicism, defined as fragment sharing between two otherwise unrelated proteins, and to understand its relationship with functional diversity in phage genomes. We discover that unrelated proteins from diverse functional classes frequently share homologous domains. This phenomenon is particularly pronounced within receptor-binding proteins, endolysins, and DNA polymerases. We also identify multiple instances of recent diversification via domain shuffling in receptor-binding proteins, neck passage structures, endolysins and some members of the core replication machinery, often transcending distant taxonomic and ecological boundaries. Our findings suggest that ongoing diversification via domain shuffling is reflective of a co-evolutionary arms race, driven by the need to overcome various bacterial resistance mechanisms against phages.
Affiliation(s)
- Bogna J Smug
- Malopolska Centre of Biotechnology, Jagiellonian University, Krakow, Poland
| | | | - Eduardo P C Rocha
- Institut Pasteur, Université Paris Cité, CNRS UMR3525, Microbial Evolutionary Genomics, Paris, France
| | - Stanislaw Dunin-Horkawicz
- Institute of Evolutionary Biology, Faculty of Biology & Biological and Chemical Research Centre, University of Warsaw, Żwirki i Wigury 101, 02-089, Warsaw, Poland
- Department of Protein Evolution, Max Planck Institute for Developmental Biology, Max-Planck-Ring 5, 72076, Tübingen, Germany
| | - Rafał J Mostowy
- Malopolska Centre of Biotechnology, Jagiellonian University, Krakow, Poland.
|
131
|
Alvarez-Carreño C, Arciniega M, Ribas de Pouplana L, Petrov AS, Hernández-González A, Dimas-Torres JU, Valencia-Sánchez MI, Williams LD, Torres-Larios A. Common evolutionary origins of the bacterial glycyl tRNA synthetase and alanyl tRNA synthetase. Protein Sci 2023; 33:e4844. [PMID: 38009704 PMCID: PMC10895455 DOI: 10.1002/pro.4844] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/12/2023] [Revised: 11/07/2023] [Accepted: 11/18/2023] [Indexed: 11/29/2023]
Abstract
Aminoacyl-tRNA synthetases (aaRSs) establish the genetic code. Each aaRS covalently links a given canonical amino acid to a cognate set of tRNA isoacceptors. Glycyl tRNA aminoacylation is unusual in that it is catalyzed by different aaRSs in different lineages of the Tree of Life. We have investigated the phylogenetic distribution and evolutionary history of bacterial glycyl tRNA synthetase (bacGlyRS). This enzyme is found in early diverging bacterial phyla such as Firmicutes, Acidobacteria, and Proteobacteria, but not in archaea or eukarya. We observe relationships between each of six domains of bacGlyRS and six domains of four different RNA-modifying proteins. Component domains of bacGlyRS show common ancestry with (i) the catalytic domain of class II tRNA synthetases; (ii) the HD domain of the bacterial RNase Y; (iii) the body and tail domains of the archaeal CCA-adding enzyme; (iv) the anti-codon binding domain of the arginyl tRNA synthetase; and (v) a previously unrecognized domain that we call ATL (Ancient tRNA latch). The ATL domain has been found thus far only in bacGlyRS and in the universal alanyl tRNA synthetase (uniAlaRS). Further, the catalytic domain of bacGlyRS is more closely related to the catalytic domain of uniAlaRS than to any other aminoacyl tRNA synthetase. The combined results suggest that the ATL and catalytic domains of these two enzymes are ancestral to bacGlyRS and uniAlaRS, which emerged from common protein ancestors by bricolage, stepwise accumulation of protein domains, before the last universal common ancestor of life.
Affiliation(s)
- Claudia Alvarez-Carreño
  - NASA Center for the Origin of Life, Georgia Institute of Technology, Atlanta, Georgia, USA
  - School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia, USA
- Marcelino Arciniega
  - Departamento de Bioquímica y Biología Estructural, Instituto de Fisiología Celular, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Lluís Ribas de Pouplana
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
  - Catalan Institution for Research and Advanced Studies, Barcelona, Catalonia, Spain
- Anton S Petrov
  - NASA Center for the Origin of Life, Georgia Institute of Technology, Atlanta, Georgia, USA
  - School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia, USA
- Adriana Hernández-González
  - Departamento de Bioquímica y Biología Estructural, Instituto de Fisiología Celular, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Jorge-Uriel Dimas-Torres
  - Departamento de Bioquímica y Biología Estructural, Instituto de Fisiología Celular, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Marco Igor Valencia-Sánchez
  - Departamento de Bioquímica y Biología Estructural, Instituto de Fisiología Celular, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Loren Dean Williams
  - NASA Center for the Origin of Life, Georgia Institute of Technology, Atlanta, Georgia, USA
  - School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia, USA
- Alfredo Torres-Larios
  - Departamento de Bioquímica y Biología Estructural, Instituto de Fisiología Celular, Universidad Nacional Autónoma de México, Mexico City, Mexico
|
132
|
Hamamsy T, Barot M, Morton JT, Steinegger M, Bonneau R, Cho K. Learning sequence, structure, and function representations of proteins with language models. bioRxiv 2023:2023.11.26.568742. [PMID: 38045331 PMCID: PMC10690258 DOI: 10.1101/2023.11.26.568742] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2023]
Abstract
The sequence-structure-function relationships that ultimately generate the diversity of extant observed proteins are complex, as proteins bridge the gap between multiple informational and physical scales involved in nearly all cellular processes. One limitation of existing protein annotation databases such as UniProt is that less than 1% of proteins have experimentally verified functions, and computational methods are needed to fill in the missing information. Here, we demonstrate that a multi-aspect framework based on protein language models can learn sequence-structure-function representations of amino acid sequences, and can provide the foundation for sensitive sequence-structure-function aware protein sequence search and annotation. Based on this model, we introduce a multi-aspect information retrieval system for proteins, Protein-Vec, covering sequence, structure, and function aspects, that enables computational protein annotation and function prediction at tree-of-life scales.
|
133
|
Grasekamp KP, Beaud Benyahia B, Taib N, Audrain B, Bardiaux B, Rossez Y, Izadi-Pruneyre N, Lejeune M, Trivelli X, Chouit Z, Guerardel Y, Ghigo JM, Gribaldo S, Beloin C. The Mla system of diderm Firmicute Veillonella parvula reveals an ancestral transenvelope bridge for phospholipid trafficking. Nat Commun 2023; 14:7642. [PMID: 37993432 PMCID: PMC10665443 DOI: 10.1038/s41467-023-43411-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/09/2023] [Accepted: 11/08/2023] [Indexed: 11/24/2023] Open
Abstract
E. coli and most other diderm bacteria (those with two membranes) have an inner membrane enriched in glycerophospholipids (GPLs) and an asymmetric outer membrane (OM) containing GPLs in its inner leaflet and primarily lipopolysaccharides in its outer leaflet. In E. coli, this lipid asymmetry is maintained by the Mla system which consists of six proteins: the OM lipoprotein MlaA extracts GPLs from the outer leaflet, and the periplasmic chaperone MlaC transfers them across the periplasm to the inner membrane complex MlaBDEF. However, GPL trafficking still remains poorly understood, and has only been studied in a handful of model species. Here, we investigate GPL trafficking in Veillonella parvula, a diderm Firmicute with an Mla system that lacks MlaA and MlaC, but contains an elongated MlaD. V. parvula mla mutants display phenotypes characteristic of disrupted lipid asymmetry which can be suppressed by mutations in tamB, supporting that these two systems have opposite GPL trafficking functions across diverse bacterial lineages. Structural modelling and subcellular localisation assays suggest that V. parvula MlaD forms a transenvelope bridge, comprising a typical inner membrane-localised MCE domain and, in addition, an outer membrane β-barrel. Phylogenomic analyses indicate that this elongated MlaD type is widely distributed across diderm bacteria and likely forms part of the ancestral functional core of the Mla system, which would be composed of MlaEFD only.
Affiliation(s)
- Kyrie P Grasekamp
  - Institut Pasteur, Université Paris Cité, Genetics of Biofilms Laboratory, Paris, France
- Basile Beaud Benyahia
  - Institut Pasteur, Université Paris Cité, Evolutionary Biology of the Microbial Cell Laboratory, Paris, France
- Najwa Taib
  - Institut Pasteur, Université Paris Cité, Evolutionary Biology of the Microbial Cell Laboratory, Paris, France
  - Institut Pasteur, Université Paris Cité, Bioinformatics and Biostatistics Hub, F-75015, Paris, France
- Bianca Audrain
  - Institut Pasteur, Université Paris Cité, Genetics of Biofilms Laboratory, Paris, France
- Benjamin Bardiaux
  - Institut Pasteur, Université Paris Cité, Structural Bioinformatics Unit, CNRS UMR 3528, Paris, France
  - Institut Pasteur, Université Paris Cité, Bacterial Transmembrane Systems Unit, CNRS UMR 3528, Paris, France
- Yannick Rossez
  - Université de Lille, CNRS, UMR 8576 - UGSF - Unité de Glycobiologie Structurale et Fonctionnelle, Lille, France
- Nadia Izadi-Pruneyre
  - Institut Pasteur, Université Paris Cité, Structural Bioinformatics Unit, CNRS UMR 3528, Paris, France
  - Institut Pasteur, Université Paris Cité, Bacterial Transmembrane Systems Unit, CNRS UMR 3528, Paris, France
- Maylis Lejeune
  - Institut Pasteur, Université Paris Cité, Structural Bioinformatics Unit, CNRS UMR 3528, Paris, France
  - Institut Pasteur, Université Paris Cité, Bacterial Transmembrane Systems Unit, CNRS UMR 3528, Paris, France
- Xavier Trivelli
  - Université de Lille, CNRS, INRAE, Centrale Lille, Université d'Artois, FR 2638 - IMEC - Institut Michel-Eugène Chevreul, Lille, 59000, France
- Zina Chouit
  - Université de Lille, CNRS, UMR 8576 - UGSF - Unité de Glycobiologie Structurale et Fonctionnelle, Lille, France
- Yann Guerardel
  - Université de Lille, CNRS, UMR 8576 - UGSF - Unité de Glycobiologie Structurale et Fonctionnelle, Lille, France
  - Institute for Glyco-core Research (iGCORE), Gifu University, Gifu, Japan
- Jean-Marc Ghigo
  - Institut Pasteur, Université Paris Cité, Genetics of Biofilms Laboratory, Paris, France
- Simonetta Gribaldo
  - Institut Pasteur, Université Paris Cité, Evolutionary Biology of the Microbial Cell Laboratory, Paris, France
- Christophe Beloin
  - Institut Pasteur, Université Paris Cité, Genetics of Biofilms Laboratory, Paris, France
|
134
|
Cao W, Wu LY, Xia XY, Chen X, Wang ZX, Pan XM. A sequence-based evolutionary distance method for Phylogenetic analysis of highly divergent proteins. Sci Rep 2023; 13:20304. [PMID: 37985846 PMCID: PMC10662474 DOI: 10.1038/s41598-023-47496-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Accepted: 11/14/2023] [Indexed: 11/22/2023] Open
Abstract
Because of the limited effectiveness of prevailing phylogenetic methods when applied to highly divergent protein sequences, the phylogenetic analysis problem remains challenging. Here, we propose a sequence-based evolutionary distance algorithm termed sequence distance (SD), which innovatively incorporates site-to-site correlation within protein sequences into the distance estimation. In protein superfamilies, SD can effectively distinguish evolutionary relationships both within and between protein families, producing phylogenetic trees that closely align with those based on structural information, even with sequence identity less than 20%. SD is highly correlated with the similarity of the protein structure, and can calculate evolutionary distances for thousands of protein pairs within seconds using a single CPU, which is significantly faster than most protein structure prediction methods that demand high computational resources and long run times. The development of SD will significantly advance phylogenetics, providing researchers with a more accurate and reliable tool for exploring evolutionary relationships.
Affiliation(s)
- Wei Cao
  - Key Laboratory of Ministry of Education for Protein Science, School of Life Sciences, Tsinghua University, Beijing, 100084, China
- Lu-Yun Wu
  - Key Laboratory of Ministry of Education for Protein Science, School of Life Sciences, Tsinghua University, Beijing, 100084, China
- Xia-Yu Xia
  - Key Laboratory of Ministry of Education for Protein Science, School of Life Sciences, Tsinghua University, Beijing, 100084, China
- Xiang Chen
  - Key Laboratory of Ministry of Education for Protein Science, School of Life Sciences, Tsinghua University, Beijing, 100084, China
- Zhi-Xin Wang
  - Key Laboratory of Ministry of Education for Protein Science, School of Life Sciences, Tsinghua University, Beijing, 100084, China
- Xian-Ming Pan
  - Key Laboratory of Ministry of Education for Protein Science, School of Life Sciences, Tsinghua University, Beijing, 100084, China
|
135
|
Girasol MJ, Briggs EM, Marques CA, Batista JM, Beraldi D, Burchmore R, Lemgruber L, McCulloch R. Immunoprecipitation of RNA-DNA hybrid interacting proteins in Trypanosoma brucei reveals conserved and novel activities, including in the control of surface antigen expression needed for immune evasion by antigenic variation. Nucleic Acids Res 2023; 51:11123-11141. [PMID: 37843098 PMCID: PMC10639054 DOI: 10.1093/nar/gkad836] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2023] [Revised: 09/14/2023] [Accepted: 09/28/2023] [Indexed: 10/17/2023] Open
Abstract
RNA-DNA hybrids are epigenetic features of genomes that provide a diverse and growing range of activities. Understanding of these functions has been informed by characterising the proteins that interact with the hybrids, but all such analyses have so far focused on mammals, meaning it is unclear if a similar spectrum of RNA-DNA hybrid interactors is found in other eukaryotes. The African trypanosome is a single-cell eukaryotic parasite of the Discoba grouping and displays substantial divergence in several aspects of core biology from its mammalian host. Here, we show that DNA-RNA hybrid immunoprecipitation coupled with mass spectrometry recovers 602 putative interactors in T. brucei mammal- and insect-infective cells, some providing activities also found in mammals and some lineage-specific. We demonstrate that loss of three factors, two putative helicases and a RAD51 paralogue, alters T. brucei nuclear RNA-DNA hybrid and DNA damage levels. Moreover, loss of each factor affects the operation of the parasite immune survival mechanism of antigenic variation. Thus, our work reveals the broad range of activities contributed by RNA-DNA hybrids to T. brucei biology, including new functions in host immune evasion as well as activities likely fundamental to eukaryotic genome function.
Affiliation(s)
- Mark J Girasol
  - University of Glasgow, College of Medical, Veterinary and Life Sciences, School of Infection and Immunity, Wellcome Centre for Integrative Parasitology, Glasgow, UK
  - University of the Philippines Manila, College of Medicine, Manila, Philippines
- Emma M Briggs
  - University of Glasgow, College of Medical, Veterinary and Life Sciences, School of Infection and Immunity, Wellcome Centre for Integrative Parasitology, Glasgow, UK
  - University of Edinburgh, Institute for Immunology and Infection Research, School of Biological Sciences, Edinburgh, UK
- Catarina A Marques
  - University of Glasgow, College of Medical, Veterinary and Life Sciences, School of Infection and Immunity, Wellcome Centre for Integrative Parasitology, Glasgow, UK
- José M Batista
  - University of Glasgow, College of Medical, Veterinary and Life Sciences, School of Infection and Immunity, Wellcome Centre for Integrative Parasitology, Glasgow, UK
- Dario Beraldi
  - University of Glasgow, College of Medical, Veterinary and Life Sciences, School of Infection and Immunity, Wellcome Centre for Integrative Parasitology, Glasgow, UK
- Richard Burchmore
  - University of Glasgow, College of Medical, Veterinary and Life Sciences, School of Infection and Immunity, Wellcome Centre for Integrative Parasitology, Glasgow, UK
- Leandro Lemgruber
  - University of Glasgow, College of Medical, Veterinary and Life Sciences, School of Infection and Immunity, Wellcome Centre for Integrative Parasitology, Glasgow, UK
- Richard McCulloch
  - University of Glasgow, College of Medical, Veterinary and Life Sciences, School of Infection and Immunity, Wellcome Centre for Integrative Parasitology, Glasgow, UK
|
136
|
Balupuri A, Kim JM, Choi KE, No JS, Kim IH, Rhee JE, Kim EJ, Kang NS. Comparative Computational Analysis of Spike Protein Structural Stability in SARS-CoV-2 Omicron Subvariants. Int J Mol Sci 2023; 24:16069. [PMID: 38003257 PMCID: PMC10671153 DOI: 10.3390/ijms242216069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2023] [Revised: 11/01/2023] [Accepted: 11/07/2023] [Indexed: 11/26/2023] Open
Abstract
The continuous emergence of new severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants with multiple spike (S) protein mutations poses serious threats to current coronavirus disease 2019 (COVID-19) therapies. A comprehensive understanding of the structural stability of SARS-CoV-2 variants is vital for the development of effective therapeutic strategies as it can offer valuable insights into their potential impact on viral infectivity. S protein mediates a virus' attachment to host cells by binding to angiotensin-converting enzyme 2 (ACE2) through its receptor-binding domain (RBD), and mutations in this protein can affect its stability and binding affinity. We analyzed S protein structural stability in various Omicron subvariants computationally. Notably, the S protein sequences analyzed in this work were obtained directly from our own sample collection. We evaluated the binding free energy between S protein and ACE2 in several complex forms. Additionally, we measured distances between the RBD of each chain in S protein to analyze conformational changes. Unlike most of the prior studies, we analyzed full-length S protein-ACE2 complexes instead of only RBD-ACE2 complexes. Omicron subvariants including BA.1, BA.2, BA.2.12.1, BA.4/BA.5, BA.2.75, BA.2.75_K147E, BA.4.6 and BA.4.6_N658S showed enhanced stability compared to wild type, potentially due to distinct S protein mutations. Among them, BA.2.75 and BA.4.6_N658S exhibited the highest and lowest level of stability, respectively.
Affiliation(s)
- Anand Balupuri
  - Graduate School of New Drug Discovery and Development, Chungnam National University, 99 Daehak-ro, Yuseong-gu, Daejeon 34134, Republic of Korea
- Jeong-Min Kim
  - Division of Emerging Infectious Diseases, Bureau of Infectious Disease Diagnosis Control, Korea Disease Control and Prevention Agency, 187 Osongsaengmyeong 2-ro, Osong-eup, Heungdeok-gu, Cheongju-si 28159, Republic of Korea
- Kwang-Eun Choi
  - Graduate School of New Drug Discovery and Development, Chungnam National University, 99 Daehak-ro, Yuseong-gu, Daejeon 34134, Republic of Korea
- Jin Sun No
  - Division of Emerging Infectious Diseases, Bureau of Infectious Disease Diagnosis Control, Korea Disease Control and Prevention Agency, 187 Osongsaengmyeong 2-ro, Osong-eup, Heungdeok-gu, Cheongju-si 28159, Republic of Korea
- Il-Hwan Kim
  - Division of Emerging Infectious Diseases, Bureau of Infectious Disease Diagnosis Control, Korea Disease Control and Prevention Agency, 187 Osongsaengmyeong 2-ro, Osong-eup, Heungdeok-gu, Cheongju-si 28159, Republic of Korea
- Jee Eun Rhee
  - Division of Emerging Infectious Diseases, Bureau of Infectious Disease Diagnosis Control, Korea Disease Control and Prevention Agency, 187 Osongsaengmyeong 2-ro, Osong-eup, Heungdeok-gu, Cheongju-si 28159, Republic of Korea
- Eun-Jin Kim
  - Division of Emerging Infectious Diseases, Bureau of Infectious Disease Diagnosis Control, Korea Disease Control and Prevention Agency, 187 Osongsaengmyeong 2-ro, Osong-eup, Heungdeok-gu, Cheongju-si 28159, Republic of Korea
- Nam Sook Kang
  - Graduate School of New Drug Discovery and Development, Chungnam National University, 99 Daehak-ro, Yuseong-gu, Daejeon 34134, Republic of Korea
|
137
|
Evseev P, Bocharova J, Shagin D, Chebotar I. Analysis of Pseudomonas aeruginosa Isolates from Patients with Cystic Fibrosis Revealed Novel Groups of Filamentous Bacteriophages. Viruses 2023; 15:2215. [PMID: 38005892 PMCID: PMC10675462 DOI: 10.3390/v15112215] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Revised: 11/02/2023] [Accepted: 11/03/2023] [Indexed: 11/26/2023] Open
Abstract
Pseudomonas aeruginosa is an opportunistic pathogen that can cause infections in humans, especially in hospital patients with compromised host defence mechanisms, including patients with cystic fibrosis. Filamentous bacteriophages represent a group of single-stranded DNA viruses infecting different bacteria, including P. aeruginosa and other human and animal pathogens; many of them can replicate when integrated into the bacterial chromosome. Filamentous bacteriophages can contribute to the virulence of P. aeruginosa and influence the course of the disease. There are just a few isolated and officially classified filamentous bacteriophages infecting P. aeruginosa, but genomic studies indicated the frequent occurrence of integrated prophages in many P. aeruginosa genomes. An analysis of sequenced genomes of P. aeruginosa isolated from upper respiratory tract (throat and nasal swabs) and sputum specimens collected from Russian patients with cystic fibrosis indicated a higher diversity of filamentous bacteriophages than first thought. A detailed analysis of predicted bacterial proteins revealed prophage regions representing the filamentous phages known to be quite distantly related to known phages. Genomic comparisons and phylogenetic studies enabled the proposal of several new taxonomic groups of filamentous bacteriophages.
Affiliation(s)
- Peter Evseev
  - Laboratory of Molecular Microbiology, Pirogov Russian National Research Medical University, Ostrovityanova 1, 117997 Moscow, Russia
  - Laboratory of Molecular Bioengineering, Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Miklukho-Maklaya 16/10, 117997 Moscow, Russia
- Julia Bocharova
  - Laboratory of Molecular Microbiology, Pirogov Russian National Research Medical University, Ostrovityanova 1, 117997 Moscow, Russia
- Dmitriy Shagin
  - Laboratory of Molecular Microbiology, Pirogov Russian National Research Medical University, Ostrovityanova 1, 117997 Moscow, Russia
- Igor Chebotar
  - Laboratory of Molecular Microbiology, Pirogov Russian National Research Medical University, Ostrovityanova 1, 117997 Moscow, Russia
|
138
|
Li M, Wang H, Yang Z, Zhang L, Zhu Y. DeepTM: A deep learning algorithm for prediction of melting temperature of thermophilic proteins directly from sequences. Comput Struct Biotechnol J 2023; 21:5544-5560. [PMID: 38034401 PMCID: PMC10681957 DOI: 10.1016/j.csbj.2023.11.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2023] [Revised: 11/02/2023] [Accepted: 11/02/2023] [Indexed: 12/02/2023] Open
Abstract
Thermally stable proteins find extensive applications in industrial production, pharmaceutical development, and serve as a highly evolved starting point in protein engineering. The thermal stability of proteins is commonly characterized by their melting temperature (Tm). However, due to the limited availability of experimentally determined Tm data and the insufficient accuracy of existing computational methods in predicting Tm, there is an urgent need for a computational approach to accurately forecast the Tm values of thermophilic proteins. Here, we present a deep learning-based model, called DeepTM, which exclusively utilizes protein sequences as input and accurately predicts the Tm values of target thermophilic proteins on a dataset consisting of 7790 thermophilic protein entries. On a test set of 1550 samples, DeepTM demonstrates excellent performance with a coefficient of determination (R²) of 0.75, Pearson correlation coefficient (P) of 0.87, and root mean square error (RMSE) of 6.24 ℃. We further analyzed the sequence features that determine the thermal stability of thermophilic proteins and found that dipeptide frequency, optimal growth temperature (OGT) of the host organisms, and the evolutionary information of the protein significantly affect its melting temperature. We compared the performance of DeepTM with recently reported methods, ProTstab2 and DeepSTABp, in predicting the Tm values on two blind test datasets. One dataset comprised 22 PET plastic-degrading enzymes, while the other included 29 thermally stable proteins of broader classification. In the PET plastic-degrading enzyme dataset, DeepTM achieved RMSE of 8.25 ℃. Compared to ProTstab2 (20.05 ℃) and DeepSTABp (20.97 ℃), DeepTM demonstrated a reduction in RMSE of 58.85% and 60.66%, respectively. In the dataset of thermally stable proteins, DeepTM (RMSE=7.66 ℃) demonstrated a 51.73% reduction in RMSE compared to ProTstab2 (RMSE=15.87 ℃).
DeepTM, with the sole requirement of protein sequence information, accurately predicts the melting temperature and achieves a fully end-to-end prediction process, thus providing enhanced convenience and expediency for further protein engineering.
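The RMSE reductions quoted in this abstract can be checked with a few lines of arithmetic. The sketch below assumes the reduction is computed as (baseline − DeepTM) / baseline, which reproduces the quoted percentages; the function name `rmse_reduction` is illustrative and not taken from the paper.

```python
def rmse_reduction(baseline_rmse: float, new_rmse: float) -> float:
    """Percentage reduction in RMSE of a new method relative to a baseline."""
    return (baseline_rmse - new_rmse) / baseline_rmse * 100.0


# RMSE values (in degrees Celsius) as quoted in the abstract.
pet_baselines = {"ProTstab2": 20.05, "DeepSTABp": 20.97}
deeptm_pet = 8.25

for method, rmse in pet_baselines.items():
    print(f"PET enzymes, vs {method}: {rmse_reduction(rmse, deeptm_pet):.2f}%")

# Thermally stable protein dataset: ProTstab2 15.87 vs DeepTM 7.66.
print(f"Stable proteins, vs ProTstab2: {rmse_reduction(15.87, 7.66):.2f}%")
```

Under that assumption the three printed values match the 58.85%, 60.66%, and 51.73% figures stated above.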
Affiliation(s)
- Mengyu Li
  - College of Life Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
- Hongzhao Wang
  - College of Life Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
- Zhenwu Yang
  - College of Life Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
- Longgui Zhang
  - SINOPEC Beijing Research Institute of Chemical Industry, Beijing 100013, China
- Yushan Zhu
  - College of Life Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
  - National Energy R&D Center for Biorefinery, Beijing University of Chemical Technology, Beijing 100029, China
|
139
|
Gil Zuluaga FH, D’Arminio N, Bardozzo F, Tagliaferri R, Marabotti A. An automated pipeline integrating AlphaFold 2 and MODELLER for protein structure prediction. Comput Struct Biotechnol J 2023; 21:5620-5629. [PMID: 38047234 PMCID: PMC10690423 DOI: 10.1016/j.csbj.2023.10.056] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/24/2023] [Revised: 10/31/2023] [Accepted: 10/31/2023] [Indexed: 12/05/2023] Open
Abstract
The ability to predict a protein's three-dimensional conformation represents a crucial starting point for investigating evolutionary connections with other members of the corresponding protein family, examining interactions with other proteins, and potentially utilizing this knowledge for the purpose of rational drug design. In this work, we evaluated the feasibility of improving AlphaFold2's three-dimensional protein predictions by developing a novel pipeline (AlphaMod) that incorporates AlphaFold2 with MODELLER, a template-based modeling program. Additionally, our tool can drive a comprehensive quality assessment of the tertiary protein structure by incorporating and comparing a set of different quality assessment tools. The outcomes of selected tools are combined into a composite score (BORDASCORE) that exhibits a meaningful correlation with GDT_TS and facilitates the selection of optimal models in the absence of a reference structure. To validate AlphaMod's results, we conducted evaluations using two distinct datasets summing up to 72 targets, previously used to independently assess AlphaFold2's performance. The generated models underwent evaluation through two methods: i) averaging the GDT_TS scores across all produced structures for a single target sequence, and ii) a pairwise comparison of the best structures generated by AlphaFold2 and AlphaMod. The latter, within the unsupervised setups, shows a rising accuracy of approximately 34% over AlphaFold2, while in the supervised setup AlphaMod surpasses AlphaFold2 in 18% of the instances. Finally, there is an 11% correspondence in outcomes between the diverse methodologies. Consequently, AlphaMod's best-predicted tertiary structures in several cases exhibited a significant improvement in the accuracy of the predictions with respect to the best models obtained by AlphaFold2.
This pipeline paves the way for the integration of additional data and AI-based algorithms to further improve the reliability of the predictions.
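The abstract does not spell out how BORDASCORE is computed, so the following is only a generic Borda-count sketch of how outputs from several quality assessment tools might be combined into one ranking. The tool names, model names, and scores are hypothetical, and every tool is assumed to report "higher is better"; none of this is taken from the AlphaMod implementation.

```python
from typing import Dict


def borda_scores(tool_scores: Dict[str, Dict[str, float]]) -> Dict[str, int]:
    """Combine per-tool quality scores into Borda points per model.

    tool_scores[tool][model] is a quality score where higher is better.
    Within each tool, the worst model gets 0 points and the best gets n-1.
    Ties are broken by dictionary order because sorted() is stable.
    """
    totals: Dict[str, int] = {}
    for scores in tool_scores.values():
        ranked = sorted(scores, key=scores.get)  # worst to best
        for points, model in enumerate(ranked):
            totals[model] = totals.get(model, 0) + points
    return totals


# Hypothetical scores from three quality assessment tools for three models.
example_scores = {
    "tool_a": {"m1": 0.9, "m2": 0.5, "m3": 0.7},
    "tool_b": {"m1": 0.8, "m2": 0.6, "m3": 0.7},
    "tool_c": {"m1": 0.4, "m2": 0.3, "m3": 0.9},
}

totals = borda_scores(example_scores)
best_model = max(totals, key=totals.get)
print(totals, best_model)
```

A rank-based aggregate like this needs no reference structure, which is consistent with the use case described above, but the actual weighting and tool set in the paper may differ.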
Affiliation(s)
- Fabio Hernan Gil Zuluaga
  - Department of Management & Innovation Systems, University of Salerno, Via Giovanni Paolo II, 132, 84084 Fisciano, SA, Italy
- Nancy D’Arminio
  - Department of Chemistry and Biology “A. Zambelli”, University of Salerno, Via Giovanni Paolo II, 132, 84084 Fisciano, SA, Italy
- Francesco Bardozzo
  - Department of Management & Innovation Systems, University of Salerno, Via Giovanni Paolo II, 132, 84084 Fisciano, SA, Italy
- Roberto Tagliaferri
  - Department of Management & Innovation Systems, University of Salerno, Via Giovanni Paolo II, 132, 84084 Fisciano, SA, Italy
- Anna Marabotti
  - Department of Chemistry and Biology “A. Zambelli”, University of Salerno, Via Giovanni Paolo II, 132, 84084 Fisciano, SA, Italy
|
140
|
Mansoor S, Baek M, Juergens D, Watson JL, Baker D. Zero-shot mutation effect prediction on protein stability and function using RoseTTAFold. Protein Sci 2023; 32:e4780. [PMID: 37695922 PMCID: PMC10578109 DOI: 10.1002/pro.4780] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/12/2023] [Revised: 09/05/2023] [Accepted: 09/07/2023] [Indexed: 09/13/2023]
Abstract
Predicting the effects of mutations on protein function and stability is an outstanding challenge. Here, we assess the performance of a variant of RoseTTAFold jointly trained for sequence and structure recovery, RFjoint, for mutation effect prediction. Without any further training, we achieve comparable accuracy in predicting mutation effects for a diverse set of protein families using RFjoint to both another zero-shot model (MSA Transformer) and a model that requires specific training on a particular protein family for mutation effect prediction (DeepSequence). Thus, although the architecture of RFjoint was developed to address the protein design problem of scaffolding functional motifs, RFjoint acquired an understanding of the mutational landscapes of proteins during model training that is equivalent to that of recently developed large protein language models. The ability to simultaneously reason over protein structure and sequence could enable even more precise mutation effect predictions following supervised training on the task. These results suggest that RFjoint has a quite broad understanding of protein sequence-structure landscapes, and can be viewed as a joint model for protein sequence and structure which could be broadly useful for protein modeling.
Affiliation(s)
- Sanaa Mansoor
  - Department of Biochemistry, University of Washington, Seattle, Washington, USA
  - Institute for Protein Design, University of Washington, Seattle, Washington, USA
  - Molecular Engineering Graduate Program, University of Washington, Seattle, Washington, USA
- Minkyung Baek
  - Department of Biochemistry, University of Washington, Seattle, Washington, USA
  - Institute for Protein Design, University of Washington, Seattle, Washington, USA
  - School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
- David Juergens
  - Department of Biochemistry, University of Washington, Seattle, Washington, USA
  - Institute for Protein Design, University of Washington, Seattle, Washington, USA
  - Molecular Engineering Graduate Program, University of Washington, Seattle, Washington, USA
- Joseph L. Watson
  - Department of Biochemistry, University of Washington, Seattle, Washington, USA
  - Institute for Protein Design, University of Washington, Seattle, Washington, USA
- David Baker
  - Department of Biochemistry, University of Washington, Seattle, Washington, USA
  - Institute for Protein Design, University of Washington, Seattle, Washington, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, Washington, USA
|
141
|
Medvedeva S, Borrel G, Krupovic M, Gribaldo S. A compendium of viruses from methanogenic archaea reveals their diversity and adaptations to the gut environment. Nat Microbiol 2023; 8:2170-2182. [PMID: 37749252 DOI: 10.1038/s41564-023-01485-w] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 02/01/2023] [Accepted: 08/30/2023] [Indexed: 09/27/2023]
Abstract
Methanogenic archaea are major producers of methane, a potent greenhouse gas and biofuel, and are widespread in diverse environments, including the animal gut. The ecophysiology of methanogens is likely impacted by viruses, which remain, however, largely uncharacterized. Here we carried out a global investigation of viruses associated with all current diversity of methanogens by assembling an extensive CRISPR database consisting of 156,000 spacers. We report 282 high-quality (pro)viral and 205 virus-like/plasmid sequences assigned to hosts belonging to ten main orders of methanogenic archaea. Viruses of methanogens can be classified into 87 families, underscoring a still largely undiscovered genetic diversity. Viruses infecting gut-associated archaea provide evidence of convergence in adaptation with viruses infecting gut-associated bacteria. These viruses contain a large repertoire of lysin proteins that cleave archaeal pseudomurein and are enriched in glycan-binding domains (Ig-like/Flg_new) and diversity-generating retroelements. The characterization of this vast repertoire of viruses paves the way towards a better understanding of their role in regulating methanogen communities globally, as well as the development of much-needed genetic tools.
Affiliation(s)
- Sofia Medvedeva
  - Institut Pasteur, Université Paris Cité, Unit Evolutionary Biology of the Microbial Cell, Paris, France
- Guillaume Borrel
  - Institut Pasteur, Université Paris Cité, Unit Evolutionary Biology of the Microbial Cell, Paris, France
- Mart Krupovic
  - Institut Pasteur, Université Paris Cité, Unit Archaeal Virology, Paris, France
- Simonetta Gribaldo
  - Institut Pasteur, Université Paris Cité, Unit Evolutionary Biology of the Microbial Cell, Paris, France
|
142
|
Abad L, Gauthier CH, Florian I, Jacobs-Sera D, Hatfull GF. The heterogenous and diverse population of prophages in Mycobacterium genomes. mSystems 2023; 8:e0044623. [PMID: 37791767 PMCID: PMC10654092 DOI: 10.1128/msystems.00446-23] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 05/04/2023] [Accepted: 07/18/2023] [Indexed: 10/05/2023] Open
Abstract
IMPORTANCE Mycobacterium species include several human pathogens and mycobacteriophages show potential for therapeutic use to control Mycobacterium infections. However, phage infection profiles vary greatly among Mycobacterium abscessus clinical isolates and phage therapies must be personalized for individual patients. Mycobacterium phage susceptibility is likely determined primarily by accessory parts of bacterial genomes, and we have identified the prophage and phage-related genomic regions across sequenced Mycobacterium strains. The prophages are numerous and diverse, especially in M. abscessus genomes, and provide a potentially rich reservoir of new viruses that can be propagated lytically and used to expand the repertoire of therapeutically useful phages.
Affiliation(s)
- Lawrence Abad
  - Department of Biological Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Christian H. Gauthier
  - Department of Biological Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Isabella Florian
  - Department of Biological Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Deborah Jacobs-Sera
  - Department of Biological Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Graham F. Hatfull
  - Department of Biological Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
|
143
|
Bryant P, Elofsson A. Peptide binder design with inverse folding and protein structure prediction. Commun Chem 2023; 6:229. [PMID: 37880344 PMCID: PMC10600234 DOI: 10.1038/s42004-023-01029-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 04/25/2023] [Accepted: 10/13/2023] [Indexed: 10/27/2023] Open
Abstract
The computational design of peptide binders towards a specific protein interface can aid diagnostic and therapeutic efforts. Here, we design peptide binders by combining the known structural space searched with Foldseek, the protein design method ESM-IF1, and AlphaFold2 (AF) in a joint framework. Foldseek generates backbone seeds for a modified version of ESM-IF1 adapted to protein complexes. The resulting sequences are evaluated with AF using an MSA representation for the receptor structure and a single sequence for the binder. We show that AF can accurately evaluate protein binders and that our bind score can select these (ROC AUC = 0.96 for the heterodimeric case). We find that designs created from seeds with more contacts per residue are more successful and tend to be short. There is a relationship between the sequence recovery in interface positions and the plDDT of the designs, where designs with ≥80% recovery have an average plDDT of 84 compared to 55 at 0%. Designed sequences have 60% higher median plDDT values towards intended receptors than non-intended ones. Successful binders (predicted interface RMSD ≤ 2 Å) are designed towards 185 (6.5%) heteromeric and 42 (3.6%) homomeric protein interfaces with ESM-IF1 compared with 18 (1.5%) using ProteinMPNN from 100 samples.
Affiliation(s)
- Patrick Bryant
  - Science for Life Laboratory, 172 21, Solna, Sweden
  - Department of Biochemistry and Biophysics, Stockholm University, 106 91, Stockholm, Sweden
- Arne Elofsson
  - Science for Life Laboratory, 172 21, Solna, Sweden
  - Department of Biochemistry and Biophysics, Stockholm University, 106 91, Stockholm, Sweden
|
144
|
Zhu HT, Xia YH, Zhang GJ. E2EDA: Protein Domain Assembly Based on End-to-End Deep Learning. J Chem Inf Model 2023; 63:6451-6461. [PMID: 37788318 DOI: 10.1021/acs.jcim.3c01387] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 10/05/2023]
Abstract
With the development of deep learning, almost all single-domain proteins can be predicted at experimental resolution. However, the structure prediction of multi-domain proteins remains a challenge. Achieving end-to-end protein domain assembly and further improving the accuracy of the full-chain modeling by accurately predicting inter-domain orientation while improving the assembly efficiency will provide significant insights into structure-based drug discovery. In this work, we propose an End-to-End Domain Assembly method based on deep learning, named E2EDA. We first develop RMNet, an EfficientNetV2-based deep learning model that fuses multiple features using an attention mechanism to predict inter-domain rigid motion. Then, the predicted rigid motions are transformed into inter-domain spatial transformations to directly assemble the full-chain model. Finally, the scoring strategy RMscore is designed to select the best model from multiple assembled models. The experimental results show that the average TM-score of the model assembled by E2EDA on the benchmark set (282) is 0.827, which is better than those of other domain assembly methods SADA (0.792) and DEMO (0.730). Meanwhile, on our constructed multi-domain data set from AlphaFold DB, the model reassembled by E2EDA is 7.0% higher in TM-score compared to the full-chain model predicted by AlphaFold2, indicating that E2EDA can capture more accurate inter-domain orientations to improve the quality of the model predicted by AlphaFold2. Furthermore, compared to SADA and AlphaFold2, E2EDA reduced the average runtime on the benchmark by 64.7% and 19.2%, respectively, indicating that E2EDA can significantly improve assembly efficiency through an end-to-end approach. The online server is available at http://zhanglab-bioinf.com/E2EDA.
Affiliation(s)
- Hai-Tao Zhu
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
- Yu-Hao Xia
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
- Gui-Jun Zhang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
|
145
|
Sawa T, Moriwaki Y, Jiang H, Murase K, Takayama S, Shimizu K, Terada T. Comprehensive computational analysis of the SRK-SP11 molecular interaction underlying self-incompatibility in Brassicaceae using improved structure prediction for cysteine-rich proteins. Comput Struct Biotechnol J 2023; 21:5228-5239. [PMID: 37928947 PMCID: PMC10624595 DOI: 10.1016/j.csbj.2023.10.026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Revised: 10/03/2023] [Accepted: 10/16/2023] [Indexed: 11/07/2023] Open
Abstract
Plants employ self-incompatibility (SI) to promote cross-fertilization. In Brassicaceae, this process is regulated by the formation of a complex between the pistil determinant S receptor kinase (SRK) and the pollen determinant S-locus protein 11 (SP11, also known as S-locus cysteine-rich protein, SCR). In our previous study, we used the crystal structures of two eSRK-SP11 complexes in Brassica rapa S8 and S9 haplotypes and nine computationally predicted complex models to demonstrate that only the SRK ectodomain (eSRK) and SP11 pairs derived from the same S haplotype exhibit high binding free energy. However, predicting the eSRK-SP11 complex structures for the other 100+ S haplotypes and genera remains difficult because of SP11 polymorphism in sequence and structure. Although protein structure prediction using AlphaFold2 exhibits considerably high accuracy for most protein monomers and complexes, 46% of the predicted SP11 structures that we tested showed a mean per-residue confidence score (pLDDT) below 75. Here, we demonstrate that the use of curated multiple sequence alignment (MSA) for cysteine-rich proteins significantly improved model accuracy for SP11 and eSRK-SP11 complexes. Additionally, we calculated the binding free energies of the predicted eSRK-SP11 complexes using molecular dynamics (MD) simulations and observed that some Arabidopsis haplotypes formed a binding mode that was critically different from that of B. rapa S8 and S9. Thus, our computational results provide insights into the haplotype-specific eSRK-SP11 binding modes in Brassicaceae at the residue level. The predicted models are freely available at Zenodo, https://doi.org/10.5281/zenodo.8047768.
Affiliation(s)
- Tomoki Sawa
  - Department of Biotechnology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo 113-8657, Japan
- Yoshitaka Moriwaki
  - Department of Biotechnology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo 113-8657, Japan
  - Collaborative Research Institute for Innovative Microbiology, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo 113-8657, Japan
- Hanting Jiang
  - Department of Biotechnology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo 113-8657, Japan
- Kohji Murase
  - Department of Applied Biological Chemistry, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo 113-8657, Japan
- Seiji Takayama
  - Department of Applied Biological Chemistry, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo 113-8657, Japan
- Kentaro Shimizu
  - Department of Biotechnology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo 113-8657, Japan
  - Collaborative Research Institute for Innovative Microbiology, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo 113-8657, Japan
- Tohru Terada
  - Department of Biotechnology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo 113-8657, Japan
  - Collaborative Research Institute for Innovative Microbiology, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo 113-8657, Japan
|
146
|
Dong W, Jiao B, Wang J, Sun L, Li S, Wu Z, Gao J, Zhou S. Genome-Wide Identification and Expression Analysis of Lipoxygenase Genes in Rose (Rosa chinensis). Genes (Basel) 2023; 14:1957. [PMID: 37895306 PMCID: PMC10606720 DOI: 10.3390/genes14101957] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Revised: 10/04/2023] [Accepted: 10/06/2023] [Indexed: 10/29/2023] Open
Abstract
Lipoxygenases (LOX) play pivotal roles in plant resistance to stresses. However, no study has been conducted on LOX gene identification at the whole genome scale in rose (Rosa chinensis). In this study, a total of 17 RcLOX members were identified in the rose genome. The members could be classified into three groups: 9-LOX, Type I 13-LOX, and Type II 13-LOX. Similar gene structures and protein domains can be found in RcLOX members. The RcLOX genes were spread among all seven chromosomes, with unbalanced distributions, and several tandem and proximal duplication events were found among RcLOX members. Expressions of the RcLOX genes were tissue-specific, while every RcLOX gene could be detected in at least one tissue. The expression levels of most RcLOX genes could be up-regulated by aphid infestation, suggesting potential roles in aphid resistance. Our study offers a systematic analysis of the RcLOX genes in rose, providing useful information not only for further gene cloning and functional exploration but also for the study of aphid resistance.
Affiliation(s)
- Wenqi Dong
  - Beijing Key Laboratory of Development and Quality Control of Ornamental Crops, Department of Ornamental Horticulture, China Agricultural University, Beijing 100193, China
  - Hebei Academy of Agriculture and Forestry Sciences, Shijiazhuang 050051, China
- Bo Jiao
  - Hebei Academy of Agriculture and Forestry Sciences, Shijiazhuang 050051, China
- Jiao Wang
  - Hebei Academy of Agriculture and Forestry Sciences, Shijiazhuang 050051, China
- Lei Sun
  - Hebei Academy of Agriculture and Forestry Sciences, Shijiazhuang 050051, China
- Songshuo Li
  - Hebei Academy of Agriculture and Forestry Sciences, Shijiazhuang 050051, China
- Zhiming Wu
  - Hebei Academy of Agriculture and Forestry Sciences, Shijiazhuang 050051, China
- Junping Gao
  - Beijing Key Laboratory of Development and Quality Control of Ornamental Crops, Department of Ornamental Horticulture, China Agricultural University, Beijing 100193, China
- Shuo Zhou
  - Hebei Academy of Agriculture and Forestry Sciences, Shijiazhuang 050051, China
|
147
|
Derry A, Altman RB. Explainable protein function annotation using local structure embeddings. bioRxiv 2023:2023.10.13.562298. [PMID: 37905033 PMCID: PMC10614799 DOI: 10.1101/2023.10.13.562298] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 11/02/2023]
Abstract
The rapid expansion of protein sequence and structure databases has resulted in a significant number of proteins with ambiguous or unknown function. While advances in machine learning techniques hold great potential to fill this annotation gap, current methods for function prediction are unable to associate global function reliably to the specific residues responsible for that function. We address this issue by introducing PARSE (Protein Annotation by Residue-Specific Enrichment), a knowledge-based method which combines pre-trained embeddings of local structural environments with traditional statistical techniques to identify enriched functions with residue-level explainability. For the task of predicting the catalytic function of enzymes, PARSE achieves comparable or superior global performance to state-of-the-art machine learning methods (F1 score > 85%) while simultaneously annotating the specific residues involved in each function with much greater precision. Since it does not require supervised training, our method can make one-shot predictions for very rare functions and is not limited to a particular type of functional label (e.g. Enzyme Commission numbers or Gene Ontology codes). Finally, we leverage the AlphaFold Structure Database to perform functional annotation at a proteome scale. By applying PARSE to the dark proteome (predicted structures which cannot be classified into known structural families), we predict several novel bacterial metalloproteases. Each of these proteins shares a strongly conserved catalytic site despite highly divergent sequences and global folds, illustrating the value of local structure representations for new function discovery.
Affiliation(s)
- Alexander Derry
  - Department of Biomedical Data Science, Stanford University, Stanford, CA
- Russ B Altman
  - Department of Biomedical Data Science, Stanford University, Stanford, CA
  - Departments of Bioengineering, Genetics, and Medicine, Stanford University, Stanford, CA
|
148
|
Sharma S, Gupta DN, Kushwah AS, Sharma AK, Prasad R. Identification and characterization of the Cyamopsis tetragonoloba transcription factor MYC (CtMYC) under drought stress. Gene 2023; 882:147654. [PMID: 37479095 DOI: 10.1016/j.gene.2023.147654] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/10/2023] [Revised: 07/08/2023] [Accepted: 07/18/2023] [Indexed: 07/23/2023]
Abstract
The MYC transcription factor (TF) has a variety of roles in abiotic stress responses of plants. In the present work, a MYC TF from the guar plant (Cyamopsis tetragonoloba), named CtMYC, which is induced by drought stress, was identified. The mature leaves of guar were employed to detect the full-length CtMYC TF on the 8th day of drought stress. The CtMYC gene showed tissue-specific expression and was up-regulated under drought stress conditions as compared to the control, with maximum expression observed in mature leaves. Additionally, CtMYC TF was cloned and expressed in E. coli Rosetta cells, and the CtMYC protein was purified. Circular dichroism (CD) analysis revealed the presence of helical content and beta sheets, and in the presence of genomic DNA conformational changes were observed in the secondary structure, indicating the DNA-binding potential of CtMYC. These results were analyzed by CD and fluorescence studies. In silico studies reveal the presence of a conserved bHLH domain and the DNA-binding amino acid residues His, Glu and Arg in CtMYC. This is the first report of a CtMYC TF with DNA-binding potential that is responsive to drought. This study provides the structure and characterization of the CtMYC TF and its DNA-binding ability in the drought tolerance mechanism in guar.
Affiliation(s)
- Shipra Sharma
  - Department of Biosciences and Bioengineering, Indian Institute of Technology Roorkee, Roorkee 247667, India
- Deena Nath Gupta
  - Department of Biosciences and Bioengineering, Indian Institute of Technology Roorkee, Roorkee 247667, India
- Ankita Singh Kushwah
  - Department of Biosciences and Bioengineering, Indian Institute of Technology Roorkee, Roorkee 247667, India
- Ashwani Kumar Sharma
  - Department of Biosciences and Bioengineering, Indian Institute of Technology Roorkee, Roorkee 247667, India
- Ramasare Prasad
  - Department of Biosciences and Bioengineering, Indian Institute of Technology Roorkee, Roorkee 247667, India
|
149
|
Rodríguez-López M, Bordin N, Lees J, Scholes H, Hassan S, Saintain Q, Kamrad S, Orengo C, Bähler J. Broad functional profiling of fission yeast proteins using phenomics and machine learning. eLife 2023; 12:RP88229. [PMID: 37787768 PMCID: PMC10547477 DOI: 10.7554/elife.88229] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 10/04/2023] Open
Abstract
Many proteins remain poorly characterized even in well-studied organisms, presenting a bottleneck for research. We applied phenomics and machine-learning approaches with Schizosaccharomyces pombe for broad cues on protein functions. We assayed colony-growth phenotypes to measure the fitness of deletion mutants for 3509 non-essential genes in 131 conditions with different nutrients, drugs, and stresses. These analyses exposed phenotypes for 3492 mutants, including 124 mutants of 'priority unstudied' proteins conserved in humans, providing varied functional clues. For example, over 900 proteins were newly implicated in the resistance to oxidative stress. Phenotype-correlation networks suggested roles for poorly characterized proteins through 'guilt by association' with known proteins. For complementary functional insights, we predicted Gene Ontology (GO) terms using machine learning methods exploiting protein-network and protein-homology data (NET-FF). We obtained 56,594 high-scoring GO predictions, of which 22,060 also featured high information content. Our phenotype-correlation data and NET-FF predictions showed a strong concordance with existing PomBase GO annotations and protein networks, with integrated analyses revealing 1675 novel GO predictions for 783 genes, including 47 predictions for 23 priority unstudied proteins. Experimental validation identified new proteins involved in cellular aging, showing that these predictions and phenomics data provide a rich resource to uncover new protein functions.
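The 'guilt by association' idea in this abstract can be illustrated with a minimal sketch: genes whose deletion mutants show correlated fitness across conditions are linked, so a poorly characterized gene inherits functional hints from its known partners. This is not the authors' pipeline. The choice of Pearson correlation, the 0.8 threshold, the gene names, and the fitness values below are all assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length fitness profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no correlation signal
    return cov / math.sqrt(var_x * var_y)


def correlated_partners(profiles, query, threshold=0.8):
    """Return genes whose profile correlates with `query` at or above `threshold`."""
    q = profiles[query]
    return sorted(
        gene
        for gene, profile in profiles.items()
        if gene != query and pearson(q, profile) >= threshold
    )


# Hypothetical relative colony fitness of three deletion mutants in four conditions.
profiles = {
    "unstudied1": [1.0, 0.2, 0.9, 0.1],
    "knownA": [0.9, 0.3, 1.0, 0.2],
    "knownB": [0.1, 1.0, 0.2, 0.9],
}
partners = correlated_partners(profiles, "unstudied1")
```

Here `partners` would contain only `knownA`, whose profile rises and falls in the same conditions as the query; any annotation of `knownA` then becomes a candidate clue for `unstudied1`.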
Affiliation(s)
- María Rodríguez-López
  - University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom
- Nicola Bordin
  - University College London, Institute of Structural and Molecular Biology, London, United Kingdom
- Jon Lees
  - University College London, Institute of Structural and Molecular Biology, London, United Kingdom
  - University of Bristol, Bristol, United Kingdom
- Harry Scholes
  - University College London, Institute of Structural and Molecular Biology, London, United Kingdom
- Shaimaa Hassan
  - University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom
  - Helwan University, Faculty of Pharmacy, Cairo, Egypt
- Quentin Saintain
  - University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom
- Stephan Kamrad
  - University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom
- Christine Orengo
  - University College London, Institute of Structural and Molecular Biology, London, United Kingdom
- Jürg Bähler
  - University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom
|
150
|
Kaminski K, Ludwiczak J, Pawlicki K, Alva V, Dunin-Horkawicz S. pLM-BLAST: distant homology detection based on direct comparison of sequence representations from protein language models. Bioinformatics 2023; 39:btad579. [PMID: 37725369 PMCID: PMC10576641 DOI: 10.1093/bioinformatics/btad579] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 11/27/2022] [Revised: 07/09/2023] [Accepted: 09/15/2023] [Indexed: 09/21/2023] Open
Abstract
MOTIVATION The detection of homology through sequence comparison is a typical first step in the study of protein function and evolution. In this work, we explore the applicability of protein language models to this task. RESULTS We introduce pLM-BLAST, a tool inspired by BLAST, that detects distant homology by comparing single-sequence representations (embeddings) derived from a protein language model, ProtT5. Our benchmarks reveal that pLM-BLAST maintains a level of accuracy on par with HHsearch for both highly similar sequences (with >50% identity) and markedly divergent sequences (with <30% identity), while being significantly faster. Additionally, pLM-BLAST stands out among other embedding-based tools due to its ability to compute local alignments. We show that these local alignments, produced by pLM-BLAST, often connect highly divergent proteins, thereby highlighting its potential to uncover previously undiscovered homologous relationships and improve protein annotation. AVAILABILITY AND IMPLEMENTATION pLM-BLAST is accessible via the MPI Bioinformatics Toolkit as a web server for searching precomputed databases (https://toolkit.tuebingen.mpg.de/tools/plmblast). It is also available as a standalone tool for building custom databases and performing batch searches (https://github.com/labstructbioinf/pLM-BLAST).
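As a rough illustration of what a local alignment over embeddings can look like, the sketch below scores residue pairs by the cosine similarity of their embedding vectors and runs a Smith-Waterman-style recursion over that similarity matrix. It is not the pLM-BLAST implementation. The `offset` and `gap` parameters and the two-dimensional toy vectors are assumptions for illustration; real ProtT5 embeddings have far more dimensions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def local_alignment_score(emb_a, emb_b, offset=0.5, gap=1.0):
    """Best local alignment score between two per-residue embedding lists.

    Each residue pair scores cosine similarity minus `offset`, so dissimilar
    pairs are penalised; `gap` is a linear gap penalty. Both are assumed values.
    """
    rows, cols = len(emb_a), len(emb_b)
    table = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    best = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            match = cosine(emb_a[i - 1], emb_b[j - 1]) - offset
            table[i][j] = max(
                0.0,                          # start a new local alignment
                table[i - 1][j - 1] + match,  # extend along the diagonal
                table[i - 1][j] - gap,        # gap in sequence B
                table[i][j - 1] - gap,        # gap in sequence A
            )
            best = max(best, table[i][j])
    return best


# Toy per-residue embeddings: the last two residues of A match sequence B.
seq_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
seq_b = [[0.0, 1.0], [1.0, 1.0]]
score = local_alignment_score(seq_a, seq_b)
```

Because the recursion resets to zero whenever the running score turns negative, only the best-matching local segment contributes, which is the property that lets local alignments connect otherwise divergent proteins.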
Affiliation(s)
- Kamil Kaminski
  - Institute of Evolutionary Biology, Faculty of Biology, Biological and Chemical Research Centre, University of Warsaw, Warsaw 02-089, Poland
  - Laboratory of Structural Bioinformatics, Centre of New Technologies, University of Warsaw, Warsaw 02-097, Poland
- Jan Ludwiczak
  - Institute of Evolutionary Biology, Faculty of Biology, Biological and Chemical Research Centre, University of Warsaw, Warsaw 02-089, Poland
- Kamil Pawlicki
  - Institute of Evolutionary Biology, Faculty of Biology, Biological and Chemical Research Centre, University of Warsaw, Warsaw 02-089, Poland
- Vikram Alva
  - Department of Protein Evolution, Max Planck Institute for Biology Tübingen, Tübingen 72076, Germany
- Stanislaw Dunin-Horkawicz
  - Institute of Evolutionary Biology, Faculty of Biology, Biological and Chemical Research Centre, University of Warsaw, Warsaw 02-089, Poland
  - Department of Protein Evolution, Max Planck Institute for Biology Tübingen, Tübingen 72076, Germany
|