1
Halpin JC, Keating AE. PairK: Pairwise k-mer alignment for quantifying protein motif conservation in disordered regions. Protein Sci 2025; 34:e70004. [PMID: 39720898 DOI: 10.1002/pro.70004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2024] [Revised: 11/19/2024] [Accepted: 12/05/2024] [Indexed: 12/26/2024]
Abstract
Protein-protein interactions are often mediated by a modular peptide recognition domain binding to a short linear motif (SLiM) in the disordered region of another protein. To understand the features of SLiMs that are important for binding and to identify motif instances that are important for biological function, it is useful to examine the evolutionary conservation of motifs across homologous proteins. However, the intrinsically disordered regions (IDRs) in which SLiMs reside evolve rapidly. Consequently, multiple sequence alignment (MSA) of IDRs often misaligns SLiMs and underestimates their conservation. We present PairK (pairwise k-mer alignment), an MSA-free method to align and quantify the relative local conservation of subsequences within an IDR. Lacking a ground truth for conservation, we tested PairK on the task of distinguishing biologically important motif instances from background motifs, under the assumption that biologically important motifs are more conserved. The method outperforms both standard MSA-based conservation scores and a modern LLM-based conservation score predictor. PairK can quantify conservation over wider phylogenetic distances than MSAs, indicating that some SLiMs are more conserved than MSA-based metrics imply. PairK is available as an open-source Python package at https://github.com/jacksonh1/pairk. It is designed to be easily adapted for use with other SLiM tools and for diverse applications.
Affiliation(s)
- Amy E Keating
- Department of Biology, MIT, Cambridge, Massachusetts, USA
- Department of Biological Engineering, MIT, Cambridge, Massachusetts, USA
- Koch Institute for Integrative Cancer Research, Cambridge, Massachusetts, USA
2
Mick ST, Carroll CL, Uriostegui-Arcos M, Fiszbein A. Hybrid exons evolved by coupling transcription initiation and splicing at the nucleotide level. Nucleic Acids Res 2024:gkae1251. [PMID: 39739742 DOI: 10.1093/nar/gkae1251] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2023] [Revised: 11/27/2024] [Accepted: 12/05/2024] [Indexed: 01/02/2025] Open
Abstract
Exons within transcripts are traditionally classified as first, internal or last exons, each governed by different regulatory mechanisms. We recently described the widespread usage of 'hybrid' exons that serve as terminal or internal exons in different transcripts. Here, we employ an interpretable deep learning pipeline to dissect the sequence features governing the co-regulation of transcription initiation and splicing in hybrid exons. Using ENCODE data from human tissues, we identified 80 000 hybrid first-internal exons. These exons often possess a relaxed chromatin state, allowing transcription initiation within the gene body. Interestingly, transcription start sites of hybrid exons are typically centered at the 3' splice site, suggesting tight coupling between splicing and transcription initiation. We identified two subcategories of hybrid exons: the majority resemble internal exons, maintaining strong 3' splice sites, while a minority show enrichment in promoter elements, resembling first exons. Diving into the evolution of their sequences, we found that human hybrid exons with orthologous first exons in other species usually gained 3' splice sites or whole exons upstream, while those with orthologous internal exons often gained promoter elements. Overall, our findings unveil the intricate regulatory landscape of hybrid exons and reveal stronger connections between transcription initiation and RNA splicing than previously acknowledged.
Affiliation(s)
- Steven T Mick
- Biology Department, Boston University, 24 Cummington Ave., Boston, 02215, USA
- Christine L Carroll
- Biology Department, Boston University, 24 Cummington Ave., Boston, 02215, USA
- Ana Fiszbein
- Biology Department, Boston University, 24 Cummington Ave., Boston, 02215, USA
- Computing & Data Sciences, Boston University, 665 Commonwealth Ave., Boston, 02215, USA
3
Thrift WJ, Lounsbury NW, Broadwell Q, Heidersbach A, Freund E, Abdolazimi Y, Phung QT, Chen J, Capietto AH, Tong AJ, Rose CM, Blanchette C, Lill JR, Haley B, Delamarre L, Bourgon R, Liu K, Jhunjhunwala S. Towards designing improved cancer immunotherapy targets with a peptide-MHC-I presentation model, HLApollo. Nat Commun 2024; 15:10752. [PMID: 39737928 DOI: 10.1038/s41467-024-54887-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2023] [Accepted: 11/25/2024] [Indexed: 01/01/2025] Open
Abstract
Based on the success of cancer immunotherapy, personalized cancer vaccines have emerged as a leading oncology treatment. Antigen presentation on MHC class I (MHC-I) is crucial for the adaptive immune response to cancer cells, necessitating highly predictive computational methods to model this phenomenon. Here, we introduce HLApollo, a transformer-based model for peptide-MHC-I (pMHC-I) presentation prediction, leveraging the language of peptides, MHC, and source proteins. HLApollo provides end-to-end treatment of MHC-I sequences and deconvolution of multi-allelic data, using a negative-set switching strategy to mitigate misassigned negatives in unlabelled ligandome data. HLApollo shows a 12.65% increase in average precision (AP) on ligandome data and a 4.1% AP increase on immunogenicity test data compared to next-best models. Incorporating protein features from protein language models yields further gains and reduces the need for gene expression measurements. Guided by clinical use, we demonstrate pan-allelic generalization which effectively captures rare alleles in underrepresented ancestries.
Affiliation(s)
- William John Thrift
- Early Clinical Development Artificial Intelligence, Genentech, South San Francisco, CA, USA
- Quade Broadwell
- Early Clinical Development Artificial Intelligence, Genentech, South San Francisco, CA, USA
- Amy Heidersbach
- Molecular Biology Department, Genentech, South San Francisco, CA, USA
- Emily Freund
- Molecular Biology Department, Genentech, South San Francisco, CA, USA
- Yassan Abdolazimi
- Molecular Biology Department, Genentech, South San Francisco, CA, USA
- Qui T Phung
- Microchemistry, Proteomics and Lipidomics, Genentech, South San Francisco, CA, USA
- Jieming Chen
- Oncology Bioinformatics, Genentech, South San Francisco, CA, USA
- Ann-Jay Tong
- Cancer Immunology, Genentech, South San Francisco, CA, USA
- Christopher M Rose
- Microchemistry, Proteomics and Lipidomics, Genentech, South San Francisco, CA, USA
- Jennie R Lill
- Microchemistry, Proteomics and Lipidomics, Genentech, South San Francisco, CA, USA
- Benjamin Haley
- Molecular Biology Department, Genentech, South San Francisco, CA, USA
- Richard Bourgon
- Oncology Bioinformatics, Genentech, South San Francisco, CA, USA
- Computational Science, Freenome, South San Francisco, CA, USA
- Kai Liu
- Early Clinical Development Artificial Intelligence, Genentech, South San Francisco, CA, USA.
- Artificial Intelligence, SES AI, Woburn, MA, USA.
4
Strayer EC, Krishna S, Lee H, Vejnar C, Neuenkirchen N, Gupta A, Beaudoin JD, Giraldez AJ. NaP-TRAP reveals the regulatory grammar in 5'UTR-mediated translation regulation during zebrafish development. Nat Commun 2024; 15:10898. [PMID: 39738051 DOI: 10.1038/s41467-024-55274-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Accepted: 12/06/2024] [Indexed: 01/01/2025] Open
Abstract
The cis-regulatory elements encoded in an mRNA determine its stability and translational output. While there has been a considerable effort to understand the factors driving mRNA stability, the regulatory frameworks governing translational control remain more elusive. We have developed a novel massively parallel reporter assay (MPRA) to measure mRNA translation, named Nascent Peptide Translating Ribosome Affinity Purification (NaP-TRAP). NaP-TRAP measures translation in a frame-specific manner through the immunocapture of epitope tagged nascent peptides of reporter mRNAs. We benchmark NaP-TRAP to polysome profiling and use it to quantify Kozak strength and the regulatory landscapes of 5' UTRs in the developing zebrafish embryo and in human cells. Through this approach we identified general and developmentally dynamic cis-regulatory elements, as well as potential trans-acting proteins. We find that U-rich motifs are general enhancers, and upstream ORFs and GC-rich motifs are global repressors of translation. We also observe a translational switch during the maternal-to-zygotic transition, where C-rich motifs shift from repressors to prominent activators of translation. Conversely, we show that microRNA sites in the 5' UTR repress translation following the zygotic expression of miR-430. Together these results demonstrate that NaP-TRAP is a versatile, accessible, and powerful method to decode the regulatory functions of UTRs across different systems.
Affiliation(s)
- Ethan C Strayer
- Department of Genetics, Yale University, Yale School of Medicine, New Haven, 06510, CT, USA
- Srikar Krishna
- Department of Genetics, Yale University, Yale School of Medicine, New Haven, 06510, CT, USA
- Haejeong Lee
- Department of Genetics, Yale University, Yale School of Medicine, New Haven, 06510, CT, USA
- Charles Vejnar
- Department of Genetics, Yale University, Yale School of Medicine, New Haven, 06510, CT, USA
- Nils Neuenkirchen
- Department of Cell Biology, Yale University, Yale School of Medicine, New Haven, 06510, CT, USA
- Amit Gupta
- Department of Genetics and Genome Sciences, Institute for Systems Genomics, University of Connecticut Health Center, Farmington, CT, USA
- Jean-Denis Beaudoin
- Department of Genetics and Genome Sciences, Institute for Systems Genomics, University of Connecticut Health Center, Farmington, CT, USA.
- Yale Center for RNA Science and Medicine, Yale University, New Haven, 06510, CT, USA.
- Antonio J Giraldez
- Department of Genetics, Yale University, Yale School of Medicine, New Haven, 06510, CT, USA.
- Yale Center for RNA Science and Medicine, Yale University, New Haven, 06510, CT, USA.
- Yale Stem Cell Center, Yale University, Yale School of Medicine, New Haven, 06510, CT, USA.
5
Lopez SC, Lee Y, Zhang K, Shipman SL. SspA is a transcriptional regulator of CRISPR adaptation in E. coli. Nucleic Acids Res 2024:gkae1244. [PMID: 39727179 DOI: 10.1093/nar/gkae1244] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2024] [Revised: 11/23/2024] [Accepted: 12/04/2024] [Indexed: 12/28/2024] Open
Abstract
The CRISPR integrases Cas1-Cas2 create immunological memories of viral infection by storing phage-derived DNA in CRISPR arrays, a process known as CRISPR adaptation. A number of host factors have been shown to influence adaptation, but the full pathway from infection to a fully integrated, phage-derived sequence in the array remains incompletely understood. Here, we deploy a new CRISPRi-based screen to identify putative host factors that participate in CRISPR adaptation in the Escherichia coli Type I-E system. Our screen and subsequent mechanistic characterization reveal that SspA, through its role as a global transcriptional regulator of cellular stress, is required for functional CRISPR adaptation. One target of SspA is H-NS, a known repressor of CRISPR interference proteins, but we find that the role of SspA on adaptation is not H-NS-dependent. We propose a new model of CRISPR-Cas defense that includes independent cellular control of adaptation and interference by SspA.
Affiliation(s)
- Santiago C Lopez
- Gladstone Institute of Data Science and Biotechnology, 1650 Owens St, San Francisco, CA 94158, USA
- Graduate Program in Bioengineering, University of California, San Francisco and Berkeley, 1700 Fourth St, San Francisco, CA 94158, USA
- Yumie Lee
- Gladstone Institute of Data Science and Biotechnology, 1650 Owens St, San Francisco, CA 94158, USA
- Karen Zhang
- Gladstone Institute of Data Science and Biotechnology, 1650 Owens St, San Francisco, CA 94158, USA
- Graduate Program in Bioengineering, University of California, San Francisco and Berkeley, 1700 Fourth St, San Francisco, CA 94158, USA
- Seth L Shipman
- Gladstone Institute of Data Science and Biotechnology, 1650 Owens St, San Francisco, CA 94158, USA
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, 600 16th Street, San Francisco, CA 94158, USA
- Chan Zuckerberg Biohub San Francisco, 499 Illinois St, San Francisco, CA 94158, USA
6
Du H, Mallik L, Hwang D, Sun Y, Kaku C, Hoces D, Sun SM, Ghinnagow R, Carro SD, Phan HAT, Gupta S, Blackson W, Lee H, Choe CA, Dersh D, Liu J, Bell B, Yang H, Papadaki GF, Young MC, Zhou E, El Nesr G, Goli KD, Eisenlohr LC, Minn AJ, Hernandez-Lopez RA, Jardine JG, Sgourakis NG, Huang PS. Targeting peptide antigens using a multiallelic MHC I-binding system. Nat Biotechnol 2024:10.1038/s41587-024-02505-8. [PMID: 39672954 DOI: 10.1038/s41587-024-02505-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2024] [Accepted: 11/13/2024] [Indexed: 12/15/2024]
Abstract
Identifying highly specific T cell receptors (TCRs) or antibodies against epitopic peptides presented by class I major histocompatibility complex (MHC I) proteins remains a bottleneck in the development of targeted therapeutics. Here, we introduce targeted recognition of antigen-MHC complex reporter for MHC I (TRACeR-I), a generalizable platform for targeting peptides on polymorphic HLA-A*, HLA-B* and HLA-C* allotypes while overcoming the cross-reactivity challenges of TCRs. Our TRACeR-MHC I co-crystal structure reveals a unique antigen recognition mechanism, with TRACeR forming extensive contacts across the entire peptide length to confer single-residue specificity at the accessible positions. We demonstrate rapid screening of TRACeR-I against a panel of disease-relevant HLAs with peptides derived from human viruses (human immunodeficiency virus, Epstein-Barr virus and severe acute respiratory syndrome coronavirus 2), and oncoproteins (Kirsten rat sarcoma virus, paired-like homeobox 2b and New York esophageal squamous cell carcinoma 1). TRACeR-based bispecific T cell engagers and chimeric antigen receptor T cells exhibit on-target killing of tumor cells with high efficacy in the low nanomolar range. Our platform empowers the development of broadly applicable MHC I-targeting molecules for research, diagnostic and therapeutic applications.
Affiliation(s)
- Haotian Du
- Department of Chemistry, Stanford University, Stanford, CA, USA
- Leena Mallik
- Center for Computational and Genomic Medicine, Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Daniel Hwang
- Center for Computational and Genomic Medicine, Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Yi Sun
- Center for Computational and Genomic Medicine, Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Chengzi Kaku
- Department of Immunology and Microbiology, Scripps Research Institute, La Jolla, CA, USA
- Daniel Hoces
- Department of Bioengineering, Stanford University, Stanford, CA, USA
- Department of Genetics, Stanford University, Stanford, CA, USA
- Shirley M Sun
- Center for Computational and Genomic Medicine, Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Cancer Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Reem Ghinnagow
- Department of Radiation Oncology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Stephen D Carro
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Hoang Anh T Phan
- Center for Computational and Genomic Medicine, Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Sagar Gupta
- Center for Computational and Genomic Medicine, Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Wyatt Blackson
- Department of Chemical Engineering, Stanford University, Stanford, CA, USA
- Hyejin Lee
- Department of Bioengineering, Stanford University, Stanford, CA, USA
- Christian A Choe
- Department of Bioengineering, Stanford University, Stanford, CA, USA
- Devin Dersh
- Department of Radiation Oncology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Jingjia Liu
- Department of Bioengineering, Stanford University, Stanford, CA, USA
- Braxton Bell
- Department of Chemistry, Stanford University, Stanford, CA, USA
- Hongli Yang
- Department of Bioengineering, Stanford University, Stanford, CA, USA
- Georgia F Papadaki
- Center for Computational and Genomic Medicine, Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Michael C Young
- Center for Computational and Genomic Medicine, Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Emily Zhou
- Department of Immunology and Microbiology, Scripps Research Institute, La Jolla, CA, USA
- Gina El Nesr
- Biophysics Program, Stanford University, Stanford, CA, USA
- Kimia Dasteh Goli
- Center for Computational and Genomic Medicine, Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Laurence C Eisenlohr
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Andy J Minn
- Department of Radiation Oncology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Rogelio A Hernandez-Lopez
- Department of Bioengineering, Stanford University, Stanford, CA, USA
- Department of Genetics, Stanford University, Stanford, CA, USA
- Stanford Cancer Institute, Stanford University, Stanford, CA, USA
- Chan-Zuckerberg Biohub, San Francisco, CA, USA
- Joseph G Jardine
- Department of Immunology and Microbiology, Scripps Research Institute, La Jolla, CA, USA
- Nikolaos G Sgourakis
- Center for Computational and Genomic Medicine, Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, PA, USA.
- Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
- Po-Ssu Huang
- Department of Chemistry, Stanford University, Stanford, CA, USA.
- Department of Bioengineering, Stanford University, Stanford, CA, USA.
- Biophysics Program, Stanford University, Stanford, CA, USA.
7
Kohl F, Laufkötter O, Firth M, Krimpenfort L, Mangla P, Ansarizadeh M, Geylan G, Eklund L, De Maria L, Jakobsson L, Wiseman J. Identification of cell type-specific cell-penetrating peptides through in vivo phage display leveraged by next generation sequencing. Biomed Pharmacother 2024; 182:117740. [PMID: 39671725 DOI: 10.1016/j.biopha.2024.117740] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2024] [Revised: 11/18/2024] [Accepted: 12/03/2024] [Indexed: 12/15/2024] Open
Abstract
Vascular anomalies (VA) refer to abnormal blood or lymphatic vessel architecture, most often as a result of dysregulated growth. Venous malformations (VM), a subgroup of VAs, are triggered by activating mutations in the Angiopoietin/TIE2-PI3K/AKT/mTOR signaling pathway with TIE2 L914F (gene name TEK) being one of the most frequent mutations in patients with VMs. Although systemic targeting of the overactivated pathway is possible, it would be a therapeutic advantage to restrict treatment to only the affected lesions. To identify peptides with potential selective binding to TIE2 L914F lesions we applied in vivo phage display to TIE2 L914F-overexpressing endothelial cells (ECs) in a subcutaneous matrigel xenograft mouse model of VMs. By panning for lesion-targeting phages in combination with subcellular fractionation, a screen for cell-penetrating candidate phages was established. Employing Next Generation Sequencing (NGS) and a refined bioinformatic analysis we were able to identify many novel cell-penetrating peptides (CPPs). To pinpoint the most selective and viable CPP candidates a hierarchical clustering algorithm was utilized. This method aggregated CPPs with highly similar sequences into a small number of clusters from which consensus sequences could be derived. Selected candidate CPPs exhibited uptake in TIE2 L914F-expressing human umbilical vein endothelial cells (HUVEC) in culture and were able to deliver siRNA into these cells. In conclusion, our NGS bioinformatic-supported approach led to the identification of novel and selective CPPs capable of transporting a siRNA cargo into targeted cells.
Affiliation(s)
- Franziska Kohl
- Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden; Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden
- Oliver Laufkötter
- Department of Life Science Informatics, B-IT, Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
- Mike Firth
- Data Sciences and Quantitative Biology, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK
- Luc Krimpenfort
- Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden
- Priyanka Mangla
- Oligonucleotides and Targeted Delivery, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
- Mohammadhassan Ansarizadeh
- Oulu Center for Cell-Matrix Research, University of Oulu, Oulu, Finland; Faculty of Biochemistry and Molecular Medicine, University of Oulu, Oulu, Finland; Biocenter Oulu, University of Oulu, Oulu, Finland
- Gökçe Geylan
- Molecular AI, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden; Division of Systems and Synthetic Biology, Department of Life Sciences, Chalmers University of Technology, Gothenburg, Sweden
- Lauri Eklund
- Oulu Center for Cell-Matrix Research, University of Oulu, Oulu, Finland; Faculty of Biochemistry and Molecular Medicine, University of Oulu, Oulu, Finland; Biocenter Oulu, University of Oulu, Oulu, Finland
- Leonardo De Maria
- Research and Early Development, Respiratory & Immunology, BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
- Lars Jakobsson
- Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden
- John Wiseman
- Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden.
8
Guerri F, Junet V, Farrés J, Daura X. MMPred: a tool to predict peptide mimicry events in MHC class II recognition. Front Genet 2024; 15:1500684. [PMID: 39722794 PMCID: PMC11669352 DOI: 10.3389/fgene.2024.1500684] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2024] [Accepted: 11/25/2024] [Indexed: 12/28/2024] Open
Abstract
We present MMPred, a software tool that integrates epitope prediction and sequence alignment algorithms to streamline the computational analysis of molecular mimicry events in autoimmune diseases. Starting with two protein or peptide sets (e.g., from human and SARS-CoV-2), MMPred facilitates the generation, investigation, and testing of mimicry hypotheses by providing epitope predictions specifically for MHC class II alleles, which are frequently implicated in autoimmunity. However, the tool is easily extendable to MHC class I predictions by incorporating pre-trained models from CNN-PepPred and NetMHCpan. To evaluate MMPred's ability to produce biologically meaningful insights, we conducted a comprehensive assessment involving i) predicting associations between known HLA class II human autoepitopes and microbial-peptide mimicry, ii) interpreting these predictions within a systems biology framework to identify potential functional links between the predicted autoantigens and pathophysiological pathways related to autoimmune diseases, and iii) analyzing illustrative cases in the context of SARS-CoV-2 infection and autoimmunity. MMPred code and user guide are made freely available at https://github.com/ComputBiol-IBB/MMPRED.
Affiliation(s)
- Filippo Guerri
- Anaxomics Biotech, Barcelona, Spain
- Institute of Biotechnology and Biomedicine, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Spain
- Valentin Junet
- Anaxomics Biotech, Barcelona, Spain
- Institute of Biotechnology and Biomedicine, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Spain
- Xavier Daura
- Institute of Biotechnology and Biomedicine, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Spain
- Catalan Institution for Research and Advanced Studies (ICREA), Barcelona, Spain
- Centro de Investigación Biomédica en Red de Bioingeniería, Biomateriales y Nanomedicina, Instituto de Salud Carlos III, Cerdanyola del Vallès, Spain
9
Mariani D, Setti A, Castagnetti F, Vitiello E, Stufera Mecarelli L, Di Timoteo G, Giuliani A, D’Angelo A, Santini T, Perego E, Zappone S, Liessi N, Armirotti A, Vicidomini G, Bozzoni I. ALS-associated FUS mutation reshapes the RNA and protein composition of stress granules. Nucleic Acids Res 2024; 52:13269-13289. [PMID: 39494508 PMCID: PMC11602144 DOI: 10.1093/nar/gkae942] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Revised: 10/02/2024] [Accepted: 10/29/2024] [Indexed: 11/05/2024] Open
Abstract
Stress granules (SG) are part of a cellular protection mechanism where untranslated messenger RNAs and RNA-binding proteins are stored upon conditions of cellular stress. Compositional variations due to qualitative or quantitative protein changes can disrupt their functionality and alter their structure. This is the case of different forms of amyotrophic lateral sclerosis (ALS) where a causative link has been proposed between the cytoplasmic de-localization of mutant proteins, such as FUS (Fused in Sarcoma), and the formation of cytotoxic inclusions. Here, we describe the SG transcriptome in neuroblastoma cells and define several features for RNA recruitment in these condensates. We demonstrate that SG dynamics and RNA content are strongly modified by the incorporation of mutant FUS, switching to a more unstructured, AU-rich SG transcriptome. Moreover, we show that mutant FUS, together with its protein interactors and their target RNAs, are responsible for the reshaping of the mutant SG transcriptome with alterations that can be linked to neurodegeneration. Our data describe the molecular differences between physiological and pathological SG in ALS-FUS conditions, showing how FUS mutations impact the RNA and protein composition of these condensates.
Affiliation(s)
- Davide Mariani
- Center for Human Technologies, Istituto Italiano di Tecnologia, Via Enrico Melen 83, 16153, Genoa, Italy
- Department of Biology and Biotechnologies “C. Darwin”, Sapienza University of Rome, Piazzale Aldo Moro 5, 00185, Rome, Italy
- Adriano Setti
- Department of Biology and Biotechnologies “C. Darwin”, Sapienza University of Rome, Piazzale Aldo Moro 5, 00185, Rome, Italy
- Francesco Castagnetti
- Center for Human Technologies, Istituto Italiano di Tecnologia, Via Enrico Melen 83, 16153, Genoa, Italy
- Erika Vitiello
- Center for Human Technologies, Istituto Italiano di Tecnologia, Via Enrico Melen 83, 16153, Genoa, Italy
- Lorenzo Stufera Mecarelli
- Center for Human Technologies, Istituto Italiano di Tecnologia, Via Enrico Melen 83, 16153, Genoa, Italy
- Department of Biology and Biotechnologies “C. Darwin”, Sapienza University of Rome, Piazzale Aldo Moro 5, 00185, Rome, Italy
- Gaia Di Timoteo
- Department of Biology and Biotechnologies “C. Darwin”, Sapienza University of Rome, Piazzale Aldo Moro 5, 00185, Rome, Italy
- Andrea Giuliani
- Department of Biology and Biotechnologies “C. Darwin”, Sapienza University of Rome, Piazzale Aldo Moro 5, 00185, Rome, Italy
- Angelo D’Angelo
- Department of Biology and Biotechnologies “C. Darwin”, Sapienza University of Rome, Piazzale Aldo Moro 5, 00185, Rome, Italy
- Tiziana Santini
- Department of Biology and Biotechnologies “C. Darwin”, Sapienza University of Rome, Piazzale Aldo Moro 5, 00185, Rome, Italy
- Eleonora Perego
- Center for Human Technologies, Istituto Italiano di Tecnologia, Via Enrico Melen 83, 16153, Genoa, Italy
- Sabrina Zappone
- Center for Human Technologies, Istituto Italiano di Tecnologia, Via Enrico Melen 83, 16153, Genoa, Italy
- Nara Liessi
- Analytical Chemistry Lab, Istituto Italiano di Tecnologia, Via Morego 30, 16163, Genoa, Italy
- Andrea Armirotti
- Analytical Chemistry Lab, Istituto Italiano di Tecnologia, Via Morego 30, 16163, Genoa, Italy
- Giuseppe Vicidomini
- Center for Human Technologies, Istituto Italiano di Tecnologia, Via Enrico Melen 83, 16153, Genoa, Italy
- Irene Bozzoni
- Center for Human Technologies, Istituto Italiano di Tecnologia, Via Enrico Melen 83, 16153, Genoa, Italy
- Department of Biology and Biotechnologies “C. Darwin”, Sapienza University of Rome, Piazzale Aldo Moro 5, 00185, Rome, Italy
- Center for Life Nano-& Neuro-Science, Fondazione Istituto Italiano di Tecnologia, Viale Regina Elena 291, 00161, Rome, Italy
|
10
|
Gralak AJ, Faltejskova K, Yang AW, Steiner C, Russeil J, Grenningloh N, Inukai S, Demir M, Dainese R, Owen C, Pankevich E, Hughes TR, Kulakovskiy IV, Kribelbauer-Swietek JF, van Mierlo G, Deplancke B. Identification of methylation-sensitive human transcription factors using meSMiLE-seq. bioRxiv 2024:2024.11.11.619598. [PMID: 39605503 PMCID: PMC11601298 DOI: 10.1101/2024.11.11.619598] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 11/29/2024]
Abstract
Transcription factors (TFs) are key players in eukaryotic gene regulation, but the DNA binding specificity of many TFs remains unknown. Here, we assayed 284 mostly poorly characterized, putative human TFs using selective microfluidics-based ligand enrichment followed by sequencing (SMiLE-seq), revealing 72 new DNA binding motifs. To investigate whether some of the 158 TFs for which we did not find motifs preferably bind epigenetically modified DNA (i.e. methylated CG dinucleotides), we developed methylation-sensitive SMiLE-seq (meSMiLE-seq). This microfluidic assay simultaneously probes the affinity of a protein to methylated and unmethylated DNA, augmenting the capabilities of the original method to infer methylation-aware binding sites. We assayed 114 TFs with meSMiLE-seq and identified DNA-binding models for 48 proteins, including the known methylation-sensitive binding modes for POU5F1 and RFX5. For 11 TFs, binding to methylated DNA was preferred or resulted in the discovery of alternative, methylation-dependent motifs (e.g. PRDM13), while aversion towards methylated sequences was found for 13 TFs (e.g. USF3). Finally, we uncovered a potential role for ZHX2 as a putative binder of Z-DNA, a left-handed helical DNA structure which is adopted more frequently upon CpG methylation. Altogether, our study significantly expands the human TF codebook by identifying DNA binding motifs for 98 TFs, while providing a versatile platform to quantitatively assay the impact of DNA modifications on TF binding.
Affiliation(s)
- Antoni J. Gralak
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Katerina Faltejskova
- Institute of Organic Chemistry and Biochemistry, Czech Academy of Sciences, Prague, Czech Republic
- Computer Science Institute, Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic
- Clemence Steiner
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Julie Russeil
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Nadia Grenningloh
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Sachi Inukai
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Mustafa Demir
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Riccardo Dainese
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Cooper Owen
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Eugenia Pankevich
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Ivan V. Kulakovskiy
- Institute of Protein Research, Russian Academy of Sciences, Pushchino, Russia
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
- Judith F. Kribelbauer-Swietek
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Guido van Mierlo
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Department of Medical BioSciences, Radboud University Medical Center, 6500 HB Nijmegen, The Netherlands
- Bart Deplancke
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
|
11
|
Mindel V, Brodsky S, Yung H, Manadre W, Barkai N. Revisiting the model for coactivator recruitment: Med15 can select its target sites independent of promoter-bound transcription factors. Nucleic Acids Res 2024; 52:12093-12111. [PMID: 39187372 PMCID: PMC11551773 DOI: 10.1093/nar/gkae718] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2024] [Revised: 07/08/2024] [Accepted: 08/09/2024] [Indexed: 08/28/2024] Open
Abstract
Activation domains (ADs) within transcription factors (TFs) induce gene expression by recruiting coactivators such as the Mediator complex. Coactivators lack DNA binding domains (DBDs) and are assumed to passively follow their recruiting TFs. This is supported by direct AD-coactivator interactions seen in vitro but has not yet been tested in living cells. To examine that, we targeted two Med15-recruiting ADs to a range of budding yeast promoters through fusion with different DBDs. The DBD-AD fusions localized to hundreds of genomic sites but recruited Med15 and induced transcription in only a subset of bound promoters, characterized by a fuzzy-nucleosome architecture. Direct DBD-Med15 fusions shifted DBD localization towards fuzzy-nucleosome promoters, including promoters devoid of the endogenous Mediator. We propose that Med15, and perhaps other coactivators, possess inherent promoter preference and thus actively contribute to the selection of TF-induced genes.
Affiliation(s)
- Vladimir Mindel
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
- Sagie Brodsky
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
- Hadas Yung
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
- Wajd Manadre
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
- Naama Barkai
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
|
12
|
Ndjite GM, Jiang A, Ravel C, Grant M, Jiang X, Hall B. Gut Microbial Utilization of the Alternative Sweetener, D-Allulose, via AlsE. bioRxiv 2024:2024.11.07.622513. [PMID: 39574671 PMCID: PMC11580995 DOI: 10.1101/2024.11.07.622513] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/01/2024]
Abstract
D-allulose, a rare sugar with emerging potential as a low-calorie sweetener, has garnered attention as an alternative to other commercially available alternative sweeteners, such as sugar alcohols, which often cause severe gastrointestinal discomfort. D-allulose-6-phosphate 3-epimerase (AlsE) is a prokaryotic enzyme that converts D-allulose-6-phosphate into D-fructose-6-phosphate, enabling its use as a carbon source. However, the taxonomic breadth of AlsE across gut bacteria remains poorly understood, hindering insights into the utilization of D-allulose by microbial communities. In this study, we provide experimental evidence showing that Clostridium innocuum is capable of D-allulose metabolism via a homologous AlsE. A bioinformatics search of 85,202 bacterial genomes identified 116 bacterial species with AlsE homologs, suggesting a limited distribution of AlsE in bacteria. Additionally, Escherichia coli contains a copy of alsE, but it does not grow on D-allulose as a sole carbon source unless alsE is heterologously expressed. A metagenomic analysis revealed that 15.8% of 3,079 adult healthy human metagenomic samples that we analyzed contained alsE, suggesting a limited prevalence of the enzyme in the gut microbiome. These results suggest that the gut microbiome has limited capacity to metabolize D-allulose via alsE, supporting its use as an alternative sweetener with minimal impact on microbial composition and gastrointestinal symptoms. This finding also enables personalized nutrition, allowing diabetic individuals to assess their gut microbiota for alsE, and manage glycemic response while reducing gastrointestinal distress.
Affiliation(s)
- Glory Minabou Ndjite
- College of Computer, Mathematical and Natural Sciences, University of Maryland, College Park, Maryland, USA
- Angela Jiang
- College of Computer, Mathematical and Natural Sciences, University of Maryland, College Park, Maryland, USA
- National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
- Charlotte Ravel
- College of Computer, Mathematical and Natural Sciences, University of Maryland, College Park, Maryland, USA
- Maggie Grant
- College of Computer, Mathematical and Natural Sciences, University of Maryland, College Park, Maryland, USA
- Xiaofang Jiang
- National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
- Brantley Hall
- College of Computer, Mathematical and Natural Sciences, University of Maryland, College Park, Maryland, USA
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, College Park, Maryland, USA
|
13
|
Ballmer D, Lou HJ, Ishii M, Turk BE, Akiyoshi B. Aurora B controls anaphase onset and error-free chromosome segregation in trypanosomes. J Cell Biol 2024; 223:e202401169. [PMID: 39196069 PMCID: PMC11354203 DOI: 10.1083/jcb.202401169] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Revised: 06/12/2024] [Accepted: 07/25/2024] [Indexed: 08/29/2024] Open
Abstract
Kinetochores form the interface between chromosomes and spindle microtubules and are thus under tight control by a complex regulatory circuitry. The Aurora B kinase plays a central role within this circuitry by destabilizing improper kinetochore-microtubule attachments and relaying the attachment status to the spindle assembly checkpoint. Intriguingly, Aurora B is conserved even in kinetoplastids, a group of early-branching eukaryotes which possess a unique set of kinetochore proteins. It remains unclear how their kinetochores are regulated to ensure faithful chromosome segregation. Here, we show in Trypanosoma brucei that Aurora B activity controls the metaphase-to-anaphase transition through phosphorylation of the divergent Bub1-like protein KKT14. Depletion of KKT14 overrides the metaphase arrest resulting from Aurora B inhibition, while expression of non-phosphorylatable KKT14 delays anaphase onset. Finally, we demonstrate that re-targeting Aurora B to the outer kinetochore suffices to promote mitotic exit but causes extensive chromosome missegregation in anaphase. Our results indicate that Aurora B and KKT14 are involved in an unconventional circuitry controlling cell cycle progression in trypanosomes.
Affiliation(s)
- Daniel Ballmer
- Department of Biochemistry, University of Oxford, Oxford, UK
- The Wellcome Centre for Cell Biology, Institute of Cell Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
- Hua Jane Lou
- Department of Pharmacology, Yale School of Medicine, New Haven, CT, USA
- Midori Ishii
- Department of Biochemistry, University of Oxford, Oxford, UK
- The Wellcome Centre for Cell Biology, Institute of Cell Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
- Benjamin E. Turk
- Department of Pharmacology, Yale School of Medicine, New Haven, CT, USA
- Bungo Akiyoshi
- Department of Biochemistry, University of Oxford, Oxford, UK
- The Wellcome Centre for Cell Biology, Institute of Cell Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
|
14
|
Kayrouz CM, Ireland KA, Ying VY, Davis KM, Seyedsayamdost MR. Discovery of the selenium-containing antioxidant ovoselenol derived from convergent evolution. Nat Chem 2024; 16:1868-1875. [PMID: 39143299 DOI: 10.1038/s41557-024-01600-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2024] [Accepted: 07/11/2024] [Indexed: 08/16/2024]
Abstract
Selenium is an essential micronutrient, but its presence in biology has been limited to protein and nucleic acid biopolymers. The recent identification of a biosynthetic pathway for selenium-containing small molecules suggests that there is a larger family of selenometabolites that remains to be discovered. Here we identify a recently evolved branch of abundant and uncharacterized metalloenzymes that we predict are involved in selenometabolite biosynthesis using a bioinformatic search strategy that relies on the mapping of composite active site motifs. Biochemical studies confirm this prediction and show that these enzymes form an unusual C-Se bond onto histidine, thus giving rise to a distinct selenometabolite and potent antioxidant that we have termed ovoselenol. Aside from providing insights into the evolution of this enzyme class and the structural basis of C-Se bond formation, our work offers a blueprint for charting the microbial selenometabolome in the future.
Affiliation(s)
- Chase M Kayrouz
- Department of Chemistry, Princeton University, Princeton, NJ, USA
- Vanessa Y Ying
- Department of Chemistry, Princeton University, Princeton, NJ, USA
- Mohammad R Seyedsayamdost
- Department of Chemistry, Princeton University, Princeton, NJ, USA.
- Department of Molecular Biology, Princeton University, Princeton, NJ, USA.
|
15
|
Le TNY, Le CT, Nguyen TA. Determinants of selectivity in the dicing mechanism. Nat Commun 2024; 15:8989. [PMID: 39420173 PMCID: PMC11487123 DOI: 10.1038/s41467-024-53322-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2024] [Accepted: 10/07/2024] [Indexed: 10/19/2024] Open
Abstract
Our research elucidates the cleavage processes of the RNase III enzyme, DICER, which plays a crucial role in the production of small RNAs, such as microRNAs (miRNAs) and small interfering RNAs (siRNAs). Utilizing high-throughput dicing assays, we expose the bipartite pairing rule that dictates the cleavage sites of DICER. Furthermore, we decode the intricate recognition mechanism of the primary YCR motif and identify an analogous secondary YCR motif that influences DICER's cleavage choices. Collectively, our findings clarify the bipartite pairing rule and enhance our understanding of the role of RNA motifs in modulating DICER's cleavage activity, laying the groundwork for future research on their roles in miRNA biogenesis and gene regulation.
Affiliation(s)
- Thi Nhu-Y Le
- Division of Life Science, The Hong Kong University of Science & Technology, Hong Kong, China
- Cong Truc Le
- Division of Life Science, The Hong Kong University of Science & Technology, Hong Kong, China
- Tuan Anh Nguyen
- Division of Life Science, The Hong Kong University of Science & Technology, Hong Kong, China.
|
16
|
Liew D, Lim ZW, Yong EH. Machine learning-based prediction of DNA G-quadruplex folding topology with G4ShapePredictor. Sci Rep 2024; 14:24238. [PMID: 39414858 PMCID: PMC11484705 DOI: 10.1038/s41598-024-74826-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2024] [Accepted: 09/30/2024] [Indexed: 10/18/2024] Open
Abstract
Deoxyribonucleic acid (DNA) is able to form non-canonical four-stranded helical structures with diverse folding patterns known as G-quadruplexes (G4s). G4 topologies are classified based on their relative strand orientation following the 5' to 3' phosphate backbone polarity. Broadly, G4 topologies are either parallel (4+0), antiparallel (2+2), or hybrid (3+1). G4s play crucial roles in biological processes such as DNA repair, DNA replication, transcription and have thus emerged as biological targets in drug design. While computational models have been developed to predict G4 formation, there is currently no existing model capable of predicting G4 folding topology based on its nucleic acid sequence. Therefore, we introduce G4ShapePredictor (G4SP), an application featuring a collection of multi-classification machine learning models that are trained on a custom G4 dataset combining entries from existing literature and in-house circular dichroism experiments. G4ShapePredictor is designed to accurately predict G4 folding topologies in potassium (K+) buffer based on its primary sequence and is able to incorporate a threshold optimization strategy allowing users to maximise precision. Furthermore, we have identified three topological sequence motifs that suggest specific G4 folding topologies of (4+0), (2+2) or (3+1) when utilising the decision-making mechanisms of G4ShapePredictor.
Affiliation(s)
- Donn Liew
- Division of Physics and Applied Physics, School of Physical and Mathematical Sciences, Nanyang Technological University, 637371, Singapore, Singapore
- Zi Way Lim
- Division of Physics and Applied Physics, School of Physical and Mathematical Sciences, Nanyang Technological University, 637371, Singapore, Singapore
- Ee Hou Yong
- Division of Physics and Applied Physics, School of Physical and Mathematical Sciences, Nanyang Technological University, 637371, Singapore, Singapore.
|
17
|
Yue T, Chen SY, Shen WK, Zhang ZY, Cheng L, Guo AY. TCRosetta: An Integrated Analysis and Annotation Platform for T-cell Receptor Sequences. Genomics Proteomics Bioinformatics 2024; 22:qzae013. [PMID: 39436242 DOI: 10.1093/gpbjnl/qzae013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2023] [Revised: 12/23/2023] [Accepted: 01/08/2024] [Indexed: 10/23/2024]
Abstract
T cells and T-cell receptors (TCRs) are essential components of the adaptive immune system. Characterization of the TCR repertoire offers a promising and highly informative source for understanding the functions of T cells in the immune response and immunotherapy. Although TCR repertoire studies have attracted much attention, there are few online servers available for TCR repertoire analysis, especially for TCR sequence annotation or advanced analyses. Therefore, we developed TCRosetta, a comprehensive online server that integrates analytical methods for TCR repertoire analysis and visualization. TCRosetta combines general feature analysis, large-scale sequence clustering, network construction, peptide-TCR binding prediction, generation probability calculation, and k-mer motif analysis for TCR sequences, making TCR data analysis as simple as possible. The TCRosetta server accepts multiple input data formats and can analyze ~20,000 TCR sequences in less than 3 min. TCRosetta is the most comprehensive web server available for TCR repertoire analysis and is freely available at https://guolab.wchscu.cn/TCRosetta/.
Affiliation(s)
- Tao Yue
- Center for Artificial Intelligence Biology, Hubei Bioinformatics & Molecular Imaging Key Laboratory, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Si-Yi Chen
- Center for Artificial Intelligence Biology, Hubei Bioinformatics & Molecular Imaging Key Laboratory, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Wen-Kang Shen
- Center for Artificial Intelligence Biology, Hubei Bioinformatics & Molecular Imaging Key Laboratory, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Zhan-Ye Zhang
- Center for Artificial Intelligence Biology, Hubei Bioinformatics & Molecular Imaging Key Laboratory, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Liming Cheng
- Department of Laboratory Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
- An-Yuan Guo
- Center for Artificial Intelligence Biology, Hubei Bioinformatics & Molecular Imaging Key Laboratory, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Department of Thoracic Surgery, West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu 610041, China
|
18
|
Wang Y, Lv H, Teo QW, Lei R, Gopal AB, Ouyang WO, Yeung YH, Tan TJC, Choi D, Shen IR, Chen X, Graham CS, Wu NC. An explainable language model for antibody specificity prediction using curated influenza hemagglutinin antibodies. Immunity 2024; 57:2453-2465.e7. [PMID: 39163866 PMCID: PMC11464180 DOI: 10.1016/j.immuni.2024.07.022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Revised: 04/24/2024] [Accepted: 07/24/2024] [Indexed: 08/22/2024]
Abstract
Despite decades of antibody research, it remains challenging to predict the specificity of an antibody solely based on its sequence. Two major obstacles are the lack of appropriate models and the inaccessibility of datasets for model training. In this study, we curated >5,000 influenza hemagglutinin (HA) antibodies by mining research publications and patents, which revealed many distinct sequence features between antibodies to HA head and stem domains. We then leveraged this dataset to develop a lightweight memory B cell language model (mBLM) for sequence-based antibody specificity prediction. Model explainability analysis showed that mBLM could identify key sequence features of HA stem antibodies. Additionally, by applying mBLM to HA antibodies with unknown epitopes, we discovered and experimentally validated many HA stem antibodies. Overall, this study not only advances our molecular understanding of the antibody response to the influenza virus but also provides a valuable resource for applying deep learning to antibody research.
Affiliation(s)
- Yiquan Wang
- Department of Biochemistry, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Huibin Lv
- Department of Biochemistry, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA; Carl R. Woese Institute for Genomic Biology, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Qi Wen Teo
- Department of Biochemistry, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA; Carl R. Woese Institute for Genomic Biology, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Ruipeng Lei
- Department of Biochemistry, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Akshita B Gopal
- Department of Biochemistry, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Wenhao O Ouyang
- Department of Biochemistry, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Yuen-Hei Yeung
- Department of Biochemistry, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA; Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA; Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong SAR, China
- Timothy J C Tan
- Center for Biophysics and Quantitative Biology, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Danbi Choi
- Department of Biochemistry, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Ivana R Shen
- Department of Biochemistry, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Xin Chen
- Center for Biophysics and Quantitative Biology, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Claire S Graham
- Department of Biochemistry, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Nicholas C Wu
- Department of Biochemistry, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA; Carl R. Woese Institute for Genomic Biology, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA; Center for Biophysics and Quantitative Biology, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA; Carle Illinois College of Medicine, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA.
|
19
|
Wilkinson ME, Li D, Gao A, Macrae RK, Zhang F. Phage-triggered reverse transcription assembles a toxic repetitive gene from a noncoding RNA. Science 2024; 386:eadq3977. [PMID: 39208082 DOI: 10.1126/science.adq3977] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2024] [Accepted: 08/08/2024] [Indexed: 09/04/2024]
Abstract
Reverse transcription has frequently been co-opted for cellular functions and in prokaryotes is associated with protection against viral infection, but the underlying mechanisms of defense are generally unknown. Here, we show that in the DRT2 defense system, the reverse transcriptase binds a neighboring pseudoknotted noncoding RNA. Upon bacteriophage infection, a template region of this RNA is reverse transcribed into an array of tandem repeats that reconstitute a promoter and open reading frame, allowing expression of a toxic repetitive protein and an abortive infection response. Biochemical reconstitution of this activity and cryo-electron microscopy provide a molecular basis for repeat synthesis. Gene synthesis from a noncoding RNA is a previously unknown mode of genetic regulation in prokaryotes.
Affiliation(s)
- Max E Wilkinson
- Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- David Li
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Alex Gao
- Department of Biochemistry, Stanford University, Stanford, CA 94305, USA
- Rhiannon K Macrae
- Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Feng Zhang
- Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
|
20
|
Chen WC, Zhou J, McCandlish DM. Density estimation for ordinal biological sequences and its applications. Phys Rev E 2024; 110:044408. [PMID: 39562961 PMCID: PMC11605730 DOI: 10.1103/physreve.110.044408] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2024] [Accepted: 10/03/2024] [Indexed: 11/21/2024]
Abstract
Biological sequences do not come at random. Instead, they appear with particular frequencies that reflect properties of the associated system or phenomenon. Knowing how biological sequences are distributed in sequence space is thus a natural first step toward understanding the underlying mechanisms. Here we propose a method for inferring the probability distribution from which a sample of biological sequences were drawn for the case where the sequences are composed of elements that admit a natural ordering. Our method is based on Bayesian field theory, a physics-based machine learning approach, and can be regarded as a nonparametric extension of the traditional maximum entropy estimate. As an example, we use it to analyze the aneuploidy data pertaining to gliomas from The Cancer Genome Atlas project. In addition, we demonstrate two follow-up analyses that can be performed with the resulting probability distribution. One of them is to investigate the associations among the sequence sites. This provides a way to infer the governing biological grammar. The other is to study the global geometry of the probability landscape, which allows us to look at the problem from an evolutionary point of view. It can be seen that this methodology enables us to learn from a sample of sequences about how a biological system or phenomenon in the real world works.
Affiliation(s)
- Wei-Chia Chen
- Department of Physics, National Chung Cheng University, Chiayi 62102, Taiwan, R.O.C
- Juannan Zhou
- Department of Biology, University of Florida, Gainesville, Florida 32611, U.S.A
- David M. McCandlish
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, U.S.A
|
21
|
Prince CR, Lin IN, Feaga HA. The evolution and functional significance of the programmed ribosomal frameshift in prfB. bioRxiv 2024:2024.09.24.614795. [PMID: 39386688 PMCID: PMC11463598 DOI: 10.1101/2024.09.24.614795] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/12/2024]
Abstract
Release Factor 2 (RF2) is one of two peptide release factors that terminate translation in bacteria. In Escherichia coli, the gene encoding RF2, prfB, contains an in-frame premature RF2-specific stop codon. Therefore, a programmed ribosomal frameshift is required to translate full-length RF2. Here, we investigate the diversity of prfB frameshifting through bioinformatic analyses of >12,000 genomes. We present evidence that prfB frameshifting autoregulates RF2 levels throughout the bacterial domain since (i) the prfB in-frame stop codon is always TGA or TAA, both of which are recognized by RF2, and never the RF1-specific TAG stop codon, and (ii) species that lack the autoregulatory programmed frameshift likely need higher RF2 levels since, on average, they have significantly higher RF2-specific stop codon usage. Overexpression of prfB without the autoregulatory frameshift motif is toxic to Bacillus subtilis, an organism with intermediate RF2-specific stop codon usage. We did not detect the programmed frameshift in any Actinobacteriota. Consistent with this finding, we observed very low frameshift efficiency at the prfB frameshift motif in the Actinobacterium Mycobacterium smegmatis. Our work provides a more complete picture of the evolution of the RF2 programmed frameshifting motif, and its usage to prevent toxic overexpression of RF2.
Affiliation(s)
| | - Isabella N. Lin
- Department of Microbiology, Cornell University, Ithaca, NY 14853
| | - Heather A. Feaga
- Department of Microbiology, Cornell University, Ithaca, NY 14853
|
22
|
Tang Z, Somia N, Yu Y, Koo PK. Evaluating the representational power of pre-trained DNA language models for regulatory genomics. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.02.29.582810. [PMID: 38464101 PMCID: PMC10925287 DOI: 10.1101/2024.02.29.582810] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2024]
Abstract
The emergence of genomic language models (gLMs) offers an unsupervised approach to learning a wide diversity of cis-regulatory patterns in the non-coding genome without requiring labels of functional activity generated by wet-lab experiments. Previous evaluations have shown that pre-trained gLMs can be leveraged to improve predictive performance across a broad range of regulatory genomics tasks, albeit using relatively simple benchmark datasets and baseline models. Since the gLMs in these studies were tested upon fine-tuning their weights for each downstream task, determining whether gLM representations embody a foundational understanding of cis-regulatory biology remains an open question. Here we evaluate the representational power of pre-trained gLMs to predict and interpret cell-type-specific functional genomics data that span DNA and RNA regulation. Our findings suggest that probing the representations of pre-trained gLMs does not offer substantial advantages over conventional machine learning approaches that use one-hot encoded sequences. This work highlights a major gap with current gLMs, raising potential issues in conventional pre-training strategies for the non-coding genome.
Affiliation(s)
- Ziqi Tang
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, NY, USA
| | - Nirali Somia
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, NY, USA
| | - Yiyang Yu
- The Fu Foundation School of Engineering and Applied Science, Columbia University, New York, NY, USA
| | - Peter K Koo
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, NY, USA
|
23
|
Ghafoor H, Asim MN, Ibrahim MA, Dengel A. ProSol-multi: Protein solubility prediction via amino acids multi-level correlation and discriminative distribution. Heliyon 2024; 10:e36041. [PMID: 39281576 PMCID: PMC11401092 DOI: 10.1016/j.heliyon.2024.e36041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2024] [Revised: 08/01/2024] [Accepted: 08/08/2024] [Indexed: 09/18/2024] Open
Abstract
Protein solubility prediction is useful for the careful selection of highly effective candidate proteins for drug development. In recombinant protein synthesis, solubility prediction is valuable for optimizing key protein characteristics, including stability, functionality, and ease of purification. It contains valuable information about potential biomarkers or therapeutic targets and helps in early forecasting of neurodegenerative diseases, cancer, and cardiovascular disorders. Traditional wet-lab experimental protein solubility prediction approaches are error-prone, time-consuming, and costly. Researchers have harnessed Artificial Intelligence approaches to replace experimental approaches with computational predictors. These predictors infer the solubility of proteins by analyzing amino acid distributions in raw protein sequences. There is still considerable room for the development of robust computational predictors because existing predictors still fail to extract a comprehensive discriminative distribution of amino acids. To more precisely discriminate soluble proteins from insoluble proteins, this paper presents the ProSol-Multi predictor, which makes use of a novel MLCDE encoder and a Random Forest classifier. The MLCDE encoder transforms protein sequences into informative statistical vectors by capturing amino acid multi-level correlation and discriminative distribution within raw protein sequences. The performance of the proposed encoder is evaluated against 56 existing protein sequence encoding methods on a widely used protein solubility prediction benchmark dataset under two different experimental settings, namely intrinsic and extrinsic. Intrinsic evaluation reveals that, among all sequence encoders, the proposed MLCDE encoder manages to generate non-overlapping clusters of soluble and insoluble classes.
In extrinsic evaluation, 10 machine learning classifiers achieve better performance with the proposed MLCDE encoder than with 56 existing protein sequence encoders. Moreover, across 4 public benchmark datasets, the proposed ProSol-Multi predictor outperforms 20 existing predictors by an average of 3% in accuracy and 2% in MCC and AU-ROC. The ProSol-Multi interactive web application is available at https://sds_genetic_analysis.opendfki.de/ProSol-Multi.
Affiliation(s)
- Hina Ghafoor
- Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany
- German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
| | - Muhammad Nabeel Asim
- German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
| | - Muhammad Ali Ibrahim
- Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany
- German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
| | - Andreas Dengel
- Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany
- German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
|
24
|
Eliad B, Schneider N, Ben-Naim Zgayer O, Amichan Y, Glaser F, Erdmann EA, Rajendren S, Hundley HA, Lamm AT. ADBP-1 regulates ADR-2 nuclear localization to control editing substrate selection. Nucleic Acids Res 2024; 52:9501-9518. [PMID: 39036970 PMCID: PMC11381337 DOI: 10.1093/nar/gkae641] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2023] [Revised: 06/05/2024] [Accepted: 07/09/2024] [Indexed: 07/23/2024] Open
Abstract
Adenosine-to-inosine (A-to-I) RNA editing, catalyzed by ADAR enzymes, is a prevalent and conserved RNA modification. While A-to-I RNA editing is essential in mammals, it is not essential in Caenorhabditis elegans, making this nematode invaluable for RNA editing research. In C. elegans, ADR-2 is the sole catalytic A-to-I editing enzyme, and ADR-1 is an RNA editing regulator. ADAR localization is well-studied in humans but not well-established in C. elegans. In this study, we examine the cellular and tissue-specific localization of ADR-2. We show that while ADR-2 is present in most cells in the embryo, at later developmental stages, its expression is both tissue- and cell-type-specific. Additionally, both ADARs are mainly in the nucleus. ADR-2 is adjacent to the chromosomes during the cell cycle. We show that the nuclear localization of endogenous ADR-2 depends on ADBP-1, not ADR-1. In adbp-1 mutant worms, ADR-2 is mislocalized, while ADR-1 is not, leading to decreased editing levels and de-novo editing, mostly in exons, suggesting that ADR-2 is also functional in the cytoplasm. In addition, mutated ADBP-1 affects gene expression. Furthermore, we show that ADR-2 targets adenosines with different surrounding nucleotides in exons and introns. Our findings indicate that ADR-2 cellular localization is highly regulated and affects its function.
Affiliation(s)
- Berta Eliad
- Faculty of Biology, Technion- Israel Institute of Technology, Technion City, Haifa 3200003, Israel
| | - Noa Schneider
- Faculty of Biology, Technion- Israel Institute of Technology, Technion City, Haifa 3200003, Israel
| | - Orna Ben-Naim Zgayer
- Faculty of Biology, Technion- Israel Institute of Technology, Technion City, Haifa 3200003, Israel
| | - Yarden Amichan
- Faculty of Biology, Technion- Israel Institute of Technology, Technion City, Haifa 3200003, Israel
| | - Fabian Glaser
- Technion Center for Structural Biology, Technion Human Health Initiative, Technion, Haifa 32000, Israel
| | - Emily A Erdmann
- Department of Biology, Indiana University, Bloomington, IN 47405, USA
| | - Suba Rajendren
- Department of Biology, Indiana University, Bloomington, IN 47405, USA
| | - Heather A Hundley
- Department of Biology, Indiana University, Bloomington, IN 47405, USA
| | - Ayelet T Lamm
- Faculty of Biology, Technion- Israel Institute of Technology, Technion City, Haifa 3200003, Israel
|
25
|
Augustijn HE, Karapliafis D, Joosten KMM, Rigali S, van Wezel GP, Medema MH. LogoMotif: A Comprehensive Database of Transcription Factor Binding Site Profiles in Actinobacteria. J Mol Biol 2024; 436:168558. [PMID: 38580076 DOI: 10.1016/j.jmb.2024.168558] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2024] [Revised: 03/28/2024] [Accepted: 03/30/2024] [Indexed: 04/07/2024]
Abstract
Actinobacteria undergo a complex multicellular life cycle and produce a wide range of specialized metabolites, including the majority of the antibiotics. These biological processes are controlled by intricate regulatory pathways, and to better understand how they are controlled we need to augment our insights into the transcription factor binding sites. Here, we present LogoMotif (https://logomotif.bioinformatics.nl), an open-source database for characterized and predicted transcription factor binding sites in Actinobacteria, along with their cognate position weight matrices and hidden Markov models. Genome-wide predictions of binding site locations in Streptomyces model organisms are supplied and visualized in interactive regulatory networks. In the web interface, users can freely access, download and investigate the underlying data. With this curated collection of actinobacterial regulatory interactions, LogoMotif serves as a basis for binding site predictions, thus providing users with clues on how to elicit the expression of genes of interest and guide genome mining efforts.
Affiliation(s)
- Hannah E Augustijn
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands; Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
| | | | - Kristy M M Joosten
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
| | - Sébastien Rigali
- InBioS - Center for Protein Engineering, University of Liège, Institut de Chimie, B-4000 Liège, Belgium
| | - Gilles P van Wezel
- Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands.
| | - Marnix H Medema
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands; Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands.
|
26
|
Wang B, Mount S. Latent Dirichlet allocation mixture models for nucleotide sequence analysis. NAR Genom Bioinform 2024; 6:lqae099. [PMID: 39131816 PMCID: PMC11310860 DOI: 10.1093/nargab/lqae099] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2023] [Revised: 06/13/2024] [Accepted: 07/23/2024] [Indexed: 08/13/2024] Open
Abstract
Strings of nucleotides carrying biological information are typically described as sequence motifs represented by weight matrices or consensus sequences. However, many signals in DNA or RNA are recognized by multiple factors in temporal sequence, consist of distinct alternative motifs, or are best described by base composition. Here we apply the latent Dirichlet allocation (LDA) mixture model to nucleotide sequences. Using positions in an alignment of human or Drosophila splice sites as samples, we show that LDA readily identifies motifs, including such elusive cases as the intron branch site. Using whole sequences with positional k-mers as features, LDA can identify sequence subtypes enriched in long vs. short introns. LDA with bulk k-mers can reliably distinguish reading frame and species of origin in coding sequences from humans and Drosophila. We find that LDA is a useful model for describing heterogeneous signals, for assigning individual sequences to subtypes, and for identifying and characterizing sequences that do not fit recognized subtypes. Because LDA topic models are interpretable, they also aid the discovery of new motifs, even those present in a small fraction of samples. In summary, LDA can identify and characterize signals in nucleotide sequences, including candidate regulatory factors involved in biological processes.
Affiliation(s)
- Bixuan Wang
- Dept. of Cell Biology and Molecular Genetics, University of Maryland, College Park, MD 20742, USA
| | - Stephen M Mount
- Dept. of Cell Biology and Molecular Genetics, University of Maryland, College Park, MD 20742, USA
|
27
|
Gul A, Pewe LL, Willems P, Mayer R, Thery F, Asselman C, Aernout I, Verbeke R, Eggermont D, Van Moortel L, Upton E, Zhang Y, Boucher K, Miret-Casals L, Demol H, De Smedt SC, Lentacker I, Radoshevich L, Harty JT, Impens F. Immunopeptidomics Mapping of Listeria monocytogenes T Cell Epitopes in Mice. Mol Cell Proteomics 2024; 23:100829. [PMID: 39147027 PMCID: PMC11414675 DOI: 10.1016/j.mcpro.2024.100829] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2024] [Revised: 07/21/2024] [Accepted: 08/12/2024] [Indexed: 08/17/2024] Open
Abstract
Listeria monocytogenes is a foodborne intracellular bacterial model pathogen. Protective immunity against Listeria depends on an effective CD8+ T cell response, but very few T cell epitopes are known in mice, a common animal infection model for listeriosis. To identify epitopes, we screened for Listeria immunopeptides presented in the spleen of infected mice by mass spectrometry-based immunopeptidomics. We mapped more than 6000 mouse self-peptides presented on MHC class I molecules, including 12 high-confidence Listeria peptides from 12 different bacterial proteins. Bacterial immunopeptides with confirmed fragmentation spectra were further tested for their potential to activate CD8+ T cells, revealing VTYNYINI from the putative cell wall surface anchor family protein LMON_0576 as a novel bona fide peptide epitope. The epitope showed high biological potency in a prime boost model and can be used as a research tool to probe CD8+ T cell responses in the mouse models of Listeria infection. Together, our results demonstrate the power of immunopeptidomics for bacterial antigen identification.
Affiliation(s)
- Adillah Gul
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
| | - Lecia L Pewe
- Department of Pathology, University of Iowa-Carver College of Medicine, Iowa City, Iowa, USA
| | - Patrick Willems
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent, Belgium; VIB-UGent Center for Plant Systems Biology, VIB, Ghent, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
| | - Rupert Mayer
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent, Belgium; VIB Proteomics Core, VIB, Ghent, Belgium
| | - Fabien Thery
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
| | - Caroline Asselman
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent, Belgium; Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium
| | - Ilke Aernout
- Ghent Research Group on Nanomedicines, Ghent University, Ghent, Belgium; Cancer Research Institute Ghent (CRIG), Ghent, Belgium
| | - Rein Verbeke
- Ghent Research Group on Nanomedicines, Ghent University, Ghent, Belgium; Cancer Research Institute Ghent (CRIG), Ghent, Belgium
| | - Denzel Eggermont
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
| | - Laura Van Moortel
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
| | - Ellen Upton
- Department of Microbiology and Immunology, University of Iowa-Carver College of Medicine, Iowa City, Iowa, USA; Interdisciplinary Graduate Program in Immunology, University of Iowa, Iowa City, Iowa, USA
| | - Yifeng Zhang
- Department of Microbiology and Immunology, University of Iowa-Carver College of Medicine, Iowa City, Iowa, USA
| | - Katie Boucher
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent, Belgium; VIB Proteomics Core, VIB, Ghent, Belgium
| | - Laia Miret-Casals
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
| | - Hans Demol
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent, Belgium; VIB Proteomics Core, VIB, Ghent, Belgium
| | - Stefaan C De Smedt
- Ghent Research Group on Nanomedicines, Ghent University, Ghent, Belgium; Cancer Research Institute Ghent (CRIG), Ghent, Belgium
| | - Ine Lentacker
- Ghent Research Group on Nanomedicines, Ghent University, Ghent, Belgium; Cancer Research Institute Ghent (CRIG), Ghent, Belgium
| | - Lilliana Radoshevich
- Department of Microbiology and Immunology, University of Iowa-Carver College of Medicine, Iowa City, Iowa, USA; Interdisciplinary Graduate Program in Immunology, University of Iowa, Iowa City, Iowa, USA; Department of Immunology and Genomic Medicine, National Jewish Health, Denver, Colorado, USA.
| | - John T Harty
- Department of Pathology, University of Iowa-Carver College of Medicine, Iowa City, Iowa, USA; Interdisciplinary Graduate Program in Immunology, University of Iowa, Iowa City, Iowa, USA.
| | - Francis Impens
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent, Belgium; VIB Proteomics Core, VIB, Ghent, Belgium.
|
28
|
Collesano L, Łuksza M, Lässig M. Energy landscapes of peptide-MHC binding. PLoS Comput Biol 2024; 20:e1012380. [PMID: 39226310 PMCID: PMC11398667 DOI: 10.1371/journal.pcbi.1012380] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2024] [Revised: 09/13/2024] [Accepted: 07/31/2024] [Indexed: 09/05/2024] Open
Abstract
Molecules of the Major Histocompatibility Complex (MHC) present short protein fragments on the cell surface, an important step in T cell immune recognition. MHC-I molecules process peptides from intracellular proteins; MHC-II molecules act in antigen-presenting cells and present peptides derived from extracellular proteins. Here we show that the sequence-dependent energy landscapes of MHC-peptide binding encode class-specific nonlinearities (epistasis). MHC-I has a smooth landscape with global epistasis; the binding energy is a simple deformation of an underlying linear trait. This form of epistasis enhances the discrimination between strong-binding peptides. In contrast, MHC-II has a rugged landscape with idiosyncratic epistasis: binding depends on detailed amino acid combinations at multiple positions of the peptide sequence. The form of epistasis affects the learning of energy landscapes from training data. For MHC-I, a low-complexity problem, we derive a simple matrix model of binding energies that outperforms current models trained by machine learning. For MHC-II, higher complexity prevents learning by simple regression methods. Epistasis also affects the energy and fitness effects of mutations in antigen-derived peptides (epitopes). In MHC-I, large-effect mutations occur predominantly in anchor positions of strong-binding epitopes. In MHC-II, large effects depend on the background epitope sequence but are broadly distributed over the epitope, generating a bigger target for escape mutations due to loss of presentation. Together, our analysis shows how an energy landscape of protein-protein binding constrains the target of escape mutations from T cell immunity, linking the complexity of the molecular interactions to the dynamics of adaptive immune response.
Affiliation(s)
- Laura Collesano
- Institute for Biological Physics, University of Cologne, Cologne, Germany
| | - Marta Łuksza
- Tisch Cancer Institute, Departments of Oncological Sciences and Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
| | - Michael Lässig
- Institute for Biological Physics, University of Cologne, Cologne, Germany
|
29
|
Shrestha P, Kandel J, Tayara H, Chong KT. Post-translational modification prediction via prompt-based fine-tuning of a GPT-2 model. Nat Commun 2024; 15:6699. [PMID: 39107330 PMCID: PMC11303401 DOI: 10.1038/s41467-024-51071-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2024] [Accepted: 07/29/2024] [Indexed: 08/10/2024] Open
Abstract
Post-translational modifications (PTMs) are pivotal in modulating protein functions and influencing cellular processes like signaling, localization, and degradation. The complexity of these biological interactions necessitates efficient predictive methodologies. In this work, we introduce PTMGPT2, an interpretable protein language model that utilizes prompt-based fine-tuning to improve its accuracy in precisely predicting PTMs. Drawing inspiration from recent advancements in GPT-based architectures, PTMGPT2 adopts unsupervised learning to identify PTMs. It utilizes a custom prompt to guide the model through the subtle linguistic patterns encoded in amino acid sequences, generating tokens indicative of PTM sites. To provide interpretability, we visualize attention profiles from the model's final decoder layer to elucidate sequence motifs essential for molecular recognition and analyze the effects of mutations at or near PTM sites to offer deeper insights into protein functionality. Comparative assessments reveal that PTMGPT2 outperforms existing methods across 19 PTM types, underscoring its potential in identifying disease associations and drug targets.
Affiliation(s)
- Palistha Shrestha
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju, Jeollabuk-do, Republic of Korea
| | - Jeevan Kandel
- Graduate School of Integrated Energy-AI, Jeonbuk National University, Jeonju, Jeollabuk-do, Republic of Korea
| | - Hilal Tayara
- School of International Engineering and Science, Jeonbuk National University, Jeonju, Jeollabuk-do, Republic of Korea.
| | - Kil To Chong
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju, Jeollabuk-do, Republic of Korea.
- Advances Electronics and Information Research Center, Jeonbuk National University, Jeonju, Jeollabuk-do, Republic of Korea.
|
30
|
Gizzio J, Thakur A, Haldane A, Post CB, Levy RM. Evolutionary sequence and structural basis for the distinct conformational landscapes of Tyr and Ser/Thr kinases. Nat Commun 2024; 15:6545. [PMID: 39095350 PMCID: PMC11297160 DOI: 10.1038/s41467-024-50812-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2024] [Accepted: 07/22/2024] [Indexed: 08/04/2024] Open
Abstract
Protein kinases are molecular machines with rich sequence variation that distinguishes the two main evolutionary branches - tyrosine kinases (TKs) from serine/threonine kinases (STKs). Using a sequence co-variation Potts statistical energy model, we previously concluded that TK catalytic domains are more likely than STKs to adopt an inactive conformation with the activation loop in an autoinhibitory folded conformation, due to intrinsic sequence effects. Here we investigate the structural basis for this phenomenon by integrating the sequence-based model with structure-based molecular dynamics (MD) to determine the effects of mutations on the free energy difference between active and inactive conformations, using a thermodynamic cycle involving many (n = 108) protein-mutation free energy perturbation (FEP) simulations in the active and inactive conformations. The sequence- and structure-based results are consistent and support the hypothesis that the inactive conformation, DFG-out Activation Loop Folded, is a functional regulatory state that has been stabilized in TKs relative to STKs over the course of their evolution via the accumulation of residue substitutions in the activation loop and catalytic loop that facilitate distinct substrate binding modes in trans and additional modes of regulation in cis for TKs.
Affiliation(s)
- Joan Gizzio
- Center for Biophysics and Computational Biology, Temple University, Philadelphia, PA, USA
- Department of Chemistry, Temple University, Philadelphia, PA, USA
| | - Abhishek Thakur
- Center for Biophysics and Computational Biology, Temple University, Philadelphia, PA, USA
- Department of Chemistry, Temple University, Philadelphia, PA, USA
| | - Allan Haldane
- Center for Biophysics and Computational Biology, Temple University, Philadelphia, PA, USA
- Department of Physics, Temple University, Philadelphia, PA, USA
| | - Carol Beth Post
- Borch Department of Medicinal Chemistry and Molecular Pharmacology, Purdue University, West Lafayette, IN, USA
| | - Ronald M Levy
- Center for Biophysics and Computational Biology, Temple University, Philadelphia, PA, USA.
- Department of Chemistry, Temple University, Philadelphia, PA, USA.
|
31
|
Rodriguez DCP, Weber KC, Sundberg B, Glasgow A. MAGPIE: An interactive tool for visualizing and analyzing protein-ligand interactions. Protein Sci 2024; 33:e5027. [PMID: 38989559 PMCID: PMC11237554 DOI: 10.1002/pro.5027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2024] [Revised: 04/22/2024] [Accepted: 05/05/2024] [Indexed: 07/12/2024]
Abstract
Quantitative tools to compile and analyze biomolecular interactions among chemically diverse binding partners would improve therapeutic design and aid in studying molecular evolution. Here we present Mapping Areas of Genetic Parsimony In Epitopes (MAGPIE), a publicly available software package for simultaneously visualizing and analyzing thousands of interactions between a single protein or small molecule ligand (the "target") and all of its protein binding partners ("binders"). MAGPIE generates an interactive three-dimensional visualization from a set of protein complex structures that share the target ligand, as well as sequence logo-style amino acid frequency graphs that show all the amino acids from the set of protein binders that interact with user-defined target ligand positions or chemical groups. MAGPIE highlights all the salt bridge and hydrogen bond interactions made by the target in the visualization and as separate amino acid frequency graphs. Finally, MAGPIE collates the most common target-binder interactions as a list of "hotspots," which can be used to analyze trends or guide the de novo design of protein binders. As an example of the utility of the program, we used MAGPIE to probe how different antibody fragments bind a viral antigen; how a common metabolite binds diverse protein partners; and how two ligands bind orthologs of a well-conserved glycolytic enzyme for a detailed understanding of evolutionarily conserved interactions involved in its activation and inhibition. MAGPIE is implemented in Python 3 and freely available at https://github.com/glasgowlab/MAGPIE, along with sample datasets, usage examples, and helper scripts to prepare input structures.
Affiliation(s)
- Daniel C. Pineda Rodriguez
- Department of Biochemistry and Molecular Biophysics, Columbia University Irving Medical Center, New York, New York, USA
| | - Kyle C. Weber
- Department of Biochemistry and Molecular Biophysics, Columbia University Irving Medical Center, New York, New York, USA
| | - Belen Sundberg
- Department of Biochemistry and Molecular Biophysics, Columbia University Irving Medical Center, New York, New York, USA
| | - Anum Glasgow
- Department of Biochemistry and Molecular Biophysics, Columbia University Irving Medical Center, New York, New York, USA
|
32
|
Singer A, Ramos A, Keating AE. Elaboration of the Homer1 recognition landscape reveals incomplete divergence of paralogous EVH1 domains. Protein Sci 2024; 33:e5094. [PMID: 38989636 PMCID: PMC11237882 DOI: 10.1002/pro.5094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2024] [Revised: 06/11/2024] [Accepted: 06/16/2024] [Indexed: 07/12/2024]
Abstract
Short sequences that mediate interactions with modular binding domains are ubiquitous throughout eukaryotic proteomes. Networks of short linear motifs (SLiMs) and their corresponding binding domains orchestrate many cellular processes, and the low mutational barrier to evolving novel interactions provides a way for biological systems to rapidly sample selectable phenotypes. Mapping SLiM binding specificity and the rules that govern SLiM evolution is fundamental to uncovering the pathways regulated by these networks and developing the tools to manipulate them. We used high-throughput screening of the human proteome to identify sequences that bind to the Enabled/VASP homology 1 (EVH1) domain of the postsynaptic density scaffolding protein Homer1. This expanded our understanding of the determinants of Homer EVH1 binding preferences and defined a new motif that can facilitate the discovery of additional Homer-mediated interactions. Interestingly, the Homer1 EVH1 domain preferentially binds to sequences containing an N-terminally overlapping motif that is bound by the paralogous family of Ena/VASP actin polymerases, and many of these sequences can bind to EVH1 domains from both protein families. We provide evidence from orthologous EVH1 domains in pre-metazoan organisms that the overlap in human Ena/VASP and Homer binding preferences corresponds to an incomplete divergence from a common Ena/VASP ancestor. Given this overlap in binding profiles, promiscuous sequences that can be recognized by both families either achieve specificity through extrinsic regulatory strategies or may provide functional benefits via multi-specificity. This may explain why these paralogs incompletely diverged despite the accessibility of further diverged isoforms.
Affiliation(s)
- Avinoam Singer
- Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
| | - Alejandra Ramos
- Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
| | - Amy E. Keating
- Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
| |
|
33
|
Halpin JC, Keating AE. PairK: Pairwise k-mer alignment for quantifying protein motif conservation in disordered regions. bioRxiv 2024:2024.07.23.604860. [PMID: 39091826 PMCID: PMC11291154 DOI: 10.1101/2024.07.23.604860] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/04/2024]
Abstract
Protein-protein interactions are often mediated by a modular peptide recognition domain binding to a short linear motif (SLiM) in the disordered region of another protein. The ability to predict domain-SLiM interactions would allow researchers to map protein interaction networks, predict the effects of perturbations to those networks, and develop biologically meaningful hypotheses. Unfortunately, sequence database searches for SLiMs generally yield mostly biologically irrelevant motif matches or false positives. To improve the prediction of novel SLiM interactions, researchers employ filters to discriminate between biologically relevant and improbable motif matches. One promising criterion for identifying biologically relevant SLiMs is the sequence conservation of the motif, exploiting the fact that functional motifs are more likely to be conserved than spurious motif matches. However, the difficulty of aligning disordered regions has significantly hampered the utility of this approach. We present PairK (pairwise k-mer alignment), an MSA-free method to quantify motif conservation in disordered regions. PairK outperforms both standard MSA-based conservation scores and a modern LLM-based conservation score predictor on the task of identifying biologically important motif instances. PairK can quantify conservation over wider phylogenetic distances than MSAs, indicating that SLiMs may be more conserved than is implied by MSA-based metrics. PairK is available as open-source code at https://github.com/jacksonh1/pairk.
Affiliation(s)
- Jackson C. Halpin
- MIT Department of Biology, 77 Massachusetts Ave., Cambridge, MA 02139
| | - Amy E. Keating
- MIT Department of Biology, 77 Massachusetts Ave., Cambridge, MA 02139
- MIT Department of Biological Engineering, 77 Massachusetts Ave., Cambridge, MA 02139
- Koch Institute for Integrative Cancer Research, 77 Massachusetts Ave., Cambridge, MA 02139
| |
|
34
|
Szulc NA, Stefaniak F, Piechota M, Soszyńska A, Piórkowska G, Cappannini A, Bujnicki J, Maniaci C, Pokrzywa W. DEGRONOPEDIA: a web server for proteome-wide inspection of degrons. Nucleic Acids Res 2024; 52:W221-W232. [PMID: 38567734 PMCID: PMC11223883 DOI: 10.1093/nar/gkae238] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2024] [Revised: 03/12/2024] [Accepted: 03/20/2024] [Indexed: 07/06/2024] Open
Abstract
E3 ubiquitin ligases recognize substrates through their short linear motifs termed degrons. While degron-signaling has been a subject of extensive study, resources for its systematic screening are limited. To bridge this gap, we developed DEGRONOPEDIA, a web server that searches for degrons and maps them to nearby residues that can undergo ubiquitination and disordered regions, which may act as protein unfolding seeds. Along with an evolutionary assessment of degron conservation, the server also reports on post-translational modifications and mutations that may modulate degron availability. Acknowledging the prevalence of degrons at protein termini, DEGRONOPEDIA incorporates machine learning to assess N-/C-terminal stability, supplemented by simulations of proteolysis to identify degrons in newly formed termini. An experimental validation of a predicted C-terminal destabilizing motif, coupled with the confirmation of a post-proteolytic degron in another case, exemplifies its practical application. DEGRONOPEDIA can be freely accessed at degronopedia.com.
Affiliation(s)
- Natalia A Szulc
- Laboratory of Protein Metabolism, International Institute of Molecular and Cell Biology in Warsaw, 4 Ks. Trojdena Str., 02-109 Warsaw, Poland
| | - Filip Stefaniak
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, 4 Ks. Trojdena Str., 02-109 Warsaw, Poland
| | - Małgorzata Piechota
- Laboratory of Protein Metabolism, International Institute of Molecular and Cell Biology in Warsaw, 4 Ks. Trojdena Str., 02-109 Warsaw, Poland
| | - Anna Soszyńska
- Laboratory of Protein Metabolism, International Institute of Molecular and Cell Biology in Warsaw, 4 Ks. Trojdena Str., 02-109 Warsaw, Poland
| | - Gabriela Piórkowska
- Laboratory of Protein Metabolism, International Institute of Molecular and Cell Biology in Warsaw, 4 Ks. Trojdena Str., 02-109 Warsaw, Poland
| | - Andrea Cappannini
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, 4 Ks. Trojdena Str., 02-109 Warsaw, Poland
| | - Janusz M Bujnicki
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, 4 Ks. Trojdena Str., 02-109 Warsaw, Poland
| | - Chiara Maniaci
- Medical Research Council (MRC) Protein Phosphorylation and Ubiquitylation Unit, School of Life Sciences, University of Dundee, Dow Street, Dundee DD1 5EH, UK
| | - Wojciech Pokrzywa
- Laboratory of Protein Metabolism, International Institute of Molecular and Cell Biology in Warsaw, 4 Ks. Trojdena Str., 02-109 Warsaw, Poland
| |
|
35
|
Nguyen AK, Blacksmith MS, Kidd JM. Duplications and Retrogenes Are Numerous and Widespread in Modern Canine Genomic Assemblies. Genome Biol Evol 2024; 16:evae142. [PMID: 38946312 PMCID: PMC11259980 DOI: 10.1093/gbe/evae142] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2023] [Revised: 05/08/2024] [Accepted: 06/24/2024] [Indexed: 07/02/2024] Open
Abstract
Recent years have seen a dramatic increase in the number of canine genome assemblies available. Duplications are an important source of evolutionary novelty and are also prone to misassembly. We explored the duplication content of nine canine genome assemblies using both genome self-alignment and read-depth approaches. We find that 8.58% of the genome is duplicated in the canFam4 assembly, derived from the German Shepherd Dog Mischka, including 90.15% of unplaced contigs. Highlighting the continued difficulty in properly assembling duplications, less than half of read-depth and assembly alignment duplications overlap, but the mCanLor1.2 Greenland wolf assembly shows greater concordance. Further study shows the presence of multiple segments that have alignments to four or more duplicate copies. These high-recurrence duplications correspond to gene retrocopies. We identified 3,892 candidate retrocopies from 1,316 parental genes in the canFam4 assembly and find that ∼8.82% of duplicated base pairs involve a retrocopy, confirming this mechanism as a major driver of gene duplication in canines. Similar patterns are found across eight other recent canine genome assemblies, with metrics supporting a greater quality of the PacBio HiFi mCanLor1.2 assembly. Comparison between the wolf and other canine assemblies found that 92% of retrocopy insertions are shared between assemblies. By calculating the number of generations since genome divergence, we estimate that new retrocopy insertions appear, on average, in 1 out of 3,514 births. Our analyses illustrate the impact of retrogene formation on canine genomes and highlight the variable representation of duplicated sequences among recently completed canine assemblies.
Affiliation(s)
- Anthony K Nguyen
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - Matthew S Blacksmith
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - Jeffrey M Kidd
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| |
|
36
|
Hong L, Kortemme T. An integrative approach to protein sequence design through multiobjective optimization. PLoS Comput Biol 2024; 20:e1011953. [PMID: 38991035 PMCID: PMC11265717 DOI: 10.1371/journal.pcbi.1011953] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2024] [Revised: 07/23/2024] [Accepted: 06/25/2024] [Indexed: 07/13/2024] Open
Abstract
With recent methodological advances in the field of computational protein design, in particular those based on deep learning, there is an increasing need for frameworks that allow for coherent, direct integration of different models and objective functions into the generative design process. Here we demonstrate how evolutionary multiobjective optimization techniques can be adapted to provide such an approach. With the established Non-dominated Sorting Genetic Algorithm II (NSGA-II) as the optimization framework, we use AlphaFold2 and ProteinMPNN confidence metrics to define the objective space, and a mutation operator composed of ESM-1v and ProteinMPNN to rank and then redesign the least favorable positions. Using the two-state design problem of the foldswitching protein RfaH as an in-depth case study, and PapD and calmodulin as examples of higher-dimensional design problems, we show that the evolutionary multiobjective optimization approach leads to significant reduction in the bias and variance in RfaH native sequence recovery, compared to a direct application of ProteinMPNN. We suggest that this improvement is due to three factors: (i) the use of an informative mutation operator that accelerates the sequence space exploration, (ii) the parallel, iterative design process inherent to the genetic algorithm that improves upon the ProteinMPNN autoregressive sequence decoding scheme, and (iii) the explicit approximation of the Pareto front that leads to optimal design candidates representing diverse tradeoff conditions. We anticipate this approach to be readily adaptable to different models and broadly relevant for protein design tasks with complex specifications.
Affiliation(s)
- Lu Hong
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, California, United States of America
| | - Tanja Kortemme
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, California, United States of America
- Quantitative Biosciences Institute, University of California, San Francisco, California, United States of America
- Chan Zuckerberg Biohub, San Francisco, California, United States of America
| |
|
37
|
Lindeboom RGH, Worlock KB, Dratva LM, Yoshida M, Scobie D, Wagstaffe HR, Richardson L, Wilbrey-Clark A, Barnes JL, Kretschmer L, Polanski K, Allen-Hyttinen J, Mehta P, Sumanaweera D, Boccacino JM, Sungnak W, Elmentaite R, Huang N, Mamanova L, Kapuge R, Bolt L, Prigmore E, Killingley B, Kalinova M, Mayer M, Boyers A, Mann A, Swadling L, Woodall MNJ, Ellis S, Smith CM, Teixeira VH, Janes SM, Chambers RC, Haniffa M, Catchpole A, Heyderman R, Noursadeghi M, Chain B, Mayer A, Meyer KB, Chiu C, Nikolić MZ, Teichmann SA. Human SARS-CoV-2 challenge uncovers local and systemic response dynamics. Nature 2024; 631:189-198. [PMID: 38898278 PMCID: PMC11222146 DOI: 10.1038/s41586-024-07575-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/24/2023] [Accepted: 05/16/2024] [Indexed: 06/21/2024]
Abstract
The COVID-19 pandemic is an ongoing global health threat, yet our understanding of the dynamics of early cellular responses to this disease remains limited. Here in our SARS-CoV-2 human challenge study, we used single-cell multi-omics profiling of nasopharyngeal swabs and blood to temporally resolve abortive, transient and sustained infections in seronegative individuals challenged with pre-Alpha SARS-CoV-2. Our analyses revealed rapid changes in cell-type proportions and dozens of highly dynamic cellular response states in epithelial and immune cells associated with specific time points and infection status. We observed that the interferon response in blood preceded the nasopharyngeal response. Moreover, nasopharyngeal immune infiltration occurred early in samples from individuals with only transient infection and later in samples from individuals with sustained infection. High expression of HLA-DQA2 before inoculation was associated with preventing sustained infection. Ciliated cells showed multiple immune responses and were most permissive for viral replication, whereas nasopharyngeal T cells and macrophages were infected non-productively. We resolved 54 T cell states, including acutely activated T cells that clonally expanded while carrying convergent SARS-CoV-2 motifs. Our new computational pipeline Cell2TCR identifies activated antigen-responding T cells based on a gene expression signature and clusters these into clonotype groups and motifs. Overall, our detailed time series data can serve as a Rosetta stone for epithelial and immune cell responses and reveals early dynamic responses associated with protection against infection.
Affiliation(s)
- Rik G H Lindeboom
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK.
- The Netherlands Cancer Institute, Amsterdam, The Netherlands.
| | - Kaylee B Worlock
- UCL Respiratory, Division of Medicine, University College London, London, UK
| | - Lisa M Dratva
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
- Wellcome MRC Cambridge Stem Cell Institute, University of Cambridge, Cambridge, UK
| | - Masahiro Yoshida
- UCL Respiratory, Division of Medicine, University College London, London, UK
| | - David Scobie
- Research Department of Infection, Division of Infection and Immunity, University College London, London, UK
| | - Helen R Wagstaffe
- Department of Infectious Disease, Imperial College London, London, UK
| | - Laura Richardson
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
| | - Josephine L Barnes
- UCL Respiratory, Division of Medicine, University College London, London, UK
| | - Puja Mehta
- UCL Respiratory, Division of Medicine, University College London, London, UK
| | - Waradon Sungnak
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
- Department of Microbiology, Faculty of Science, and Integrative Computational BioScience Center, Mahidol University, Bangkok, Thailand
| | - Rasa Elmentaite
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
- Ensocell Therapeutics, BioData Innovation Centre, Wellcome Genome Campus, Hinxton, UK
| | - Ni Huang
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
| | - Lira Mamanova
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
| | - Rakesh Kapuge
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
| | - Liam Bolt
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
| | - Elena Prigmore
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
| | - Ben Killingley
- Department of Infectious Diseases, University College London Hospital, London, UK
| | - Leo Swadling
- Division of Infection and Immunity, Institute of Immunity and Transplantation, University College London, London, UK
| | - Samuel Ellis
- UCL Great Ormond Street Institute of Child Health, London, UK
| | - Claire M Smith
- UCL Great Ormond Street Institute of Child Health, London, UK
| | - Vitor H Teixeira
- UCL Respiratory, Division of Medicine, University College London, London, UK
| | - Sam M Janes
- UCL Respiratory, Division of Medicine, University College London, London, UK
| | - Rachel C Chambers
- UCL Respiratory, Division of Medicine, University College London, London, UK
| | - Muzlifah Haniffa
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
| | - Robert Heyderman
- Research Department of Infection, Division of Infection and Immunity, University College London, London, UK
| | - Mahdad Noursadeghi
- Research Department of Infection, Division of Infection and Immunity, University College London, London, UK
| | - Benny Chain
- Research Department of Infection, Division of Infection and Immunity, University College London, London, UK
| | - Andreas Mayer
- Research Department of Infection, Division of Infection and Immunity, University College London, London, UK
| | - Kerstin B Meyer
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
| | - Christopher Chiu
- Department of Infectious Disease, Imperial College London, London, UK
| | - Marko Z Nikolić
- UCL Respiratory, Division of Medicine, University College London, London, UK.
| | - Sarah A Teichmann
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK.
- Theory of Condensed Matter, Cavendish Laboratory, Department of Physics, University of Cambridge, Cambridge, UK.
- Wellcome MRC Cambridge Stem Cell Institute, University of Cambridge, Cambridge, UK.
| |
|
38
|
Xu T, Wang Q, Yang Z, Ying J. A BERT-based approach for identifying anti-inflammatory peptides using sequence information. Heliyon 2024; 10:e32951. [PMID: 38988537 PMCID: PMC11234020 DOI: 10.1016/j.heliyon.2024.e32951] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2024] [Accepted: 05/22/2024] [Indexed: 07/12/2024] Open
Abstract
The use of anti-inflammatory peptides (AIPs) as an alternative therapeutic approach for inflammatory diseases holds great research significance. Due to the high cost and difficulty in identifying AIPs with experimental methods, the discovery and design of peptides by computational methods before the experimental stage have become promising technology. In this study, we present BertAIP, a bidirectional encoder representation from transformers (BERT)-based method for predicting AIPs directly from their amino acid sequence without using any other information. BertAIP implements a BERT model to extract features of a protein, and uses a fully connected feed-forward network for AIP classification. It was constructed and evaluated using the AIP datasets that were reconstructed from the latest Immune Epitope Database. The experimental results showed that BertAIP achieved an accuracy of 0.751 and a Matthews correlation coefficient of 0.451, which were higher than other commonly used methods. The results of the independent test suggested that BertAIP outperformed the existing AIP predictors. In addition, to enhance the interpretability of BertAIP, we explored and visualized the amino acids that the model considered important for AIP prediction. We believe that the BertAIP proposed herein will be a useful tool for large-scale screening and identifying novel AIPs for drug development and therapeutic research related to inflammatory diseases.
Affiliation(s)
- Teng Xu
- Institute of Translational Medicine, Baotou Central Hospital, Baotou, China
| | - Qian Wang
- Department of Clinical Laboratory, Wenzhou People's Hospital, The Third Clinical Institute Affiliated to Wenzhou Medical University, Wenzhou, China
| | - Zhigang Yang
- Institute of Translational Medicine, Baotou Central Hospital, Baotou, China
| | - Jianchao Ying
- Wenzhou Key Laboratory of Emergency, Critical Care, and Disaster Medicine, Department of Emergency, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
- Central Laboratory, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
| |
|
39
|
Chiu R, Rajan-Babu IS, Friedman JM, Birol I. A comprehensive tandem repeat catalog of the human genome. medRxiv 2024:2024.06.19.24309173. [PMID: 38947075 PMCID: PMC11213036 DOI: 10.1101/2024.06.19.24309173] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/02/2024]
Abstract
With the increasing availability of long-read sequencing data, high-quality human genome assemblies, and software for fully characterizing tandem repeats, genome-wide genotyping of tandem repeat loci on a population scale becomes more feasible. Such efforts not only expand our knowledge of the tandem repeat landscape in the human genome but also enhance our ability to differentiate pathogenic tandem repeat mutations from benign polymorphisms. To this end, we analyzed 272 genomes assembled using datasets from three public initiatives that employed different long-read sequencing technologies. Here, we report a catalog of over 18 million tandem repeat loci, many of which were previously unannotated. Some of these loci are highly polymorphic, and many of them reside within coding sequences.
Affiliation(s)
- Readman Chiu
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC V5Z 4S6, Canada
| | - Indhu-Shree Rajan-Babu
- Department of Medical Genetics, University of British Columbia, Vancouver, BC V5Z 4H4, Canada
| | - Jan M Friedman
- Department of Medical Genetics, University of British Columbia, Vancouver, BC V5Z 4H4, Canada
- BC Children's Hospital Research Institute, Vancouver, BC V5Z 4H4, Canada
| | - Inanc Birol
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC V5Z 4S6, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC V5Z 4H4, Canada
| |
|
40
|
Eggers AR, Chen K, Soczek KM, Tuck OT, Doherty EE, Xu B, Trinidad MI, Thornton BW, Yoon PH, Doudna JA. Rapid DNA unwinding accelerates genome editing by engineered CRISPR-Cas9. Cell 2024; 187:3249-3261.e14. [PMID: 38781968 PMCID: PMC11658890 DOI: 10.1016/j.cell.2024.04.031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2023] [Revised: 02/20/2024] [Accepted: 04/24/2024] [Indexed: 05/25/2024]
Abstract
Thermostable clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated (Cas9) enzymes could improve genome-editing efficiency and delivery due to extended protein lifetimes. However, initial experimentation demonstrated Geobacillus stearothermophilus Cas9 (GeoCas9) to be virtually inactive when used in cultured human cells. Laboratory-evolved variants of GeoCas9 overcome this natural limitation by acquiring mutations in the wedge (WED) domain that produce >100-fold-higher genome-editing levels. Cryoelectron microscopy (cryo-EM) structures of the wild-type and improved GeoCas9 (iGeoCas9) enzymes reveal extended contacts between the WED domain of iGeoCas9 and DNA substrates. Biochemical analysis shows that iGeoCas9 accelerates DNA unwinding to capture substrates under the magnesium-restricted conditions typical of mammalian but not bacterial cells. These findings enabled rational engineering of other Cas9 orthologs to enhance genome-editing levels, pointing to a general strategy for editing enzyme improvement. Together, these results uncover a new role for the Cas9 WED domain in DNA unwinding and demonstrate how accelerated target unwinding dramatically improves Cas9-induced genome-editing activity.
Affiliation(s)
- Amy R Eggers
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720, USA; Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA 94720, USA
| | - Kai Chen
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720, USA; Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA 94720, USA
| | - Katarzyna M Soczek
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720, USA; Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA 94720, USA; California Institute for Quantitative Biosciences, University of California, Berkeley, Berkeley, CA 94720, USA
| | - Owen T Tuck
- Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA 94720, USA; Department of Chemistry, University of California, Berkeley, Berkeley, CA 94720, USA
| | - Erin E Doherty
- Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA 94720, USA; California Institute for Quantitative Biosciences, University of California, Berkeley, Berkeley, CA 94720, USA
| | - Bryant Xu
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720, USA; Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA 94720, USA
| | - Marena I Trinidad
- Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA 94720, USA; Howard Hughes Medical Institute, University of California, Berkeley, Berkeley, CA 94720, USA
| | - Brittney W Thornton
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720, USA; Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA 94720, USA
| | - Peter H Yoon
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720, USA; Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA 94720, USA
| | - Jennifer A Doudna
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720, USA; Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA 94720, USA; California Institute for Quantitative Biosciences, University of California, Berkeley, Berkeley, CA 94720, USA; Department of Chemistry, University of California, Berkeley, Berkeley, CA 94720, USA; Howard Hughes Medical Institute, University of California, Berkeley, Berkeley, CA 94720, USA; Gladstone Institutes, San Francisco, CA 94158, USA; Gladstone-UCSF Institute of Genomic Immunology, San Francisco, CA 94158, USA; Molecular Biophysics and Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA.
| |
|
41
|
Bai G, Zeng X, Zhang L, Wang Y, Ma B. Computational investigation of the inhibitory interaction of IRF3 and SARS-CoV-2 accessory protein ORF3b. Biochem Biophys Res Commun 2024; 712-713:149945. [PMID: 38640732 DOI: 10.1016/j.bbrc.2024.149945] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2024] [Accepted: 04/14/2024] [Indexed: 04/21/2024]
Abstract
ORF3b is one of the SARS-CoV-2 accessory proteins. A previous experimental study suggested that ORF3b prevents IRF3 from translocating to the nucleus. However, the biophysical mechanism of the ORF3b-IRF3 interaction is elusive. Here, we explored the conformational ensemble of ORF3b using all-atom replica exchange molecular dynamics simulation. Disordered ORF3b has mixed α-helix, β-turn and loop conformers. The potential ORF3b-IRF3 binding modes were searched by docking representative ORF3b conformers with IRF3, and 50 ORF3b-IRF3 complex poses were screened using molecular dynamics simulations ranging from 500 to 1000 ns. We found that ORF3b binds IRF3 predominantly on its CBP binding and phosphorylated pLxIS motifs, with the CBP binding site having the highest binding affinity. The ORF3b-IRF3 binding residues are highly conserved in SARS-CoV-2. Our results provide biophysical insights into the ORF3b-IRF3 interaction and explain its interferon antagonism mechanism.
Affiliation(s)
- Ganggang Bai
- Engineering Research Center of Cell & Therapeutic Antibody (MOE), School of Pharmacy, Shanghai Jiao Tong University, Shanghai, 200240, China
| | - Xincheng Zeng
- Engineering Research Center of Cell & Therapeutic Antibody (MOE), School of Pharmacy, Shanghai Jiao Tong University, Shanghai, 200240, China
| | - Linghao Zhang
- Engineering Research Center of Cell & Therapeutic Antibody (MOE), School of Pharmacy, Shanghai Jiao Tong University, Shanghai, 200240, China
| | - Yanjing Wang
- Engineering Research Center of Cell & Therapeutic Antibody (MOE), School of Pharmacy, Shanghai Jiao Tong University, Shanghai, 200240, China
| | - Buyong Ma
- Engineering Research Center of Cell & Therapeutic Antibody (MOE), School of Pharmacy, Shanghai Jiao Tong University, Shanghai, 200240, China.
| |
|
42
|
Eliad B, Schneider N, Zgayer OBN, Amichan Y, Glaser F, Erdmann EA, Rajendren S, Hundley HA, Lamm AT. ADBP-1 regulates ADR-2 nuclear localization to control editing substrate selection. bioRxiv 2024:2023.05.14.540679. [PMID: 38895382 PMCID: PMC11185548 DOI: 10.1101/2023.05.14.540679] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/21/2024]
Abstract
Adenosine-to-inosine (A-to-I) RNA editing, catalyzed by ADAR enzymes, is a prevalent and conserved RNA modification. While A-to-I RNA editing is essential in mammals, in Caenorhabditis elegans, it is not, making them invaluable for RNA editing research. In C. elegans, ADR-2 is the sole catalytic A-to-I editing enzyme, and ADR-1 is an RNA editing regulator. ADAR localization is well-studied in humans but not well-established in C. elegans. In this study, we examine the cellular and tissue-specific localization of ADR-2. We show that while ADR-2 is present in most cells in the embryo, at later developmental stages, its expression is both tissue- and cell-type-specific. Additionally, both ADARs are mainly in the nucleus. ADR-2 is adjacent to the chromosomes during the cell cycle. We show that the nuclear localization of endogenous ADR-2 depends on ADBP-1, not ADR-1. In adbp-1 mutant worms, ADR-2 is mislocalized, while ADR-1 is not, leading to decreased editing levels and de-novo editing, mostly in exons, suggesting that ADR-2 is also functional in the cytoplasm. Besides, mutated ADBP-1 affects gene expression. Furthermore, we show that ADR-2 targets adenosines with different surrounding nucleotides in exons and introns. Our findings indicate that ADR-2 cellular localization is highly regulated and affects its function.
|
43
|
Tóth AD, Soltész-Katona E, Kis K, Guti V, Gilzer S, Prokop S, Boros R, Misák Á, Balla A, Várnai P, Turiák L, Ács A, Drahos L, Inoue A, Hunyady L, Turu G. ArreSTick motif controls β-arrestin-binding stability and extends phosphorylation-dependent β-arrestin interactions to non-receptor proteins. Cell Rep 2024; 43:114241. [PMID: 38758647 DOI: 10.1016/j.celrep.2024.114241] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2023] [Revised: 03/11/2024] [Accepted: 05/01/2024] [Indexed: 05/19/2024] Open
Abstract
The binding and function of β-arrestins are regulated by specific phosphorylation motifs present in G protein-coupled receptors (GPCRs). However, the exact arrangement of phosphorylated amino acids responsible for establishing a stable interaction remains unclear. We employ a 1D sequence convolution model trained on GPCRs with established β-arrestin-binding properties. With this approach, amino acid motifs characteristic of GPCRs that form stable interactions with β-arrestins can be identified, a pattern that we name "arreSTick." Intriguingly, the arreSTick pattern is also present in numerous non-receptor proteins. Using proximity biotinylation assay and mass spectrometry analysis, we demonstrate that the arreSTick motif controls the interaction between many non-receptor proteins and β-arrestin2. The HIV-1 Tat-specific factor 1 (HTSF1 or HTATSF1), a nuclear transcription factor, contains the arreSTick pattern, and its subcellular localization is influenced by β-arrestin2. Our findings unveil a broader role for β-arrestins in phosphorylation-dependent interactions, extending beyond GPCRs to encompass non-receptor proteins as well.
Affiliation(s)
- András Dávid Tóth
  - Institute of Molecular Life Sciences, Centre of Excellence of the Hungarian Academy of Sciences, HUN-REN Research Centre for Natural Sciences, Magyar Tudósok krt. 2., 1117 Budapest, Hungary; Department of Internal Medicine and Haematology, Semmelweis University, Szentkirályi street 46, 1088 Budapest, Hungary
- Eszter Soltész-Katona
  - Institute of Molecular Life Sciences, Centre of Excellence of the Hungarian Academy of Sciences, HUN-REN Research Centre for Natural Sciences, Magyar Tudósok krt. 2., 1117 Budapest, Hungary; Department of Physiology, Semmelweis University, Tűzoltó street 37-47, 1094 Budapest, Hungary
- Katalin Kis
  - Department of Physiology, Semmelweis University, Tűzoltó street 37-47, 1094 Budapest, Hungary
- Viktor Guti
  - Department of Physiology, Semmelweis University, Tűzoltó street 37-47, 1094 Budapest, Hungary
- Sharon Gilzer
  - Department of Physiology, Semmelweis University, Tűzoltó street 37-47, 1094 Budapest, Hungary
- Susanne Prokop
  - Department of Physiology, Semmelweis University, Tűzoltó street 37-47, 1094 Budapest, Hungary
- Roxána Boros
  - Department of Physiology, Semmelweis University, Tűzoltó street 37-47, 1094 Budapest, Hungary
- Ádám Misák
  - Department of Physiology, Semmelweis University, Tűzoltó street 37-47, 1094 Budapest, Hungary
- András Balla
  - Department of Physiology, Semmelweis University, Tűzoltó street 37-47, 1094 Budapest, Hungary; HUN-REN SE Hungarian Research Network Laboratory of Molecular Physiology, Budapest, Hungary
- Péter Várnai
  - Department of Physiology, Semmelweis University, Tűzoltó street 37-47, 1094 Budapest, Hungary; HUN-REN SE Hungarian Research Network Laboratory of Molecular Physiology, Budapest, Hungary
- Lilla Turiák
  - Institute of Organic Chemistry, HUN-REN Research Centre for Natural Sciences, Magyar Tudósok krt. 2., 1117 Budapest, Hungary
- András Ács
  - Institute of Organic Chemistry, HUN-REN Research Centre for Natural Sciences, Magyar Tudósok krt. 2., 1117 Budapest, Hungary
- László Drahos
  - Institute of Organic Chemistry, HUN-REN Research Centre for Natural Sciences, Magyar Tudósok krt. 2., 1117 Budapest, Hungary
- Asuka Inoue
  - Molecular and Cellular Biochemistry, Graduate School of Pharmaceutical Sciences, Tohoku University, Sendai, Japan
- László Hunyady
  - Institute of Molecular Life Sciences, Centre of Excellence of the Hungarian Academy of Sciences, HUN-REN Research Centre for Natural Sciences, Magyar Tudósok krt. 2., 1117 Budapest, Hungary; Department of Physiology, Semmelweis University, Tűzoltó street 37-47, 1094 Budapest, Hungary.
- Gábor Turu
  - Institute of Molecular Life Sciences, Centre of Excellence of the Hungarian Academy of Sciences, HUN-REN Research Centre for Natural Sciences, Magyar Tudósok krt. 2., 1117 Budapest, Hungary; Department of Physiology, Semmelweis University, Tűzoltó street 37-47, 1094 Budapest, Hungary.
|
44
|
Tambe A, MacCarthy T, Pavri R. Interpretable deep learning reveals the role of an E-box motif in suppressing somatic hypermutation of AGCT motifs within human immunoglobulin variable regions. Front Immunol 2024; 15:1407470. [PMID: 38863710 PMCID: PMC11165027 DOI: 10.3389/fimmu.2024.1407470] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2024] [Accepted: 05/08/2024] [Indexed: 06/13/2024] Open
Abstract
Introduction: Somatic hypermutation (SHM) of immunoglobulin variable (V) regions by activation induced deaminase (AID) is essential for robust, long-term humoral immunity against pathogen and vaccine antigens. AID mutates cytosines preferentially within WRCH motifs (where W=A or T, R=A or G and H=A, C or T). However, it has been consistently observed that the mutability of WRCH motifs varies substantially, with large variations in mutation frequency even between multiple occurrences of the same motif within a single V region. This has led to the notion that the immediate sequence context of WRCH motifs contributes to mutability. Recent studies have highlighted the potential role of local DNA sequence features in promoting mutagenesis of AGCT, a commonly mutated WRCH motif. Intriguingly, AGCT motifs closer to 5' ends of V regions, within the framework 1 (FW1) sub-region, mutate less frequently, suggesting an SHM-suppressing sequence context.
Methods: Here, we systematically examined the basis of AGCT positional biases in human SHM datasets with DeepSHM, a machine-learning model designed to predict SHM patterns. This was combined with integrated gradients, an interpretability method, to interrogate the basis of DeepSHM predictions.
Results: DeepSHM predicted the observed positional differences in mutation frequencies at AGCT motifs with high accuracy. For the conserved, lowly mutating AGCT motifs in FW1, integrated gradients predicted a large negative contribution of 5'C and 3'G flanking residues, suggesting that a CAGCTG context in this location was suppressive for SHM. CAGCTG is the recognition motif for E-box transcription factors, including E2A, which has been implicated in SHM. Indeed, we found a strong, inverse relationship between E-box motif fidelity and mutation frequency. Moreover, E2A was found to associate with the V region locale in two human B cell lines. Finally, analysis of human SHM datasets revealed that naturally occurring mutations in the 3'G flanking residues, which effectively ablate the E-box motif, were associated with a significantly increased rate of AGCT mutation.
Discussion: Our results suggest an antagonistic relationship between mutation frequency and the binding of E-box factors like E2A at specific AGCT motif contexts and, therefore, highlight a new, suppressive mechanism regulating local SHM patterns in human V regions.
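The motif definitions in this abstract (WRCH with W=A or T, R=A or G, H=A, C or T, and the E-box context CAGCTG around AGCT) can be illustrated with a small pattern-matching sketch. This is only a toy example and is not DeepSHM or any code from the paper: the function names `find_motif` and `in_ebox_context` and the sequence `toy` are hypothetical, and only the forward strand is scanned.

```python
import re

# Degenerate bases as defined in the abstract; plain bases map to themselves.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "W": "[AT]", "R": "[AG]", "H": "[ACT]"}

def find_motif(seq, motif):
    """Return (start, match) for every occurrence of a degenerate motif.

    A lookahead is used so that overlapping occurrences are also reported.
    """
    regex = "".join(IUPAC[base] for base in motif)
    pattern = re.compile("(?=(" + regex + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]

def in_ebox_context(seq, agct_start):
    """True if the AGCT at agct_start is flanked by 5'C and 3'G (CAGCTG)."""
    if agct_start < 1:
        return False
    return seq.upper()[agct_start - 1:agct_start + 5] == "CAGCTG"

# Hypothetical toy sequence with two AGCT sites, one inside an E-box.
toy = "TTCAGCTGAATAGCTA"
for start, hit in find_motif(toy, "WRCH"):
    print(start, hit, "E-box" if in_ebox_context(toy, start) else "no E-box")
```

In this toy sequence the first AGCT sits in a CAGCTG context and the second does not, which is the distinction the abstract associates with lower and higher mutation frequency respectively.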
Affiliation(s)
- Abhik Tambe
  - Department of Biochemistry and Cell Biology, Stony Brook University, Stony Brook, NY, United States
- Thomas MacCarthy
  - Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY, United States
- Rushad Pavri
  - Research Institute of Molecular Pathology (IMP), Vienna, Austria
  - Peter Gorer Department of Immunobiology, School of Immunology & Microbial Sciences, King’s College London, London, United Kingdom
|
45
|
Lally P, Gómez-Romero L, Tierrafría VH, Aquino P, Rioualen C, Zhang X, Kim S, Baniulyte G, Plitnick J, Smith C, Babu M, Collado-Vides J, Wade JT, Galagan JE. Predictive Biophysical Neural Network Modeling of a Compendium of in vivo Transcription Factor DNA Binding Profiles for Escherichia coli. bioRxiv 2024:2024.05.23.594371. [PMID: 38826350 PMCID: PMC11142182 DOI: 10.1101/2024.05.23.594371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2024]
Abstract
The DNA binding of most Escherichia coli Transcription Factors (TFs) has not been comprehensively mapped, and few have models that can quantitatively predict binding affinity. We report the global mapping of in vivo DNA binding for 139 E. coli TFs using ChIP-Seq. We used these data to train BoltzNet, a novel neural network that predicts TF binding energy from DNA sequence. BoltzNet mirrors a quantitative biophysical model and provides directly interpretable predictions genome-wide at nucleotide resolution. We used BoltzNet to quantitatively design novel binding sites, which we validated with biophysical experiments on purified protein. We have generated models for 125 TFs that provide insight into global features of TF binding, including clustering of sites, the role of accessory bases, the relevance of weak sites, and the background affinity of the genome. Our paper provides new paradigms for studying TF-DNA binding and for the development of biophysically motivated neural networks.
Affiliation(s)
- Patrick Lally
  - Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA 02215
- Laura Gómez-Romero
  - Instituto Nacional de Medicina Genómica, Periférico Sur 4809, Arenal Tepepan, Ciudad de México 14610, México
  - Escuela de Medicina y Ciencias de la Salud, Tecnológico de Monterrey, Ciudad de México, México
- Víctor H. Tierrafría
  - Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA 02215
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n, Cuernavaca 62210, Morelos, México
- Patricia Aquino
  - Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA 02215
- Claire Rioualen
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n, Cuernavaca 62210, Morelos, México
- Xiaoman Zhang
  - Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA 02215
- Sunyoung Kim
  - Department of Biochemistry, University of Regina, Regina, Saskatchewan, SK S4S 0A2, Canada
- Jonathan Plitnick
  - Wadsworth Center, New York State Department of Health, Albany, NY, USA
- Carol Smith
  - Wadsworth Center, New York State Department of Health, Albany, NY, USA
- Mohan Babu
  - Department of Biochemistry, University of Regina, Regina, Saskatchewan, SK S4S 0A2, Canada
- Julio Collado-Vides
  - Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA 02215
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n, Cuernavaca 62210, Morelos, México
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Joseph T. Wade
  - Wadsworth Center, New York State Department of Health, Albany, NY, USA
  - Department of Biomedical Sciences, University at Albany, SUNY, Albany, NY, USA
- James E. Galagan
  - Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA 02215
  - Bioinformatics Program, Boston University, 24 Cummington Mall, Boston, MA 02215
|
46
|
Adams C, Gabriel W, Laukens K, Picciani M, Wilhelm M, Bittremieux W, Boonen K. Fragment ion intensity prediction improves the identification rate of non-tryptic peptides in timsTOF. Nat Commun 2024; 15:3956. [PMID: 38730277 PMCID: PMC11087512 DOI: 10.1038/s41467-024-48322-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2023] [Accepted: 04/29/2024] [Indexed: 05/12/2024] Open
Abstract
Immunopeptidomics is crucial for immunotherapy and vaccine development. Because the generation of immunopeptides from their parent proteins does not adhere to clear-cut rules, rather than being able to use known digestion patterns, every possible protein subsequence within human leukocyte antigen (HLA) class-specific length restrictions needs to be considered during sequence database searching. This leads to an inflation of the search space and results in lower spectrum annotation rates. Peptide-spectrum match (PSM) rescoring is a powerful enhancement of standard searching that boosts the spectrum annotation performance. We analyze 302,105 unique synthesized non-tryptic peptides from the ProteomeTools project on a timsTOF-Pro to generate a ground-truth dataset containing 93,227 MS/MS spectra of 74,847 unique peptides, that is used to fine-tune the deep learning-based fragment ion intensity prediction model Prosit. We demonstrate up to 3-fold improvement in the identification of immunopeptides, as well as increased detection of immunopeptides from low input samples.
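The search-space inflation described in this abstract can be made concrete with a short sketch that compares enumerating every subsequence within a length window against a rule-based digest. This is an illustration only, not the authors' pipeline: the function names are hypothetical, the 8 to 12 residue window is an assumed stand-in for the HLA class-specific length restrictions, and the trypsin rule (cleave after K or R unless followed by P) is a common simplification.

```python
import re

def nonspecific_peptides(protein, min_len=8, max_len=12):
    """Every subsequence whose length lies in [min_len, max_len] (assumed window)."""
    peptides = set()
    for length in range(min_len, max_len + 1):
        for start in range(len(protein) - length + 1):
            peptides.add(protein[start:start + length])
    return peptides

def tryptic_peptides(protein, min_len=8, max_len=12):
    """Fully cleaved peptides under a simplified trypsin rule, same length window."""
    # Zero-width split after K or R, except when the next residue is P.
    fragments = re.split(r"(?<=[KR])(?!P)", protein)
    return {f for f in fragments if min_len <= len(f) <= max_len}

# Hypothetical toy protein: two tryptic peptides of nine residues each.
toy_protein = "AAAAAAAAKGGGGGGGGR"
print(len(tryptic_peptides(toy_protein)), "tryptic candidates")
print(len(nonspecific_peptides(toy_protein)), "non-specific candidates")
```

Even for this short toy sequence the non-specific enumeration yields many more candidates than the digest, which is the kind of inflation that motivates rescoring approaches.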
Affiliation(s)
- Charlotte Adams
  - Department of Computer Science, University of Antwerp, Antwerp, Belgium
- Wassim Gabriel
  - Computational Mass Spectrometry, Technical University of Munich, 85354, Freising, Germany
- Kris Laukens
  - Department of Computer Science, University of Antwerp, Antwerp, Belgium
- Mario Picciani
  - Computational Mass Spectrometry, Technical University of Munich, 85354, Freising, Germany
- Mathias Wilhelm
  - Computational Mass Spectrometry, Technical University of Munich, 85354, Freising, Germany
  - Munich Data Science Institute, Technical University of Munich, 85748, Garching, Germany
- Wout Bittremieux
  - Department of Computer Science, University of Antwerp, Antwerp, Belgium.
- Kurt Boonen
  - Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium.
  - Sustainable Health Department, Flemish Institute for Technological Research (VITO), Antwerp, Belgium.
|
47
|
Gizzio J, Thakur A, Haldane A, Levy RM. Evolutionary sequence and structural basis for the distinct conformational landscapes of Tyr and Ser/Thr kinases. Research Square 2024:rs.3.rs-4048991. [PMID: 38746330 PMCID: PMC11092858 DOI: 10.21203/rs.3.rs-4048991/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2024]
Abstract
Protein kinases are molecular machines with rich sequence variation that distinguishes the two main evolutionary branches - tyrosine kinases (TKs) from serine/threonine kinases (STKs). Using a sequence co-variation Potts statistical energy model we previously concluded that TK catalytic domains are more likely than STKs to adopt an inactive conformation with the activation loop in an autoinhibitory "folded" conformation, due to intrinsic sequence effects. Here we investigated the structural basis for this phenomenon by integrating the sequence-based model with structure-based molecular dynamics (MD) to determine the effects of mutations on the free energy difference between active and inactive conformations, using a novel thermodynamic cycle involving many (n=108) protein-mutation free energy perturbation (FEP) simulations in the active and inactive conformations. The sequence and structure-based results are consistent and support the hypothesis that the inactive conformation "DFG-out Activation Loop Folded", is a functional regulatory state that has been stabilized in TKs relative to STKs over the course of their evolution via the accumulation of residue substitutions in the activation loop and catalytic loop that facilitate distinct substrate binding modes in trans and additional modes of regulation in cis for TKs.
Affiliation(s)
- Joan Gizzio
  - Center for Biophysics and Computational Biology, Temple University, Philadelphia, Pennsylvania 19122
  - Department of Chemistry, Temple University, Philadelphia, Pennsylvania 19122
- Abhishek Thakur
  - Center for Biophysics and Computational Biology, Temple University, Philadelphia, Pennsylvania 19122
  - Department of Chemistry, Temple University, Philadelphia, Pennsylvania 19122
- Allan Haldane
  - Center for Biophysics and Computational Biology, Temple University, Philadelphia, Pennsylvania 19122
  - Department of Physics, Temple University, Philadelphia, Pennsylvania 19122
- Ronald M. Levy
  - Center for Biophysics and Computational Biology, Temple University, Philadelphia, Pennsylvania 19122
  - Department of Chemistry, Temple University, Philadelphia, Pennsylvania 19122
|
48
|
Gizzio J, Thakur A, Haldane A, Post CB, Levy RM. Evolutionary sequence and structural basis for the distinct conformational landscapes of Tyr and Ser/Thr kinases. bioRxiv 2024:2024.03.08.584161. [PMID: 38559238 PMCID: PMC10979876 DOI: 10.1101/2024.03.08.584161] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Protein kinases are molecular machines with rich sequence variation that distinguishes the two main evolutionary branches - tyrosine kinases (TKs) from serine/threonine kinases (STKs). Using a sequence co-variation Potts statistical energy model we previously concluded that TK catalytic domains are more likely than STKs to adopt an inactive conformation with the activation loop in an autoinhibitory "folded" conformation, due to intrinsic sequence effects. Here we investigated the structural basis for this phenomenon by integrating the sequence-based model with structure-based molecular dynamics (MD) to determine the effects of mutations on the free energy difference between active and inactive conformations, using a novel thermodynamic cycle involving many (n=108) protein-mutation free energy perturbation (FEP) simulations in the active and inactive conformations. The sequence and structure-based results are consistent and support the hypothesis that the inactive conformation "DFG-out Activation Loop Folded", is a functional regulatory state that has been stabilized in TKs relative to STKs over the course of their evolution via the accumulation of residue substitutions in the activation loop and catalytic loop that facilitate distinct substrate binding modes in trans and additional modes of regulation in cis for TKs.
Affiliation(s)
- Joan Gizzio
  - Center for Biophysics and Computational Biology, Temple University, Philadelphia, Pennsylvania 19122
  - Department of Chemistry, Temple University, Philadelphia, Pennsylvania 19122
- Abhishek Thakur
  - Center for Biophysics and Computational Biology, Temple University, Philadelphia, Pennsylvania 19122
  - Department of Chemistry, Temple University, Philadelphia, Pennsylvania 19122
- Allan Haldane
  - Center for Biophysics and Computational Biology, Temple University, Philadelphia, Pennsylvania 19122
  - Department of Physics, Temple University, Philadelphia, Pennsylvania 19122
- Carol Beth Post
  - Borch Department of Medicinal Chemistry and Molecular Pharmacology, Purdue University, West Lafayette, Indiana 47907
- Ronald M. Levy
  - Center for Biophysics and Computational Biology, Temple University, Philadelphia, Pennsylvania 19122
  - Department of Chemistry, Temple University, Philadelphia, Pennsylvania 19122
|
49
|
Prostova M, Kanevskaya A, Panteleev V, Lisitskaya L, Perfilova Tugaeva KV, Sluchanko NN, Esyunina D, Kulbachinskiy A. DNA-targeting short Argonautes complex with effector proteins for collateral nuclease activity and bacterial population immunity. Nat Microbiol 2024; 9:1368-1381. [PMID: 38622379 DOI: 10.1038/s41564-024-01654-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2023] [Accepted: 02/28/2024] [Indexed: 04/17/2024]
Abstract
Two prokaryotic defence systems, prokaryotic Argonautes (pAgos) and CRISPR-Cas, detect and cleave invader nucleic acids using complementary guides and the nuclease activities of pAgo or Cas proteins. However, not all pAgos are active nucleases. A large clade of short pAgos bind nucleic acid guides but lack nuclease activity, suggesting a different mechanism of action. Here we investigate short pAgos associated with a putative effector nuclease, NbaAgo from Novosphingopyxis baekryungensis and CmeAgo from Cupriavidus metallidurans. We show that these pAgos form a heterodimeric complex with co-encoded effector nucleases (short prokaryotic Argonaute, DNase and RNase associated (SPARDA)). RNA-guided target DNA recognition unleashes the nuclease activity of SPARDA leading to indiscriminate collateral cleavage of DNA and RNA. Activation of SPARDA by plasmids or phages results in degradation of cellular DNA and cell death or dormancy, conferring target-specific population protection and expanding the range of known prokaryotic immune systems.
Affiliation(s)
- Maria Prostova
  - Institute of Gene Biology, Russian Academy of Sciences, Moscow, Russia.
- Anna Kanevskaya
  - Institute of Gene Biology, Russian Academy of Sciences, Moscow, Russia.
- Lidia Lisitskaya
  - Institute of Gene Biology, Russian Academy of Sciences, Moscow, Russia
- Kristina V Perfilova Tugaeva
  - A.N. Bach Institute of Biochemistry, Federal Research Center of Biotechnology, Russian Academy of Sciences, Moscow, Russia
- Nikolai N Sluchanko
  - A.N. Bach Institute of Biochemistry, Federal Research Center of Biotechnology, Russian Academy of Sciences, Moscow, Russia
- Daria Esyunina
  - Institute of Gene Biology, Russian Academy of Sciences, Moscow, Russia
|
50
|
Dudnyk K, Cai D, Shi C, Xu J, Zhou J. Sequence basis of transcription initiation in the human genome. Science 2024; 384:eadj0116. [PMID: 38662817 PMCID: PMC11223672 DOI: 10.1126/science.adj0116] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Accepted: 02/28/2024] [Indexed: 05/03/2024]
Abstract
Transcription initiation is a process that is essential to ensuring the proper function of any gene, yet we still lack a unified understanding of sequence patterns and rules that explain most transcription start sites in the human genome. By predicting transcription initiation at base-pair resolution from sequences with a deep learning-inspired explainable model called Puffin, we show that a small set of simple rules can explain transcription initiation at most human promoters. We identify key sequence patterns that contribute to human promoter activity, each activating transcription with distinct position-specific effects. Furthermore, we explain the sequence basis of bidirectional transcription at promoters, identify the links between promoter sequence and gene expression variation across cell types, and explore the conservation of sequence determinants of transcription initiation across mammalian species.
Affiliation(s)
- Kseniia Dudnyk
  - Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center; Dallas, Texas, United States of America
- Donghong Cai
  - Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center; Dallas, Texas, United States of America
  - Center of Excellence for Leukemia Studies (CELS), Department of Pathology, St. Jude Children’s Research Hospital, Memphis, Tennessee, United States of America
- Chenlai Shi
  - Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center; Dallas, Texas, United States of America
- Jian Xu
  - Center of Excellence for Leukemia Studies (CELS), Department of Pathology, St. Jude Children’s Research Hospital, Memphis, Tennessee, United States of America
- Jian Zhou
  - Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center; Dallas, Texas, United States of America
|