1
Osnaya VG, Gómez-Romero L, Moreno-Hagelsieb G, Hernández G. AUGcontext DB: a comprehensive catalog of the mRNA AUG initiator codon context across eukaryotes. RNA Biol 2025; 22:1-5. [PMID: 39936323 PMCID: PMC11834415 DOI: 10.1080/15476286.2025.2465196] [Received: 08/08/2024] [Revised: 02/05/2025] [Accepted: 02/06/2025] [Indexed: 02/13/2025] Open
Abstract
mRNA translation defines the composition of the cell proteome in all forms of life and in disease. In this process, precise selection of the mRNA translation initiation site (TIS) is crucial, as it establishes the correct open reading frame for triplet decoding. We have gathered and curated all published TIS consensus context sequences. We also included the TIS consensus context from 538 novel fungal genomes available from NCBI's RefSeq database. To do so, we wrote ad hoc programs in Perl to find and extract the TIS for each annotated gene, plus ten bases upstream and three downstream. For each genome, the sequences around the TIS of each gene were obtained, and the consensus was then calculated according to the Cavener rules and by the LOGOS algorithm. We created AUGcontext DB, a portal with a comprehensive collection of TIS context sequences across eukaryotes in a range from -10 to +6. The compilation covers species of 30 vertebrates, 17 invertebrates, 25 plants, 14 fungi, and 11 protists studied in silico; 23 experimental studies; data on biotechnology; and the discovery of 8 diseases associated with specific mutations. Additionally, TIS context sequences of cellular IRESs were included. AUGcontext DB belongs to the National Institute of Cancer (Instituto Nacional de Cancerología, INCan), Mexico, and is freely available at http://108.161.138.77:8096/. Our catalogue allows comparative studies between species, may help improve the diagnosis of certain diseases, and will be key to maximizing the production of recombinant proteins.
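The extraction and consensus steps described in this abstract can be sketched as follows. This is a minimal illustration in Python, not the authors' Perl programs: the function names, the toy transcript, and the 50%/75% thresholds (a simplified reading of Cavener-style rules) are all assumptions made here.

```python
from collections import Counter
from typing import List, Optional

UPSTREAM = 10    # bases before the A of AUG (positions -10..-1)
DOWNSTREAM = 3   # bases after the G of AUG (positions +4..+6)


def tis_context(transcript: str, aug_index: int) -> Optional[str]:
    """Return the -10..+6 window around an annotated start codon.

    Returns None when the window does not fit in the transcript or when the
    annotated position is not an AUG. (Hypothetical helper for illustration.)
    """
    start = aug_index - UPSTREAM
    end = aug_index + 3 + DOWNSTREAM
    if start < 0 or end > len(transcript):
        return None
    window = transcript[start:end].upper().replace("T", "U")
    return window if window[UPSTREAM:UPSTREAM + 3] == "AUG" else None


def consensus(windows: List[str]) -> str:
    """Per-position consensus under a simplified 50/75 rule (assumed thresholds).

    A base is reported in uppercase when it exceeds 50% and is more than twice
    as frequent as the runner-up; two bases are reported in lowercase when
    together they exceed 75%; otherwise the position is 'n'.
    """
    result = []
    for column in zip(*windows):
        counts = Counter(column).most_common()
        total = len(column)
        top, top_n = counts[0]
        second_n = counts[1][1] if len(counts) > 1 else 0
        if top_n / total > 0.5 and top_n > 2 * second_n:
            result.append(top)
        elif len(counts) > 1 and (top_n + second_n) / total > 0.75:
            result.append((top + "/" + counts[1][0]).lower())
        else:
            result.append("n")
    return " ".join(result)


# Toy transcript (made up): the AUG starts at index 12.
example_window = tis_context("GCCGCCGCCACCAUGGCG", 12)
```

Applied per genome, `tis_context` would be called once per annotated gene and the resulting windows passed together to `consensus`.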
Affiliation(s)
- Vincent G. Osnaya
  - mRNA and Cancer Laboratory, Unit of Biomedical Research on Cancer, National Institute of Cancer (INCan), Mexico City, Mexico
- Laura Gómez-Romero
  - Bioinformatics Department, National Institute of Genomic Medicine, Mexico City, Mexico
  - Escuela de Medicina y Ciencias de la Salud, Tecnológico de Monterrey, Mexico City, Mexico
- Greco Hernández
  - mRNA and Cancer Laboratory, Unit of Biomedical Research on Cancer, National Institute of Cancer (INCan), Mexico City, Mexico
  - Escuela de Medicina y Ciencias de la Salud, Tecnológico de Monterrey, Mexico City, Mexico
2
Ferreira JR, Xu R, Hensel Z. Mycobacterium tuberculosis FtsB and PerM interact via a C-terminal helix in FtsB to modulate cell division. J Bacteriol 2025; 207:e0044424. [PMID: 40135878 DOI: 10.1128/jb.00444-24] [Received: 10/15/2024] [Accepted: 12/03/2024] [Indexed: 03/27/2025] Open
Abstract
Latent infection by Mycobacterium tuberculosis (Mtb) impedes effective tuberculosis therapy and eradication. The protein PerM is essential for chronic Mtb infections in mice and acts via the divisome protein FtsB to modulate cell division. Using transgenic co-expression in Escherichia coli, we studied the Mtb PerM-FtsB interaction in isolation from other Mtb proteins, engineering PerM to enhance expression in the E. coli membrane. Using fluorescence microscopy in E. coli, we observed that the previously reported PerM-dependent instability of Mtb FtsB required a segment of FtsB predicted to bind cell-division proteins FtsL and FtsQ. Furthermore, we found that the stability of membrane-localized PerM hinged on its interaction with a conserved, C-terminal helix in FtsB. Using single-molecule tracking, we also observed that removing this helix disrupted the PerM-FtsB interaction. Molecular dynamics results supported the observation that FtsB stabilized PerM and suggested that interactions at the PerM-FtsB interface differ from our initial structure prediction in a way that is consistent with PerM sequence conservation. Although narrowly conserved, the PerM-FtsB interaction emerges as a potential therapeutic target for persistent infections by disrupting the regulation of cell division. Integrating protein structure prediction, molecular dynamics, and single-molecule microscopy, our approach is primed to screen potential inhibitors of the PerM-FtsB interaction and can be straightforwardly adapted to explore other putative interactions. IMPORTANCE Our research reveals significant insights into the dynamic interaction between the proteins PerM and FtsB within Mycobacterium tuberculosis, contributing to our understanding of bacterial cell division mechanisms crucial for infection persistence. By combining innovative fluorescence microscopy and molecular dynamics, we established that the stability of these proteins is interdependent; molecular dynamics placing PerM-FtsB in the context of the mycobacterial divisome shows how disrupting PerM-FtsB interactions can plausibly impact bacterial cell wall synthesis. These findings highlight the PerM-FtsB interface as a promising target for novel therapeutics aimed at combating persistent bacterial infections. Importantly, our approach can be adapted for similar studies in other bacterial systems, suggesting broad implications for microbial biology and antibiotic development.
Affiliation(s)
- Ruilan Xu
  - ITQB NOVA, Universidade Nova de Lisboa, Avenida da República, Lisbon, Portugal
- Zach Hensel
  - ITQB NOVA, Universidade Nova de Lisboa, Avenida da República, Lisbon, Portugal
3
Willems P, Thery F, Van Moortel L, De Meyer M, Staes A, Gul A, Kovalchuke L, Declercq A, Devreese R, Bouwmeester R, Gabriels R, Martens L, Impens F. Maximizing Immunopeptidomics-Based Bacterial Epitope Discovery by Multiple Search Engines and Rescoring. J Proteome Res 2025; 24:2141-2151. [PMID: 40080147 PMCID: PMC11976845 DOI: 10.1021/acs.jproteome.4c00864] [Received: 09/29/2024] [Revised: 02/12/2025] [Accepted: 02/26/2025] [Indexed: 03/15/2025]
Abstract
Mass spectrometry-based discovery of bacterial immunopeptides presented by infected cells allows untargeted discovery of bacterial antigens that can serve as vaccine candidates. However, reliable identification of bacterial epitopes is challenged by their extremely low abundance. Here, we describe an optimized bioinformatic framework to enhance the confident identification of bacterial immunopeptides. Immunopeptidomics data of cell cultures infected with Listeria monocytogenes were searched by four different search engines, PEAKS, Comet, Sage and MSFragger, followed by data-driven rescoring with MS2Rescore. Compared with individual search engine results, this integrated workflow boosted immunopeptide identification by an average of 27% and led to the high-confidence detection of 18 additional bacterial peptides (+27%) matching 15 different Listeria proteins (+36%). Despite the strong agreement between the search engines, a small number of spectra (<1%) had ambiguous matches to multiple peptides and were excluded to ensure high-confidence identifications. Finally, we demonstrate our workflow with sensitive timsTOF SCP data acquisition and find that rescoring, now with inclusion of ion mobility features, identifies 76% more peptides compared to Q Exactive HF acquisition. Together, our results demonstrate how integration of multiple search engine results along with data-driven rescoring maximizes immunopeptide identification, boosting the detection of high-confidence bacterial epitopes for vaccine development.
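The idea of pooling identifications from several search engines and setting aside spectra with conflicting peptide matches can be sketched as below. This is a hypothetical illustration only: `merge_psms`, the tuple layout, and the toy peptides are assumptions, and the data-driven rescoring step (MS2Rescore) is not reproduced here.

```python
from collections import defaultdict


def merge_psms(psms):
    """Combine peptide-spectrum matches from several engines.

    psms: iterable of (engine, spectrum_id, peptide) tuples that are assumed
    to have passed each engine's own confidence filter already.
    Returns (accepted, ambiguous): accepted maps spectrum_id to the single
    peptide and the engines supporting it; ambiguous lists spectra that were
    matched to more than one distinct peptide and are therefore excluded.
    """
    by_spectrum = defaultdict(lambda: defaultdict(set))
    for engine, spectrum_id, peptide in psms:
        by_spectrum[spectrum_id][peptide].add(engine)

    accepted, ambiguous = {}, []
    for spectrum_id, peptides in by_spectrum.items():
        if len(peptides) > 1:
            ambiguous.append(spectrum_id)
        else:
            (peptide, engines), = peptides.items()
            accepted[spectrum_id] = (peptide, sorted(engines))
    return accepted, ambiguous


# Toy input: spectrum ids and peptide sequences are made up.
toy_psms = [
    ("PEAKS", "scan_1", "KLDETGNSL"),
    ("Comet", "scan_1", "KLDETGNSL"),
    ("Sage", "scan_2", "GYKDGNEYI"),
    ("MSFragger", "scan_2", "GYKDGNEYL"),
    ("Sage", "scan_3", "VAYGRQVYL"),
]
accepted, ambiguous = merge_psms(toy_psms)
```

In the toy input, `scan_2` receives two different peptides and ends up in `ambiguous`, while the other two spectra are accepted together with the list of engines that support them.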
Affiliation(s)
- Patrick Willems
  - VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, 9052 Ghent, Belgium
  - VIB-UGent Center for Plant Systems Biology, VIB, 9052 Ghent, Belgium
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium
- Fabien Thery
  - VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, 9052 Ghent, Belgium
- Laura Van Moortel
  - VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, 9052 Ghent, Belgium
- Margaux De Meyer
  - VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, 9052 Ghent, Belgium
- An Staes
  - VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, 9052 Ghent, Belgium
  - VIB Proteomics Core, VIB, 9052 Ghent, Belgium
- Adillah Gul
  - VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, 9052 Ghent, Belgium
- Lyudmila Kovalchuke
  - VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, 9052 Ghent, Belgium
- Arthur Declercq
  - VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, 9052 Ghent, Belgium
- Robbe Devreese
  - VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, 9052 Ghent, Belgium
- Robbin Bouwmeester
  - VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, 9052 Ghent, Belgium
- Ralf Gabriels
  - VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, 9052 Ghent, Belgium
- Lennart Martens
  - VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, 9052 Ghent, Belgium
  - BioOrganic Mass Spectrometry Laboratory (LSMBO), IPHC UMR 7178, University of Strasbourg, CNRS, ProFI FR2048, Strasbourg, France
- Francis Impens
  - VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, 9052 Ghent, Belgium
  - VIB-UGent Center for Plant Systems Biology, VIB, 9052 Ghent, Belgium
  - VIB Proteomics Core, VIB, 9052 Ghent, Belgium
4
Behvarmanesh A, Kozlov G, Wagner JP, Chen YS, Gehring K. Deep Mutational Scanning of an Engineered High-affinity Ligand of the poly(A) Binding Protein MLLE Domain. J Mol Biol 2025; 437:169120. [PMID: 40180125 DOI: 10.1016/j.jmb.2025.169120] [Received: 03/11/2025] [Revised: 03/27/2025] [Accepted: 03/27/2025] [Indexed: 04/05/2025]
Abstract
The MLLE domain is a peptide-binding domain found in the poly(A) binding protein (PABP) and the ubiquitin protein E3 ligase N-recognin 5 (UBR5) that recognizes a conserved motif, named PABP-interacting motif 2 (PAM2). The majority of PAM2 sequences bind to MLLE domains with low-micromolar affinity. Here, we designed a chimeric PAM2 peptide termed super PAM2 (sPAM2) by combining classical and trinucleotide repeat-containing 6 (TNRC6)-like binding modes to create a superior binder for the MLLE domain. The crystal structure of the PABPC1 MLLE-sPAM2 complex shows a crucial role of conserved sPAM2 leucine, phenylalanine and tryptophan residues in the interaction. We used deep mutational scanning (DMS) coupled with isothermal titration calorimetry (ITC) to characterize the specificity profiles for PABPC1 and UBR5 MLLE. The best sPAM2 sequence binds to PABPC1 MLLE with low-nanomolar affinity and nearly 20-fold more tightly than the best natural PAM2 sequence. This suggests that the affinities of natural PAM2 sequences are tuned to control their binding to PABPC1 and UBR5. Our study will aid in the discovery of new PAM2-containing proteins (PACs) and facilitate in vivo studies of PAM2-mediated cellular pathways.
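To put the reported fold change in affinity on an energy scale, the standard relation ΔΔG = RT ln(fold) can be applied. The sketch below is a back-of-the-envelope calculation, not a result from the paper: the temperature of 298.15 K is an assumption (the abstract gives none), and "nearly 20-fold" is rounded to exactly 20.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ddg_from_fold(fold: float, temperature_k: float = 298.15) -> float:
    """Binding free-energy difference implied by a fold change in Kd.

    Uses ddG = R * T * ln(Kd_weak / Kd_tight); temperature is an assumption.
    """
    return R_KCAL * temperature_k * math.log(fold)


ddg = ddg_from_fold(20.0)
print(f"{ddg:.2f} kcal/mol")  # prints "1.77 kcal/mol"
```

Under these assumptions, a 20-fold tighter ligand corresponds to a gain of a little under 2 kcal/mol in binding free energy.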
Affiliation(s)
- Ali Behvarmanesh
  - Department of Biochemistry, McGill University, Montréal, Québec H3G 0B1, Canada; Centre de Recherche en Biologie Structurale, McGill University, Montréal, Québec H3G 0B1, Canada
- Guennadi Kozlov
  - Department of Biochemistry, McGill University, Montréal, Québec H3G 0B1, Canada; Centre de Recherche en Biologie Structurale, McGill University, Montréal, Québec H3G 0B1, Canada
- Julian P Wagner
  - Department of Biochemistry, McGill University, Montréal, Québec H3G 0B1, Canada; Centre de Recherche en Biologie Structurale, McGill University, Montréal, Québec H3G 0B1, Canada
- Yu Seby Chen
  - Department of Biochemistry, McGill University, Montréal, Québec H3G 0B1, Canada; Centre de Recherche en Biologie Structurale, McGill University, Montréal, Québec H3G 0B1, Canada
- Kalle Gehring
  - Department of Biochemistry, McGill University, Montréal, Québec H3G 0B1, Canada; Centre de Recherche en Biologie Structurale, McGill University, Montréal, Québec H3G 0B1, Canada
5
Sugihara Y, Kourelis J, Contreras MP, Pai H, Harant A, Selvaraj M, Toghani A, Martínez-Anaya C, Kamoun S. Helper NLR immune protein NRC3 evolved to evade inhibition by a cyst nematode virulence effector. PLoS Genet 2025; 21:e1011653. [PMID: 40202957 PMCID: PMC11981194 DOI: 10.1371/journal.pgen.1011653] [Received: 09/10/2024] [Accepted: 03/09/2025] [Indexed: 04/11/2025] Open
Abstract
Parasites can counteract host immunity by suppressing nucleotide binding and leucine-rich repeat (NLR) proteins that function as immune receptors. We previously showed that a cyst nematode virulence effector SPRYSEC15 (SS15) binds and inhibits oligomerisation of helper NLR proteins in the expanded NRC1/2/3 clade by preventing intramolecular rearrangements required for NRC oligomerisation into an activated resistosome. Here we examined the degree to which NRC proteins from multiple Solanaceae species are sensitive to suppression by SS15 and tested hypotheses about adaptive evolution of the binding interface between the SS15 inhibitor and NRC proteins. Whereas all tested orthologs of NRC2 were inhibited by SS15, some natural variants of NRC1 and NRC3 are insensitive to SS15 suppression. Ancestral sequence reconstruction combined with functional assays revealed that NRC3 transitioned from an ancestral suppressed form to an insensitive one over 19 million years ago. Our analyses revealed the evolutionary trajectory of an NLR immune receptor against a parasite inhibitor, identifying key evolutionary transitions in helper NLRs that counteract this inhibition. This work reveals a distinct type of gene-for-gene interaction between parasite or pathogen immunosuppressors and host immune receptors that contrasts with the coevolution between AVR effectors and immune receptors.
Affiliation(s)
- Yu Sugihara
  - The Sainsbury Laboratory, University of East Anglia, Norwich, United Kingdom
- Jiorgos Kourelis
  - The Sainsbury Laboratory, University of East Anglia, Norwich, United Kingdom
- Hsuan Pai
  - The Sainsbury Laboratory, University of East Anglia, Norwich, United Kingdom
- Adeline Harant
  - The Sainsbury Laboratory, University of East Anglia, Norwich, United Kingdom
- Muniyandi Selvaraj
  - The Sainsbury Laboratory, University of East Anglia, Norwich, United Kingdom
- AmirAli Toghani
  - The Sainsbury Laboratory, University of East Anglia, Norwich, United Kingdom
- Claudia Martínez-Anaya
  - The Sainsbury Laboratory, University of East Anglia, Norwich, United Kingdom
  - Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
- Sophien Kamoun
  - The Sainsbury Laboratory, University of East Anglia, Norwich, United Kingdom
6
Tordoff J, Alfonse LE, Makarova KS, Ornstein A, Garrity AJ, Yan WX, Scott DA, Koonin EV, Cheng DR. Initial Characterization of 12 New Subtypes and Variants of Type V CRISPR Systems. CRISPR J 2025. [PMID: 40163416 DOI: 10.1089/crispr.2024.0100] [Indexed: 04/02/2025] Open
Abstract
Type V CRISPR systems are highly diverse in sequence, mechanism, and function. Although recent efforts have greatly expanded our understanding of their evolution, the diversity of type V systems remains to be completely explored, and many clades have not been experimentally characterized. In this work, we mined metagenomic databases to identify three new subtypes and nine new variants of Cas12, the effector of Type V systems, and provide experimental and computational characterization of their Protospacer-Adjacent Motif (PAM), interference activity, loci architecture, and tracrRNA dependence. Half of the new Cas12s are found in phages or prophages. New subtypes Cas12o and Cas12p lack the canonical RuvC catalytic residues, suggesting they interfere with the target without cleavage, possibly by blocking transcription or replication. One variant, Cas12f10, displays substantial activity on PAM-less targets. Our work expands the diversity of the functionally characterized Cas12 effectors and provides some promising candidates for genome engineering tools.
Affiliation(s)
- Jesse Tordoff
  - Arbor Biotechnologies, Cambridge, Massachusetts, USA
- Kira S Makarova
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
- Winston X Yan
  - Arbor Biotechnologies, Cambridge, Massachusetts, USA
- David A Scott
  - Arbor Biotechnologies, Cambridge, Massachusetts, USA
- Eugene V Koonin
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
- David R Cheng
  - Arbor Biotechnologies, Cambridge, Massachusetts, USA
7
Wang SK, Li J, Nair S, Korasaju R, Chen Y, Zhang Y, Kundaje A, Liu Y, Wang N, Chang HY. Single-cell multiome and enhancer connectome of human retinal pigment epithelium and choroid nominate pathogenic variants in age-related macular degeneration. bioRxiv 2025:2025.03.21.644670. [PMID: 40196652 PMCID: PMC11974679 DOI: 10.1101/2025.03.21.644670] [Indexed: 04/09/2025]
Abstract
Age-related macular degeneration (AMD) is a leading cause of vision loss worldwide. Genome-wide association studies (GWAS) of AMD have identified dozens of risk loci that may house disease targets. However, variants at these loci are largely noncoding, making it difficult to assess their function and whether they are causal. Here, we present a single-cell gene expression and chromatin accessibility atlas of human retinal pigment epithelium (RPE) and choroid to systematically analyze both coding and noncoding variants implicated in AMD. We employ HiChIP and Activity-by-Contact modeling to map enhancers in these tissues and predict cell and gene targets of risk variants. We further perform allele-specific self-transcribing active regulatory region sequencing (STARR-seq) to functionally test variant activity in RPE cells, including in the context of complement activation. Our work nominates new pathogenic variants and mechanisms in AMD and offers a rich and accessible resource for studying diseases of the RPE and choroid.
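For readers unfamiliar with Activity-by-Contact (ABC) modeling, the core score is commonly defined as an element's activity multiplied by its contact frequency with a gene's promoter, normalized by the same product summed over all candidate elements near that gene. The sketch below illustrates only that normalization; the element names and numbers are made up, and the preprocessing, genomic window, and score threshold used in this study are not given in the abstract.

```python
from typing import Dict, Tuple


def abc_scores(elements: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """ABC-style scores for the candidate elements of one gene.

    elements maps an element name to (activity, contact), where activity
    would come from accessibility/histone signal and contact from
    chromatin-contact data such as HiChIP (both hypothetical values here).
    Each score is activity * contact divided by the sum over all elements.
    """
    products = {name: a * c for name, (a, c) in elements.items()}
    total = sum(products.values())
    if total == 0:
        return {name: 0.0 for name in elements}
    return {name: p / total for name, p in products.items()}


# Toy candidate enhancers for a single gene.
toy_elements = {
    "enhancer_1": (10.0, 0.5),
    "enhancer_2": (4.0, 0.25),
    "enhancer_3": (2.0, 2.0),
}
scores = abc_scores(toy_elements)
```

Because the scores for a gene sum to one, each value can be read as the fraction of that gene's total regulatory input attributed to an element, which is what makes it useful for linking a noncoding risk variant to a target gene.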
8
Posfai A, Zhou J, McCandlish DM, Kinney JB. Gauge fixing for sequence-function relationships. PLoS Comput Biol 2025; 21:e1012818. [PMID: 40111986 PMCID: PMC11957564 DOI: 10.1371/journal.pcbi.1012818] [Received: 09/16/2024] [Accepted: 01/22/2025] [Indexed: 03/22/2025] Open
Abstract
Quantitative models of sequence-function relationships are ubiquitous in computational biology, e.g., for modeling the DNA binding of transcription factors or the fitness landscapes of proteins. Interpreting these models, however, is complicated by the fact that the values of model parameters can often be changed without affecting model predictions. Before the values of model parameters can be meaningfully interpreted, one must remove these degrees of freedom (called "gauge freedoms" in physics) by imposing additional constraints (a process called "fixing the gauge"). However, strategies for fixing the gauge of sequence-function relationships have received little attention. Here we derive an analytically tractable family of gauges for a large class of sequence-function relationships. These gauges are derived in the context of models with all-order interactions, but an important subset of these gauges can be applied to diverse types of models, including additive models, pairwise-interaction models, and models with higher-order interactions. Many commonly used gauges are special cases of gauges within this family. We demonstrate the utility of this family of gauges by showing how different choices of gauge can be used both to explore complex activity landscapes and to reveal simplified models that are approximately correct within localized regions of sequence space. The results provide practical gauge-fixing strategies and demonstrate the utility of gauge-fixing for model exploration and interpretation.
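A small worked example may make the notion of gauge freedom concrete. In an additive model, a constant can be moved between the intercept and the parameters at any one position without changing any prediction; imposing a zero-sum constraint per position is one commonly used way to remove that freedom. The sketch below is an illustration under that assumption only (toy parameters, a two-position RNA model) and does not reproduce the family of gauges derived in the paper.

```python
from itertools import product

ALPHABET = "ACGU"


def predict(theta0, theta, seq):
    """Additive model: intercept plus one parameter per (position, character)."""
    return theta0 + sum(theta[pos][ALPHABET.index(ch)] for pos, ch in enumerate(seq))


def zero_sum_gauge(theta0, theta):
    """Shift each position's mean into the intercept so every position sums to zero."""
    new_theta0 = theta0
    new_theta = []
    for row in theta:
        mean = sum(row) / len(row)
        new_theta0 += mean
        new_theta.append([value - mean for value in row])
    return new_theta0, new_theta


# Toy parameters (made up) for a two-position model.
theta0 = 1.0
theta = [[0.5, 1.5, -1.0, 3.0],
         [2.0, 2.0, 2.0, 6.0]]

fixed_theta0, fixed_theta = zero_sum_gauge(theta0, theta)

# Predictions agree for every sequence, although the parameter values differ.
max_diff = max(
    abs(predict(theta0, theta, seq) - predict(fixed_theta0, fixed_theta, seq))
    for seq in product(ALPHABET, repeat=len(theta))
)
```

After fixing the gauge, the intercept equals the mean prediction over all sequences and each remaining parameter is a deviation from that mean, which is what makes the values interpretable.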
Affiliation(s)
- Anna Posfai
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- Juannan Zhou
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
  - Department of Biology, University of Florida, Gainesville, Florida, United States of America
- David M. McCandlish
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- Justin B. Kinney
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
9
Tadros DM, Racle J, Gfeller D. Predicting MHC-I ligands across alleles and species: how far can we go? Genome Med 2025; 17:25. [PMID: 40114147 PMCID: PMC11927126 DOI: 10.1186/s13073-025-01450-8] [Received: 06/13/2024] [Accepted: 03/10/2025] [Indexed: 03/22/2025] Open
Abstract
BACKGROUND CD8+ T-cell activation is initiated by the recognition of epitopes presented on class I major histocompatibility complex (MHC-I) molecules. Identifying such epitopes is useful for molecular understanding of cellular immune responses and can guide the development of personalized vaccines for various diseases including cancer. For a few hundred common human and mouse MHC-I alleles, large datasets of ligands are available and machine learning MHC-I ligand predictors trained on such data reach high prediction accuracy. However, for the vast majority of other MHC-I alleles, no ligand is known. METHODS We capitalize on an expanded architecture of our MHC-I ligand predictor (MixMHCpred3.0) to systematically assess the extent to which predictions of MHC-I ligands can be applied to MHC-I alleles that currently lack known ligand data. RESULTS Our results reveal high prediction accuracy for most MHC-I alleles in human and in laboratory mouse strains, but significantly lower accuracy in other species. Our work further outlines some of the molecular determinants of MHC-I ligand prediction accuracy across alleles and species. Robust benchmarking on external data shows that our MHC-I ligand predictor demonstrates competitive performance relative to other state-of-the-art MHC-I ligand predictors and can be used for CD8+ T-cell epitope predictions. CONCLUSIONS Our work provides a valuable tool for predicting antigen presentation across all human and mouse MHC-I alleles. The MixMHCpred3.0 tool is available at https://github.com/GfellerLab/MixMHCpred.
Affiliation(s)
- Daniel M Tadros
  - Department of Oncology, Ludwig Institute for Cancer Research Lausanne, University of Lausanne, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
  - Agora Cancer Research Centre, Lausanne, 1011, Switzerland
  - Swiss Cancer Center Leman (SCCL), Lausanne, Switzerland
- Julien Racle
  - Department of Oncology, Ludwig Institute for Cancer Research Lausanne, University of Lausanne, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
  - Agora Cancer Research Centre, Lausanne, 1011, Switzerland
  - Swiss Cancer Center Leman (SCCL), Lausanne, Switzerland
- David Gfeller
  - Department of Oncology, Ludwig Institute for Cancer Research Lausanne, University of Lausanne, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
  - Agora Cancer Research Centre, Lausanne, 1011, Switzerland
  - Swiss Cancer Center Leman (SCCL), Lausanne, Switzerland
10
Choppavarapu L, Fang K, Liu T, Ohihoin AG, Jin VX. Hi-C profiling in tissues reveals 3D chromatin-regulated breast tumor heterogeneity informing a looping-mediated therapeutic avenue. Cell Rep 2025; 44:115450. [PMID: 40112000 DOI: 10.1016/j.celrep.2025.115450] [Received: 08/23/2024] [Revised: 01/12/2025] [Accepted: 02/28/2025] [Indexed: 03/22/2025] Open
Abstract
The limitations of Hi-C (high-throughput chromosome conformation capture) profiling in in vitro cell culture include failing to recapitulate disease-specific physiological properties and lacking a clinically relevant disease microenvironment. In this study, we conduct Hi-C profiling in a pilot cohort of 12 breast tissues comprising two normal tissues, five ER+ breast primary tumors, and five tamoxifen-treated recurrent tumors. We demonstrate 3D chromatin-regulated breast tumor heterogeneity and identify a looping-mediated target gene, CA2, which might play a role in driving tamoxifen resistance. The inhibition of CA2 impedes tumor growth both in vitro and in vivo and reverses chromatin looping. The disruption of CA2 looping reduces tamoxifen-resistant cancer cell proliferation, decreases CA2 mRNA and protein expression, and weakens the looping interaction. Our study thus provides mechanistic and functional insights into the role of 3D chromatin architecture in regulating breast tumor heterogeneity and informs a new looping-mediated therapeutic avenue for treating breast cancer.
Affiliation(s)
- Lavanya Choppavarapu
  - Division of Biostatistics, Data Science Institute, Medical College of Wisconsin, Milwaukee, WI 53226, USA; MCW Cancer Center, Medical College of Wisconsin, Milwaukee, WI 53226, USA; Mellowes Center for Genomic Sciences and Precision Medicine, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Kun Fang
  - Division of Biostatistics, Data Science Institute, Medical College of Wisconsin, Milwaukee, WI 53226, USA; MCW Cancer Center, Medical College of Wisconsin, Milwaukee, WI 53226, USA; Mellowes Center for Genomic Sciences and Precision Medicine, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Tianxiang Liu
  - Division of Biostatistics, Data Science Institute, Medical College of Wisconsin, Milwaukee, WI 53226, USA; MCW Cancer Center, Medical College of Wisconsin, Milwaukee, WI 53226, USA; Mellowes Center for Genomic Sciences and Precision Medicine, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Aigbe G Ohihoin
  - Cell and Developmental Biology PhD program, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Victor X Jin
  - Division of Biostatistics, Data Science Institute, Medical College of Wisconsin, Milwaukee, WI 53226, USA; MCW Cancer Center, Medical College of Wisconsin, Milwaukee, WI 53226, USA; Mellowes Center for Genomic Sciences and Precision Medicine, Medical College of Wisconsin, Milwaukee, WI 53226, USA
11
Martí-Gómez C, Zhou J, Chen WC, Kinney JB, McCandlish DM. Inference and visualization of complex genotype-phenotype maps with gpmap-tools. bioRxiv 2025:2025.03.09.642267. [PMID: 40161830 PMCID: PMC11952336 DOI: 10.1101/2025.03.09.642267] [Indexed: 04/02/2025]
Abstract
Multiplex assays of variant effect (MAVEs) allow the functional characterization of an unprecedented number of sequence variants in both gene regulatory regions and protein coding sequences. This has enabled the study of nearly complete combinatorial libraries of mutational variants and revealed the widespread influence of higher-order genetic interactions that arise when multiple mutations are combined. However, the lack of appropriate tools for exploratory analysis of these high-dimensional data limits our overall understanding of the main qualitative properties of complex genotype-phenotype maps. To fill this gap, we have developed gpmap-tools (https://github.com/cmarti/gpmap-tools), a Python library that integrates Gaussian process models for inference, phenotypic imputation, and error estimation from incomplete and noisy MAVE data and collections of natural sequences, together with methods for summarizing patterns of higher-order epistasis and non-linear dimensionality reduction techniques that allow visualization of genotype-phenotype maps containing up to millions of genotypes. Here, we used gpmap-tools to study the genotype-phenotype map of the Shine-Dalgarno sequence, a motif that modulates binding of the 16S rRNA to the 5' untranslated region (UTR) of mRNAs through base pair complementarity during translation initiation in prokaryotes. We inferred full combinatorial landscapes containing 262,144 different sequences from the sequences of 5,311 5'UTRs in the E. coli genome and from experimental MAVE data. Visualizations of the inferred landscapes were largely consistent with each other, and unveiled a simple molecular mechanism underlying the highly epistatic genotype-phenotype map of the Shine-Dalgarno sequence.
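The landscape size quoted above, 262,144, equals 4^9, which is consistent with every possible RNA sequence of length nine. The sketch below enumerates such a space and the single-mutation neighbors of one genotype using only the standard library. It does not use the gpmap-tools API, and the sequence length is inferred here from the count rather than stated in the abstract.

```python
from itertools import product

ALPHABET = "ACGU"
LENGTH = 9  # assumption: inferred from 4**9 == 262144

# Full combinatorial sequence space.
genotypes = ["".join(chars) for chars in product(ALPHABET, repeat=LENGTH)]


def neighbors(seq: str) -> list:
    """All sequences that differ from seq at exactly one position."""
    result = []
    for pos, current in enumerate(seq):
        for ch in ALPHABET:
            if ch != current:
                result.append(seq[:pos] + ch + seq[pos + 1:])
    return result


# Every genotype has LENGTH * (len(ALPHABET) - 1) = 27 one-step neighbors.
example_neighbors = neighbors("AGGAGGUAA")
print(len(genotypes), len(example_neighbors))  # prints "262144 27"
```

The neighbor relation defines the graph on which a genotype-phenotype map lives, which is why full landscapes of this size call for dedicated inference and visualization tools.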
12
Juarez MG, O'Rourke SM, Dzimianski JV, Gagnon D, Penunuri G, Serrão VHB, Corbett-Detig RB, Kauvar LM, DuBois RM. Structures of respiratory syncytial virus G bound to broadly reactive antibodies provide insights into vaccine design. Sci Rep 2025; 15:8666. [PMID: 40082629 PMCID: PMC11906780 DOI: 10.1038/s41598-025-92886-w] [Received: 11/03/2024] [Accepted: 03/03/2025] [Indexed: 03/16/2025] Open
Abstract
Respiratory syncytial virus (RSV) is a leading cause of severe lower respiratory tract disease in infants and older adults. The attachment glycoprotein (RSV G) binds to the chemokine receptor CX3CR1 to promote viral entry and modulate host immunity. Antibodies against RSV G are a known correlate of protection. Previously, several broadly reactive, high-affinity anti-RSV G human monoclonal antibodies were isolated from RSV-exposed individuals and were shown to be protective in vitro and in vivo. Here, we determined the structures of three of these antibodies in complex with RSV G and defined distinct conformational epitopes comprised of highly conserved RSV G residues. Binding competition and structural studies demonstrated that this highly conserved region displays two non-overlapping antigenic sites. Analyses of anti-RSV G antibody sequences reveal that antigenic site flexibility may promote the elicitation of diverse antibody germlines. Together, these findings provide a foundation for next-generation RSV prophylactics, and they expand concepts in vaccine design for the elicitation of germline lineage-diverse, broadly reactive, high-affinity antibodies.
Affiliation(s)
- Maria G Juarez
- Department of Molecular, Cell, and Developmental Biology, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Sara M O'Rourke
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
| | - John V Dzimianski
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Delia Gagnon
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Gabriel Penunuri
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Vitor H B Serrão
- Department of Chemistry & Biochemistry, University of California Santa Cruz, Santa Cruz, CA, USA
- Biomolecular Cryo-Electron Microscopy Facility, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Russell B Corbett-Detig
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | | | - Rebecca M DuBois
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA.
|
13
|
Ireland KA, Kayrouz CM, Abbott ML, Seyedsayamdost MR, Davis KM. Structural and functional analysis of SAM-dependent N-methyltransferases involved in ovoselenol and ovothiol biosynthesis. Structure 2025; 33:528-538.e5. [PMID: 39862859 PMCID: PMC11890939 DOI: 10.1016/j.str.2024.12.020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2024] [Revised: 12/12/2024] [Accepted: 12/27/2024] [Indexed: 01/27/2025]
Abstract
Thio/selenoimidazole Nπ-methyltransferases are an emerging family of enzymes catalyzing the final step in the production of the S/Se-containing histidine-derived antioxidants ovothiol and ovoselenol. These enzymes, prevalent in prokaryotes, show minimal sequence similarity to other methyltransferases, and the structural determinants of their reactivities remain poorly understood. Herein, we report ligand-bound crystal structures of OvsM from the ovoselenol pathway as well as a member of a previously unknown clade of standalone ovothiol-biosynthetic Nπ-methyltransferases, which we have designated OvoM. Unlike previously reported ovothiol methyltransferases, which are fused as a C-terminal domain to the sulfoxide synthase OvoA, OvoMs function independently. Comparative structural analyses reveal conserved, ligand-induced conformational changes, suggesting similar behavior in dual-domain OvoA enzymes. Mutagenesis supports a model where OvoA domain rearrangement facilitates substrate recognition via a critical Tyr residue in the domain linker. Biochemical studies identify an essential active-site Asp, likely serving as a catalytic base in the SN2-like nucleophilic substitution reaction.
Affiliation(s)
- Kendra A Ireland
- Department of Chemistry, Emory University, Atlanta, GA 30322, USA
| | - Chase M Kayrouz
- Department of Chemistry, Princeton University, Princeton, NJ 08544, USA
| | - Marissa L Abbott
- Department of Chemistry, Princeton University, Princeton, NJ 08544, USA
| | - Mohammad R Seyedsayamdost
- Department of Chemistry, Princeton University, Princeton, NJ 08544, USA; Department of Molecular Biology, Princeton University, Princeton, NJ 08544, USA
|
14
|
Jeon J, Yu S, Lee S, Kim SC, Jo HY, Jung I, Kim K. EpicPred: predicting phenotypes driven by epitope-binding TCRs using attention-based multiple instance learning. Bioinformatics 2025; 41:btaf080. [PMID: 39982404 PMCID: PMC11879650 DOI: 10.1093/bioinformatics/btaf080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2024] [Revised: 12/16/2024] [Accepted: 02/19/2025] [Indexed: 02/22/2025] Open
Abstract
MOTIVATION: Correctly identifying epitope-binding T-cell receptors (TCRs) is important both to understand their underlying biological mechanism in association with a phenotype and, accordingly, to develop T-cell-mediated immunotherapy treatments. Although the importance of the CDR3 region in TCRs for epitope recognition is well recognized, methods for profiling their interactions in association with a certain disease or phenotype remain less studied. We developed EpicPred to identify phenotype-specific TCR-epitope interactions. EpicPred first predicts and removes unlikely TCR-epitope interactions to reduce false positives using Open-set Recognition (OSR). Subsequently, multiple instance learning was used to identify TCR-epitope interactions specific to a cancer type or to severity levels of COVID-19-infected patients. RESULTS: From six public TCR databases, 244 552 TCR sequences and 105 unique epitopes were used to predict epitope-binding TCRs and to filter out non-epitope-binding TCRs using the OSR method. The predicted interactions were used to further predict the phenotype groups in two cancer and four COVID-19 TCR-seq datasets of both bulk and single-cell resolution. EpicPred outperformed the competing methods in predicting the phenotypes, achieving an average AUROC of 0.80 ± 0.07. AVAILABILITY AND IMPLEMENTATION: The EpicPred software is available at https://github.com/jaeminjj/EpicPred.
Affiliation(s)
- Jaemin Jeon
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Republic of Korea
| | - Suwan Yu
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Republic of Korea
| | - Sangam Lee
- College of Computing, Yonsei University, Seoul 03722, Republic of Korea
| | - Sang Cheol Kim
- Division of Healthcare and Artificial Intelligence, Korea National Institute of Health, Cheongju 28159, Republic of Korea
| | - Hye-Yeong Jo
- Division of Healthcare and Artificial Intelligence, Korea National Institute of Health, Cheongju 28159, Republic of Korea
| | - Inuk Jung
- School of Computer Science and Engineering, Kyungpook National University, Daegu 41566, Republic of Korea
| | - Kwangsoo Kim
- Department of Transdisciplinary Medicine, Seoul National University Hospital, Seoul 03080, Republic of Korea
- Department of Medicine, Seoul National University, Seoul 03080, Republic of Korea
|
15
|
Cormican JA, Medfai L, Wawrzyniuk M, Pašen M, Afrache H, Fourny C, Khan S, Gneiße P, Soh WT, Timelli A, Nolfi E, Pannekoek Y, Cope A, Urlaub H, Sijts AJAM, Mishto M, Liepe J. PEPSeek-Mediated Identification of Novel Epitopes From Viral and Bacterial Pathogens and the Impact on Host Cell Immunopeptidomes. Mol Cell Proteomics 2025; 24:100937. [PMID: 40044041 DOI: 10.1016/j.mcpro.2025.100937] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2024] [Revised: 02/11/2025] [Accepted: 03/02/2025] [Indexed: 04/07/2025] Open
Abstract
Here, we develop PEPSeek, a web-server-based software to allow higher performance in the identification of pathogen-derived epitope candidates detected via mass spectrometry in MHC class I immunopeptidomes. We apply it to human and mouse cell lines infected with SARS-CoV-2, Listeria monocytogenes, or Chlamydia trachomatis, thereby identifying a large number of novel antigens and epitopes that we prove to be recognized by CD8+ T cells. In infected cells, we identified antigenic peptide features that suggested how the processing and presentation of pathogenic antigens differ between pathogens. The quantitative tools of PEPSeek also helped to define how the C. trachomatis infection cycle could impact the antigenic landscape of the host human cell system, likely reflecting metabolic changes that occurred in the infected cells.
Affiliation(s)
- John A Cormican
- Research group of Quantitative and Systems Biology, Max-Planck-Institute for Multidisciplinary Sciences, Göttingen, Germany; Göttingen Graduate Center for Neurosciences, Biophysics, and Molecular Biosciences, University of Göttingen, Göttingen, Germany
| | - Lobna Medfai
- Department of Biomolecular Health Sciences, Faculty of Veterinary Medicine, Utrecht University, Utrecht, The Netherlands
| | - Magdalena Wawrzyniuk
- Department of Biomolecular Health Sciences, Faculty of Veterinary Medicine, Utrecht University, Utrecht, The Netherlands
| | - Martin Pašen
- Research group of Quantitative and Systems Biology, Max-Planck-Institute for Multidisciplinary Sciences, Göttingen, Germany; Göttingen Graduate Center for Neurosciences, Biophysics, and Molecular Biosciences, University of Göttingen, Göttingen, Germany
| | - Hassnae Afrache
- Centre for Inflammation Biology and Cancer Immunology, King's College London, London, United Kingdom; Peter Gorer Department of Immunobiology, King's College London, London, United Kingdom; Research group of Molecular Immunology, Francis Crick Institute, London, United Kingdom
| | - Constance Fourny
- Centre for Inflammation Biology and Cancer Immunology, King's College London, London, United Kingdom; Peter Gorer Department of Immunobiology, King's College London, London, United Kingdom; Research group of Molecular Immunology, Francis Crick Institute, London, United Kingdom
| | - Sahil Khan
- Research group of Quantitative and Systems Biology, Max-Planck-Institute for Multidisciplinary Sciences, Göttingen, Germany; Göttingen Graduate Center for Neurosciences, Biophysics, and Molecular Biosciences, University of Göttingen, Göttingen, Germany
| | - Pascal Gneiße
- Research group of Quantitative and Systems Biology, Max-Planck-Institute for Multidisciplinary Sciences, Göttingen, Germany; Georg-August University School of Science (GAUSS), University of Göttingen, Göttingen, Germany
| | - Wai Tuck Soh
- Research group of Quantitative and Systems Biology, Max-Planck-Institute for Multidisciplinary Sciences, Göttingen, Germany
| | - Arianna Timelli
- Department of Biomolecular Health Sciences, Faculty of Veterinary Medicine, Utrecht University, Utrecht, The Netherlands
| | - Emanuele Nolfi
- Department of Biomolecular Health Sciences, Faculty of Veterinary Medicine, Utrecht University, Utrecht, The Netherlands
| | - Yvonne Pannekoek
- Department of Medical Microbiology and Infection Prevention, Amsterdam UMC Location University of Amsterdam, Amsterdam Institute for Infection and Immunity, Amsterdam, The Netherlands
| | - Andrew Cope
- Centre for Inflammation Biology and Cancer Immunology, King's College London, London, United Kingdom; Centre for Rheumatic Diseases, King's College London, London, UK
| | - Henning Urlaub
- Research group of Bioanalytical Mass Spectrometry, Max-Planck-Institute for Multidisciplinary Sciences, Göttingen, Germany; Bioanalytics, Department of Clinical Chemistry, University Medical Center Göttingen, Göttingen, Germany; Göttingen Center for Molecular Biosciences, University of Göttingen, Göttingen, Germany
| | - Alice J A M Sijts
- Department of Biomolecular Health Sciences, Faculty of Veterinary Medicine, Utrecht University, Utrecht, The Netherlands; Chair T-cell Tolerance, Leibniz Institute for Immunotherapy, Regensburg, Germany.
| | - Michele Mishto
- Centre for Inflammation Biology and Cancer Immunology, King's College London, London, United Kingdom; Peter Gorer Department of Immunobiology, King's College London, London, United Kingdom; Research group of Molecular Immunology, Francis Crick Institute, London, United Kingdom.
| | - Juliane Liepe
- Research group of Quantitative and Systems Biology, Max-Planck-Institute for Multidisciplinary Sciences, Göttingen, Germany; Facility for Data Sciences and Biostatistics, Max-Planck-Institute for Multidisciplinary Sciences, Göttingen, Germany.
|
16
|
Xie X, Zhang O, Yeo MJR, Lee C, Tao R, Harry SA, Payne NC, Nam E, Paul L, Li Y, Kwok HS, Jiang H, Mao H, Hadley JL, Lin H, Batts M, Gosavi PM, D'Angiolella V, Cole PA, Mazitschek R, Northcott PA, Zheng N, Liau BB. Converging mechanism of UM171 and KBTBD4 neomorphic cancer mutations. Nature 2025; 639:241-249. [PMID: 39939763 PMCID: PMC11882451 DOI: 10.1038/s41586-024-08533-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2024] [Accepted: 12/17/2024] [Indexed: 02/14/2025]
Abstract
Cancer mutations can create neomorphic protein-protein interactions to drive aberrant function [1,2]. As a substrate receptor of the CULLIN3-RING E3 ubiquitin ligase complex, KBTBD4 is recurrently mutated in medulloblastoma [3], the most common embryonal brain tumour in children [4]. These mutations impart gain-of-function to KBTBD4 to induce aberrant degradation of the transcriptional corepressor CoREST [5]. However, their mechanism remains unresolved. Here we establish that KBTBD4 mutations promote CoREST degradation through engaging HDAC1/2 as the direct target of the mutant substrate receptor. Using deep mutational scanning, we chart the mutational landscape of the KBTBD4 cancer hotspot, revealing distinct preferences by which insertions and substitutions can promote gain-of-function and the critical residues involved in the hotspot interaction. Cryo-electron microscopy analysis of two distinct KBTBD4 cancer mutants bound to LSD1-HDAC1-CoREST reveals that a KBTBD4 homodimer asymmetrically engages HDAC1 with two KELCH-repeat β-propeller domains. The interface between HDAC1 and one of the KBTBD4 β-propellers is stabilized by the medulloblastoma mutations, which insert a bulky side chain into the HDAC1 active site pocket. Our structural and mutational analyses inform how this hotspot E3-neosubstrate interface can be chemically modulated. First, we unveil a converging shape-complementarity-based mechanism between gain-of-function E3 mutations and a molecular glue degrader, UM171. Second, we demonstrate that HDAC1/2 inhibitors can block the mutant KBTBD4-HDAC1 interface and proliferation of KBTBD4-mutant medulloblastoma cells. Altogether, our work reveals the structural and mechanistic basis of cancer mutation-driven neomorphic protein-protein interactions.
Affiliation(s)
- Xiaowen Xie
- Department of Pharmacology, University of Washington, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Olivia Zhang
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Megan J R Yeo
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Ceejay Lee
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Ran Tao
- Center of Excellence in Neuro-Oncology Sciences, St. Jude Children's Research Hospital, Memphis, TN, USA
- Department of Developmental Neurobiology, St. Jude Children's Research Hospital, Memphis, TN, USA
| | - Stefan A Harry
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - N Connor Payne
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA, USA
- Center for Systems Biology, Massachusetts General Hospital, Boston, MA, USA
- Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Eunju Nam
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital, Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA, USA
| | - Leena Paul
- Center of Excellence in Neuro-Oncology Sciences, St. Jude Children's Research Hospital, Memphis, TN, USA
- Department of Developmental Neurobiology, St. Jude Children's Research Hospital, Memphis, TN, USA
| | - Yiran Li
- Center of Excellence in Neuro-Oncology Sciences, St. Jude Children's Research Hospital, Memphis, TN, USA
- Department of Developmental Neurobiology, St. Jude Children's Research Hospital, Memphis, TN, USA
| | - Hui Si Kwok
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Hanjie Jiang
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital, Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA, USA
| | - Haibin Mao
- Department of Pharmacology, University of Washington, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Jennifer L Hadley
- Center of Excellence in Neuro-Oncology Sciences, St. Jude Children's Research Hospital, Memphis, TN, USA
- Department of Developmental Neurobiology, St. Jude Children's Research Hospital, Memphis, TN, USA
| | - Hong Lin
- Center of Excellence in Neuro-Oncology Sciences, St. Jude Children's Research Hospital, Memphis, TN, USA
- Department of Developmental Neurobiology, St. Jude Children's Research Hospital, Memphis, TN, USA
| | - Melissa Batts
- Center of Excellence in Neuro-Oncology Sciences, St. Jude Children's Research Hospital, Memphis, TN, USA
- Department of Developmental Neurobiology, St. Jude Children's Research Hospital, Memphis, TN, USA
| | - Pallavi M Gosavi
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Vincenzo D'Angiolella
- Edinburgh Cancer Research, Cancer Research UK Scotland Centre, The Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
| | - Philip A Cole
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital, Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA, USA
| | - Ralph Mazitschek
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Systems Biology, Massachusetts General Hospital, Boston, MA, USA
- Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Paul A Northcott
- Center of Excellence in Neuro-Oncology Sciences, St. Jude Children's Research Hospital, Memphis, TN, USA
- Department of Developmental Neurobiology, St. Jude Children's Research Hospital, Memphis, TN, USA
| | - Ning Zheng
- Department of Pharmacology, University of Washington, Seattle, WA, USA.
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA.
| | - Brian B Liau
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA, USA.
- Broad Institute of MIT and Harvard, Cambridge, MA, USA.
|
17
|
Allen A, Cooper BH, Singh J, Rohs R, Qin PZ. PAM-adjacent DNA flexibility tunes CRISPR-Cas12a off-target binding. Sci Rep 2025; 15:4930. [PMID: 39929897 PMCID: PMC11811290 DOI: 10.1038/s41598-025-87565-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2024] [Accepted: 01/20/2025] [Indexed: 02/13/2025] Open
Abstract
Cas12a is a class 2 type V CRISPR-associated nuclease that uses an effector complex comprised of a single protein activated by a CRISPR-encoded small RNA to cleave double-stranded DNA at specific sites. Cas12a possesses unique features as compared to other CRISPR effector nucleases such as Cas9, and has been demonstrated as an effective tool for manipulating complex genomes. Prior studies have indicated that DNA flexibility at the region adjacent to the protospacer-adjacent motif (PAM) contributes to Cas12a target recognition. Here, we adapted a SELEX-seq approach to further examine the connection between PAM-adjacent DNA flexibility and off-target binding by Cas12a. A DNA library containing DNA-DNA mismatches at PAM+1 to +6 positions was generated and subjected to binding in vitro with FnCas12a in the absence of pairing between the RNA guide and DNA target. The bound and unbound populations were sequenced to determine the propensity for off-target binding for each of the individual sequences. Analyzing the position and nucleotide dependency of the DNA-DNA mismatches showed that PAM-dependent Cas12a off-target binding requires unpairing of the protospacer at PAM+1 and increases with unpairing at PAM+2 and +3. This revealed that PAM-adjacent DNA flexibility can tune Cas12a off-target binding. The work adds support to the notion that physical properties of the DNA modulate Cas12a target discrimination, and has implications for Cas12a-based applications.
Affiliation(s)
- Aleique Allen
- Department of Chemistry, University of Southern California, 3430 S Vermont Ave., Los Angeles, CA, 90089, USA
| | - Brendon H Cooper
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, 90089, USA
- Beckman Coulter, 1584 Enterprise Blvd, West Sacramento, CA, 95691, USA
| | - Jaideep Singh
- Department of Chemistry, University of Southern California, 3430 S Vermont Ave., Los Angeles, CA, 90089, USA
| | - Remo Rohs
- Department of Chemistry, University of Southern California, 3430 S Vermont Ave., Los Angeles, CA, 90089, USA
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, 90089, USA
- Department of Physics & Astronomy, University of Southern California, Los Angeles, CA, 90089, USA
- Thomas Lord Department of Computer Science, University of Southern California, Los Angeles, CA, 90089, USA
| | - Peter Z Qin
- Department of Chemistry, University of Southern California, 3430 S Vermont Ave., Los Angeles, CA, 90089, USA.
|
18
|
Lopez SC, Lee Y, Zhang K, Shipman SL. SspA is a transcriptional regulator of CRISPR adaptation in E. coli. Nucleic Acids Res 2025; 53:gkae1244. [PMID: 39727179 DOI: 10.1093/nar/gkae1244] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2024] [Revised: 11/23/2024] [Accepted: 12/04/2024] [Indexed: 12/28/2024] Open
Abstract
The CRISPR integrases Cas1-Cas2 create immunological memories of viral infection by storing phage-derived DNA in CRISPR arrays, a process known as CRISPR adaptation. A number of host factors have been shown to influence adaptation, but the full pathway from infection to fully integrated, phage-derived sequences in the array remains incomplete. Here, we deploy a new CRISPRi-based screen to identify putative host factors that participate in CRISPR adaptation in the Escherichia coli Type I-E system. Our screen and subsequent mechanistic characterization reveal that SspA, through its role as a global transcriptional regulator of cellular stress, is required for functional CRISPR adaptation. One target of SspA is H-NS, a known repressor of CRISPR interference proteins, but we find that the role of SspA in adaptation is not H-NS-dependent. We propose a new model of CRISPR-Cas defense that includes independent cellular control of adaptation and interference by SspA.
Affiliation(s)
- Santiago C Lopez
- Gladstone Institute of Data Science and Biotechnology, 1650 Owens St, San Francisco, CA 94158, USA
- Graduate Program in Bioengineering, University of California, San Francisco and Berkeley, 1700 Fourth St, San Francisco, CA 94158, USA
| | - Yumie Lee
- Gladstone Institute of Data Science and Biotechnology, 1650 Owens St, San Francisco, CA 94158, USA
| | - Karen Zhang
- Gladstone Institute of Data Science and Biotechnology, 1650 Owens St, San Francisco, CA 94158, USA
- Graduate Program in Bioengineering, University of California, San Francisco and Berkeley, 1700 Fourth St, San Francisco, CA 94158, USA
| | - Seth L Shipman
- Gladstone Institute of Data Science and Biotechnology, 1650 Owens St, San Francisco, CA 94158, USA
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, 600 16th Street, San Francisco, CA 94158, USA
- Chan Zuckerberg Biohub San Francisco, 499 Illinois St, San Francisco, CA 94158, USA
|
19
|
Iliushchenko D, Efimenko B, Mikhailova AG, Shamanskiy V, Saparbaev MK, Matkarimov BT, Mazunin I, Voronka A, Knorre D, Kunz WS, Kapranov P, Denisov S, Fellay J, Khrapko K, Gunbin K, Popadin K. Deciphering the Foundations of Mitochondrial Mutational Spectra: Replication-Driven and Damage-Induced Signatures Across Chordate Classes. Mol Biol Evol 2025; 42:msae261. [PMID: 39903101 PMCID: PMC11792237 DOI: 10.1093/molbev/msae261] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2024] [Revised: 11/08/2024] [Accepted: 12/04/2024] [Indexed: 02/06/2025] Open
Abstract
Mitochondrial DNA (mtDNA) mutagenesis remains poorly understood despite its crucial role in disease, aging, and evolutionary tracing. In this study, we reconstructed a comprehensive 192-component mtDNA mutational spectrum for chordates by analyzing 118,397 synonymous mutations in the CytB gene across 1,697 species and five classes. This analysis revealed three primary forces shaping mtDNA mutagenesis: (i) symmetrical, replication-driven errors by mitochondrial polymerase (POLG), resulting in C > T and A > G mutations that are highly conserved across classes; (ii) asymmetrical, damage-driven C > T mutations on the single-stranded heavy strand with clock-like dynamics; and (iii) asymmetrical A > G mutations on the heavy strand, with dynamics suggesting sensitivity to oxidative damage. The third component, sensitive to oxidative damage, positions mtDNA mutagenesis as a promising marker for metabolic and physiological processes across various classes, species, organisms, tissues, and cells. The deconvolution of the mutational spectra into mutational signatures uncovered deficiencies in both base excision repair (BER) and mismatch repair (MMR) pathways. Further analysis of mutation hotspots, abasic sites, and mutational asymmetries underscores the critical role of single-stranded DNA damage (components ii and iii), which, uncorrected due to BER and MMR deficiencies, contributes roughly as many mutations as POLG-induced errors (component i).
Affiliation(s)
- Dmitrii Iliushchenko
- Center for Mitochondrial Functional Genomics, Immanuel Kant Baltic Federal University, Kaliningrad, Russian Federation
| | - Bogdan Efimenko
- Center for Mitochondrial Functional Genomics, Immanuel Kant Baltic Federal University, Kaliningrad, Russian Federation
| | - Alina G Mikhailova
- Center for Mitochondrial Functional Genomics, Immanuel Kant Baltic Federal University, Kaliningrad, Russian Federation
| | - Victor Shamanskiy
- Center for Mitochondrial Functional Genomics, Immanuel Kant Baltic Federal University, Kaliningrad, Russian Federation
| | - Murat K Saparbaev
- Groupe “Mechanisms of DNA Repair and Carcinogenesis”, CNRS UMR9019, Gustave Roussy Cancer Campus, Université Paris-Saclay, Villejuif, France
| | - Bakhyt T Matkarimov
- National Laboratory Astana, Nazarbayev University, Astana, Kazakhstan
- Faculty of Information Technologies, L.N. Gumilyov Eurasian National University, Astana, Kazakhstan
| | - Ilya Mazunin
- Department of Biology and Genetics, Petrovsky Medical University, Moscow, Russian Federation
- Research Centre for Medical Genetics, Moscow, Russian Federation
| | - Alexandr Voronka
- Center for Mitochondrial Functional Genomics, Immanuel Kant Baltic Federal University, Kaliningrad, Russian Federation
| | - Dmitry Knorre
- Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow, Russian Federation
| | - Wolfram S Kunz
- Department of Epileptology and Institute of Experimental Epileptology and Cognition Research, University Bonn Medical Center, Bonn, Germany
| | | | - Stepan Denisov
- Faculty of Biology, Medicine and Health, School of Biological Sciences, The University of Manchester, Manchester, UK
| | - Jacques Fellay
- School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
| | | | - Konstantin Gunbin
- Center for Mitochondrial Functional Genomics, Immanuel Kant Baltic Federal University, Kaliningrad, Russian Federation
- Institute of Molecular and Cellular Biology SB RAS, Novosibirsk, Russian Federation
| | - Konstantin Popadin
- Center for Mitochondrial Functional Genomics, Immanuel Kant Baltic Federal University, Kaliningrad, Russian Federation
- School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
|
20
|
Minniti J, Checler F, Duplan E, Alves da Costa C. TFinder: A Python Web Tool for Predicting Transcription Factor Binding Sites. J Mol Biol 2025; 437:168921. [PMID: 39842990 DOI: 10.1016/j.jmb.2024.168921] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2024] [Revised: 12/13/2024] [Accepted: 12/18/2024] [Indexed: 01/24/2025]
Abstract
Transcription is a key cell process that consists of synthesizing several copies of RNA from a gene DNA sequence. This process is highly regulated and closely linked to the ability of transcription factors to bind specifically to DNA. TFinder is an easy-to-use Python web portal allowing the identification of Individual Motifs (IM) such as Transcription Factor Binding Sites (TFBS). Using the NCBI API, TFinder extracts either promoter or gene terminal regulatory regions through a simple query of NCBI gene name or ID. It enables simultaneous analysis across five different species for an unlimited number of genes. TFinder searches for Individual Motifs in different formats, including IUPAC codes and JASPAR entries. Moreover, TFinder also allows de novo generation of a Position Weight Matrix (PWM) and the use of an already established PWM. Finally, the data are provided in tabular and graph formats showing the relevance and the P-value of the Individual Motifs found, as well as their location relative to the Transcription Start Site (TSS) or the terminal region of the gene. The results are then sent by email to users, facilitating subsequent data analysis and sharing. TFinder is written in Python and freely available on GitHub under the MIT license: https://github.com/Jumitti/TFinder. It can be accessed as a web application implemented in Streamlit at https://tfinder-ipmc.streamlit.app. Resources are available on the Streamlit "Resources" tab. TFinder's strength is that it is an all-in-one, intuitive tool that allows users inexperienced with bioinformatics tools to retrieve gene regulatory region sequences in multiple species and to search for individual motifs in a huge number of genes.
Affiliation(s)
- Julien Minniti
  - University Côte d'Azur, INSERM, CNRS, Institut de Pharmacologie Moléculaire et Cellulaire, "Laboratory of Excellence (LABEX) Distalz", Valbonne, France
- Frédéric Checler
  - University Côte d'Azur, INSERM, CNRS, Institut de Pharmacologie Moléculaire et Cellulaire, "Laboratory of Excellence (LABEX) Distalz", Valbonne, France
- Eric Duplan
  - University Côte d'Azur, INSERM, CNRS, Institut de Pharmacologie Moléculaire et Cellulaire, "Laboratory of Excellence (LABEX) Distalz", Valbonne, France
- Cristine Alves da Costa
  - University Côte d'Azur, INSERM, CNRS, Institut de Pharmacologie Moléculaire et Cellulaire, "Laboratory of Excellence (LABEX) Distalz", Valbonne, France
|
21
|
Mick S, Carroll C, Uriostegui-Arcos M, Fiszbein A. Hybrid exons evolved by coupling transcription initiation and splicing at the nucleotide level. Nucleic Acids Res 2025; 53:gkae1251. [PMID: 39739742 PMCID: PMC11797052 DOI: 10.1093/nar/gkae1251] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2023] [Revised: 11/27/2024] [Accepted: 12/05/2024] [Indexed: 01/02/2025] Open
Abstract
Exons within transcripts are traditionally classified as first, internal or last exons, each governed by different regulatory mechanisms. We recently described the widespread usage of 'hybrid' exons that serve as terminal or internal exons in different transcripts. Here, we employ an interpretable deep learning pipeline to dissect the sequence features governing the co-regulation of transcription initiation and splicing in hybrid exons. Using ENCODE data from human tissues, we identified 80 000 hybrid first-internal exons. These exons often possess a relaxed chromatin state, allowing transcription initiation within the gene body. Interestingly, transcription start sites of hybrid exons are typically centered at the 3' splice site, suggesting tight coupling between splicing and transcription initiation. We identified two subcategories of hybrid exons: the majority resemble internal exons, maintaining strong 3' splice sites, while a minority show enrichment in promoter elements, resembling first exons. Diving into the evolution of their sequences, we found that human hybrid exons with orthologous first exons in other species usually gained 3' splice sites or whole exons upstream, while those with orthologous internal exons often gained promoter elements. Overall, our findings unveil the intricate regulatory landscape of hybrid exons and reveal stronger connections between transcription initiation and RNA splicing than previously acknowledged.
Affiliation(s)
- Steven T Mick
  - Biology Department, Boston University, 24 Cummington Ave., Boston, 02215, USA
- Christine L Carroll
  - Biology Department, Boston University, 24 Cummington Ave., Boston, 02215, USA
- Ana Fiszbein
  - Biology Department, Boston University, 24 Cummington Ave., Boston, 02215, USA
  - Computing & Data Sciences, Boston University, 665 Commonwealth Ave., Boston, 02215, USA
|
22
|
Elhajjajy SI, Weng Z. A novel NLP-based method and algorithm to discover RNA-binding protein (RBP) motifs, contexts, binding preferences, and interactions. bioRxiv 2025:2025.01.20.631609. [PMID: 39896518 PMCID: PMC11785142 DOI: 10.1101/2025.01.20.631609] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/04/2025]
Abstract
RNA-binding proteins (RBPs) are essential modulators in the regulation of mRNA processing. The binding patterns, interactions, and functions of most RBPs are not well-characterized. Previous studies have shown that motif context is an important contributor to RBP binding specificity, but its precise role remains unclear. Despite recent computational advances to predict RBP binding, existing methods are challenging to interpret and largely lack a categorical focus on RBP motif contexts and RBP-RBP interactions. There remains a need for interpretable predictive models to disambiguate the contextual determinants of RBP binding specificity in vivo. Here, we present a novel and comprehensive pipeline to address these knowledge gaps. We devise a Natural Language Processing-based decomposition method to deconstruct sequences into entities consisting of a central target k-mer and its flanking regions, then use this representation to formulate the RBP binding prediction task as a weakly supervised Multiple Instance Learning problem. To interpret our predictions, we introduce a deterministic motif discovery algorithm designed to handle our data structure, recapitulating the established motifs of numerous RBPs as validation. Importantly, we characterize the binding motifs and binding contexts for 71 RBPs, with many of them being novel. Finally, through feature integration, transitive inference, and a new cross-prediction approach, we propose novel cooperative and competitive RBP-RBP interaction partners and hypothesize their potential regulatory functions. In summary, we present a complete computational strategy for investigating the contextual determinants of specific RBP binding, and we demonstrate the significance of our findings in delineating RBP binding patterns, interactions, and functions.
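The decomposition step mentioned in this abstract (a central target k-mer plus its flanking regions) can be pictured with a small sketch. This is an illustration only, not the authors' implementation: the values of `k` and `flank`, the decision to skip windows without full flanks, and the function name are all assumptions.

```python
def decompose(sequence, k=4, flank=2):
    """Split a sequence into (left flank, central k-mer, right flank) entities.

    Only windows with complete flanks on both sides are kept; how the
    original method treats sequence edges is not stated in the abstract.
    """
    entities = []
    for i in range(flank, len(sequence) - k - flank + 1):
        left = sequence[i - flank:i]
        core = sequence[i:i + k]
        right = sequence[i + k:i + k + flank]
        entities.append((left, core, right))
    return entities


# In a Multiple Instance Learning setting, all entities from one sequence
# form a "bag" that shares a single bound/unbound label (weak supervision).
bag = decompose("AUGGCUUUUAGC", k=4, flank=2)
for left, core, right in bag:
    print(left, core, right)
```

Each entity keeps the k-mer together with its local context, which is what lets a downstream model weigh the context separately from the motif itself.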
|
23
|
Friedman RZ, Ramu A, Lichtarge S, Wu Y, Tripp L, Lyon D, Myers CA, Granas DM, Gause M, Corbo JC, Cohen BA, White MA. Active learning of enhancers and silencers in the developing neural retina. Cell Syst 2025; 16:101163. [PMID: 39778579 PMCID: PMC11827711 DOI: 10.1016/j.cels.2024.12.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2024] [Revised: 10/17/2024] [Accepted: 12/06/2024] [Indexed: 01/11/2025]
Abstract
Deep learning is a promising strategy for modeling cis-regulatory elements. However, models trained on genomic sequences often fail to explain why the same transcription factor can activate or repress transcription in different contexts. To address this limitation, we developed an active learning approach to train models that distinguish between enhancers and silencers composed of binding sites for the photoreceptor transcription factor cone-rod homeobox (CRX). After training the model on nearly all bound CRX sites from the genome, we coupled synthetic biology with uncertainty sampling to generate additional rounds of informative training data. This allowed us to iteratively train models on data from multiple rounds of massively parallel reporter assays. The ability of the resulting models to discriminate between CRX sites with identical sequence but opposite functions establishes active learning as an effective strategy to train models of regulatory DNA. A record of this paper's transparent peer review process is included in the supplemental information.
Affiliation(s)
- Ryan Z Friedman
  - The Edison Family Center for Genome Sciences & Systems Biology, Saint Louis, MO 63110, USA; Department of Genetics, Saint Louis, MO 63110, USA
- Avinash Ramu
  - The Edison Family Center for Genome Sciences & Systems Biology, Saint Louis, MO 63110, USA; Department of Genetics, Saint Louis, MO 63110, USA
- Sara Lichtarge
  - The Edison Family Center for Genome Sciences & Systems Biology, Saint Louis, MO 63110, USA; Department of Genetics, Saint Louis, MO 63110, USA
- Yawei Wu
  - The Edison Family Center for Genome Sciences & Systems Biology, Saint Louis, MO 63110, USA; Department of Genetics, Saint Louis, MO 63110, USA
- Lloyd Tripp
  - The Edison Family Center for Genome Sciences & Systems Biology, Saint Louis, MO 63110, USA; Department of Genetics, Saint Louis, MO 63110, USA
- Daniel Lyon
  - The Edison Family Center for Genome Sciences & Systems Biology, Saint Louis, MO 63110, USA; Department of Genetics, Saint Louis, MO 63110, USA
- Connie A Myers
  - Department of Pathology and Immunology, Washington University School of Medicine, Saint Louis, MO 63110, USA
- David M Granas
  - The Edison Family Center for Genome Sciences & Systems Biology, Saint Louis, MO 63110, USA; Department of Genetics, Saint Louis, MO 63110, USA
- Maria Gause
  - Department of Pathology and Immunology, Washington University School of Medicine, Saint Louis, MO 63110, USA
- Joseph C Corbo
  - Department of Pathology and Immunology, Washington University School of Medicine, Saint Louis, MO 63110, USA
- Barak A Cohen
  - The Edison Family Center for Genome Sciences & Systems Biology, Saint Louis, MO 63110, USA; Department of Genetics, Saint Louis, MO 63110, USA
- Michael A White
  - The Edison Family Center for Genome Sciences & Systems Biology, Saint Louis, MO 63110, USA; Department of Genetics, Saint Louis, MO 63110, USA
|
24
|
Halpin JC, Keating AE. PairK: Pairwise k-mer alignment for quantifying protein motif conservation in disordered regions. Protein Sci 2025; 34:e70004. [PMID: 39720898 DOI: 10.1002/pro.70004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2024] [Revised: 11/19/2024] [Accepted: 12/05/2024] [Indexed: 12/26/2024]
Abstract
Protein-protein interactions are often mediated by a modular peptide recognition domain binding to a short linear motif (SLiM) in the disordered region of another protein. To understand the features of SLiMs that are important for binding and to identify motif instances that are important for biological function, it is useful to examine the evolutionary conservation of motifs across homologous proteins. However, the intrinsically disordered regions (IDRs) in which SLiMs reside evolve rapidly. Consequently, multiple sequence alignment (MSA) of IDRs often misaligns SLiMs and underestimates their conservation. We present PairK (pairwise k-mer alignment), an MSA-free method to align and quantify the relative local conservation of subsequences within an IDR. Lacking a ground truth for conservation, we tested PairK on the task of distinguishing biologically important motif instances from background motifs, under the assumption that biologically important motifs are more conserved. The method outperforms both standard MSA-based conservation scores and a modern LLM-based conservation score predictor. PairK can quantify conservation over wider phylogenetic distances than MSAs, indicating that some SLiMs are more conserved than MSA-based metrics imply. PairK is available as an open-source Python package at https://github.com/jacksonh1/pairk. It is designed to be easily adapted for use with other SLiM tools and for diverse applications.
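The idea of an MSA-free, pairwise k-mer alignment can be sketched as follows. This is not the PairK package or its API: the function names and toy sequences are hypothetical, and a plain identity count stands in for whatever scoring scheme the tool actually uses.

```python
def kmers(seq, k):
    """All overlapping k-mers of a sequence, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def identity(a, b):
    """Number of matching positions between two equal-length k-mers."""
    return sum(x == y for x, y in zip(a, b))


def best_matches(query_idr, homolog_idrs, k):
    """For each query k-mer, pick the best gapless k-mer in every homolog.

    Returns {query start index: [best k-mer per homolog]}. Homologs
    shorter than k are skipped. Ties are resolved in favor of the
    earliest k-mer (an arbitrary choice in this sketch).
    """
    result = {}
    for start, q in enumerate(kmers(query_idr, k)):
        result[start] = [
            max(kmers(h, k), key=lambda cand: identity(q, cand))
            for h in homolog_idrs
            if len(h) >= k
        ]
    return result


# Hypothetical disordered-region fragments from a query and two homologs.
query = "GSPLPPKT"
homologs = ["AAPLPPRTGG", "PMPPKSSS"]
matches = best_matches(query, homologs, k=5)
print(matches[2])  # best matches for the query k-mer starting at index 2
```

Because each homolog is searched independently, a motif that has shifted position within a rapidly evolving region can still be paired with its counterpart, which is the situation where a column-based MSA score tends to underestimate conservation.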
Affiliation(s)
- Amy E Keating
  - Department of Biology, MIT, Cambridge, Massachusetts, USA
  - Department of Biological Engineering, MIT, Cambridge, Massachusetts, USA
  - Koch Institute for Integrative Cancer Research, Cambridge, Massachusetts, USA
|
25
|
Kohl F, Laufkötter O, Firth M, Krimpenfort L, Mangla P, Ansarizadeh M, Geylan G, Eklund L, De Maria L, Jakobsson L, Wiseman J. Identification of cell type-specific cell-penetrating peptides through in vivo phage display leveraged by next generation sequencing. Biomed Pharmacother 2025; 182:117740. [PMID: 39671725 DOI: 10.1016/j.biopha.2024.117740] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2024] [Revised: 11/18/2024] [Accepted: 12/03/2024] [Indexed: 12/15/2024] Open
Abstract
Vascular anomalies (VA) refer to abnormal blood or lymphatic vessel architecture, most often as a result of dysregulated growth. Venous malformations (VM), a subgroup of VAs, are triggered by activating mutations in the Angiopoietin/TIE2-PI3K/AKT/mTOR signaling pathway with TIE2 L914F (gene name TEK) being one of the most frequent mutations in patients with VMs. Although systemic targeting of the overactivated pathway is possible, it would be a therapeutic advantage to restrict treatment to only the affected lesions. To identify peptides with potential selective binding to TIE2 L914F lesions, we applied in vivo phage display to TIE2 L914F-overexpressing endothelial cells (ECs) in a subcutaneous matrigel xenograft mouse model of VMs. By panning for lesion-targeting phages in combination with subcellular fractionation, a screen for cell-penetrating candidate phages was established. Employing Next Generation Sequencing (NGS) and a refined bioinformatic analysis, we were able to identify many novel cell-penetrating peptides (CPPs). To pinpoint the most selective and viable CPP candidates, a hierarchical clustering algorithm was utilized. This method aggregated CPPs with highly similar sequences into a small number of clusters from which consensus sequences could be derived. Selected candidate CPPs exhibited uptake in TIE2 L914F-expressing human umbilical vein endothelial cells (HUVEC) in culture and were able to deliver siRNA into these cells. In conclusion, our NGS bioinformatic-supported approach led to the identification of novel and selective CPPs capable of transporting a siRNA cargo into targeted cells.
Affiliation(s)
- Franziska Kohl
  - Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden; Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden
- Oliver Laufkötter
  - Department of Life Science Informatics, B-IT, Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
- Mike Firth
  - Data Sciences and Quantitative Biology, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK
- Luc Krimpenfort
  - Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden
- Priyanka Mangla
  - Oligonucleotides and Targeted Delivery, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
- Mohammadhassan Ansarizadeh
  - Oulu Center for Cell-Matrix Research, University of Oulu, Oulu, Finland; Faculty of Biochemistry and Molecular Medicine, University of Oulu, Oulu, Finland; Biocenter Oulu, University of Oulu, Oulu, Finland
- Gökçe Geylan
  - Molecular AI, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden; Division of Systems and Synthetic Biology, Department of Life Sciences, Chalmers University of Technology, Gothenburg, Sweden
- Lauri Eklund
  - Oulu Center for Cell-Matrix Research, University of Oulu, Oulu, Finland; Faculty of Biochemistry and Molecular Medicine, University of Oulu, Oulu, Finland; Biocenter Oulu, University of Oulu, Oulu, Finland
- Leonardo De Maria
  - Research and Early Development, Respiratory & Immunology, BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
- Lars Jakobsson
  - Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden
- John Wiseman
  - Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
|
26
|
Thrift WJ, Lounsbury NW, Broadwell Q, Heidersbach A, Freund E, Abdolazimi Y, Phung QT, Chen J, Capietto AH, Tong AJ, Rose CM, Blanchette C, Lill JR, Haley B, Delamarre L, Bourgon R, Liu K, Jhunjhunwala S. Towards designing improved cancer immunotherapy targets with a peptide-MHC-I presentation model, HLApollo. Nat Commun 2024; 15:10752. [PMID: 39737928 DOI: 10.1038/s41467-024-54887-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2023] [Accepted: 11/25/2024] [Indexed: 01/01/2025] Open
Abstract
Based on the success of cancer immunotherapy, personalized cancer vaccines have emerged as a leading oncology treatment. Antigen presentation on MHC class I (MHC-I) is crucial for the adaptive immune response to cancer cells, necessitating highly predictive computational methods to model this phenomenon. Here, we introduce HLApollo, a transformer-based model for peptide-MHC-I (pMHC-I) presentation prediction, leveraging the language of peptides, MHC, and source proteins. HLApollo provides end-to-end treatment of MHC-I sequences and deconvolution of multi-allelic data, using a negative-set switching strategy to mitigate misassigned negatives in unlabelled ligandome data. HLApollo shows a 12.65% increase in average precision (AP) on ligandome data and a 4.1% AP increase on immunogenicity test data compared to next-best models. Incorporating protein features from protein language models yields further gains and reduces the need for gene expression measurements. Guided by clinical use, we demonstrate pan-allelic generalization which effectively captures rare alleles in underrepresented ancestries.
Affiliation(s)
- William John Thrift
  - Early Clinical Development Artificial Intelligence, Genentech, South San Francisco, CA, USA
- Quade Broadwell
  - Early Clinical Development Artificial Intelligence, Genentech, South San Francisco, CA, USA
- Amy Heidersbach
  - Molecular Biology Department, Genentech, South San Francisco, CA, USA
- Emily Freund
  - Molecular Biology Department, Genentech, South San Francisco, CA, USA
- Yassan Abdolazimi
  - Molecular Biology Department, Genentech, South San Francisco, CA, USA
- Qui T Phung
  - Microchemistry, Proteomics and Lipidomics, Genentech, South San Francisco, CA, USA
- Jieming Chen
  - Oncology Bioinformatics, Genentech, South San Francisco, CA, USA
- Ann-Jay Tong
  - Cancer Immunology, Genentech, South San Francisco, CA, USA
- Christopher M Rose
  - Microchemistry, Proteomics and Lipidomics, Genentech, South San Francisco, CA, USA
- Jennie R Lill
  - Microchemistry, Proteomics and Lipidomics, Genentech, South San Francisco, CA, USA
- Benjamin Haley
  - Molecular Biology Department, Genentech, South San Francisco, CA, USA
- Richard Bourgon
  - Oncology Bioinformatics, Genentech, South San Francisco, CA, USA
  - Computational Science, Freenome, South San Francisco, CA, USA
- Kai Liu
  - Early Clinical Development Artificial Intelligence, Genentech, South San Francisco, CA, USA
  - Artificial Intelligence, SES AI, Woburn, MA, USA
|
27
|
Strayer EC, Krishna S, Lee H, Vejnar C, Neuenkirchen N, Gupta A, Beaudoin JD, Giraldez AJ. NaP-TRAP reveals the regulatory grammar in 5'UTR-mediated translation regulation during zebrafish development. Nat Commun 2024; 15:10898. [PMID: 39738051 PMCID: PMC11685710 DOI: 10.1038/s41467-024-55274-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Accepted: 12/06/2024] [Indexed: 01/01/2025] Open
Abstract
The cis-regulatory elements encoded in an mRNA determine its stability and translational output. While there has been a considerable effort to understand the factors driving mRNA stability, the regulatory frameworks governing translational control remain more elusive. We have developed a novel massively parallel reporter assay (MPRA) to measure mRNA translation, named Nascent Peptide Translating Ribosome Affinity Purification (NaP-TRAP). NaP-TRAP measures translation in a frame-specific manner through the immunocapture of epitope tagged nascent peptides of reporter mRNAs. We benchmark NaP-TRAP to polysome profiling and use it to quantify Kozak strength and the regulatory landscapes of 5' UTRs in the developing zebrafish embryo and in human cells. Through this approach we identified general and developmentally dynamic cis-regulatory elements, as well as potential trans-acting proteins. We find that U-rich motifs are general enhancers, and upstream ORFs and GC-rich motifs are global repressors of translation. We also observe a translational switch during the maternal-to-zygotic transition, where C-rich motifs shift from repressors to prominent activators of translation. Conversely, we show that microRNA sites in the 5' UTR repress translation following the zygotic expression of miR-430. Together these results demonstrate that NaP-TRAP is a versatile, accessible, and powerful method to decode the regulatory functions of UTRs across different systems.
Affiliation(s)
- Ethan C Strayer
  - Department of Genetics, Yale University, Yale School of Medicine, New Haven, 06510, CT, USA
- Srikar Krishna
  - Department of Genetics, Yale University, Yale School of Medicine, New Haven, 06510, CT, USA
- Haejeong Lee
  - Department of Genetics, Yale University, Yale School of Medicine, New Haven, 06510, CT, USA
- Charles Vejnar
  - Department of Genetics, Yale University, Yale School of Medicine, New Haven, 06510, CT, USA
- Nils Neuenkirchen
  - Department of Cell Biology, Yale University, Yale School of Medicine, New Haven, 06510, CT, USA
- Amit Gupta
  - Department of Genetics and Genome Sciences, Institute for Systems Genomics, University of Connecticut Health Center, Farmington, CT, USA
- Jean-Denis Beaudoin
  - Department of Genetics and Genome Sciences, Institute for Systems Genomics, University of Connecticut Health Center, Farmington, CT, USA
  - Yale Center for RNA Science and Medicine, Yale University, New Haven, 06510, CT, USA
- Antonio J Giraldez
  - Department of Genetics, Yale University, Yale School of Medicine, New Haven, 06510, CT, USA
  - Yale Center for RNA Science and Medicine, Yale University, New Haven, 06510, CT, USA
  - Yale Stem Cell Center, Yale University, Yale School of Medicine, New Haven, 06510, CT, USA
|
28
|
Du H, Mallik L, Hwang D, Sun Y, Kaku C, Hoces D, Sun SM, Ghinnagow R, Carro SD, Phan HAT, Gupta S, Blackson W, Lee H, Choe CA, Dersh D, Liu J, Bell B, Yang H, Papadaki GF, Young MC, Zhou E, El Nesr G, Goli KD, Eisenlohr LC, Minn AJ, Hernandez-Lopez RA, Jardine JG, Sgourakis NG, Huang PS. Targeting peptide antigens using a multiallelic MHC I-binding system. Nat Biotechnol 2024:10.1038/s41587-024-02505-8. [PMID: 39672954 DOI: 10.1038/s41587-024-02505-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2024] [Accepted: 11/13/2024] [Indexed: 12/15/2024]
Abstract
Identifying highly specific T cell receptors (TCRs) or antibodies against epitopic peptides presented by class I major histocompatibility complex (MHC I) proteins remains a bottleneck in the development of targeted therapeutics. Here, we introduce targeted recognition of antigen-MHC complex reporter for MHC I (TRACeR-I), a generalizable platform for targeting peptides on polymorphic HLA-A*, HLA-B* and HLA-C* allotypes while overcoming the cross-reactivity challenges of TCRs. Our TRACeR-MHC I co-crystal structure reveals a unique antigen recognition mechanism, with TRACeR forming extensive contacts across the entire peptide length to confer single-residue specificity at the accessible positions. We demonstrate rapid screening of TRACeR-I against a panel of disease-relevant HLAs with peptides derived from human viruses (human immunodeficiency virus, Epstein-Barr virus and severe acute respiratory syndrome coronavirus 2), and oncoproteins (Kirsten rat sarcoma virus, paired-like homeobox 2b and New York esophageal squamous cell carcinoma 1). TRACeR-based bispecific T cell engagers and chimeric antigen receptor T cells exhibit on-target killing of tumor cells with high efficacy in the low nanomolar range. Our platform empowers the development of broadly applicable MHC I-targeting molecules for research, diagnostic and therapeutic applications.
Affiliation(s)
- Haotian Du
  - Department of Chemistry, Stanford University, Stanford, CA, USA
- Leena Mallik
  - Center for Computational and Genomic Medicine, Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Daniel Hwang
  - Center for Computational and Genomic Medicine, Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Yi Sun
  - Center for Computational and Genomic Medicine, Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Chengzi Kaku
  - Department of Immunology and Microbiology, Scripps Research Institute, La Jolla, CA, USA
- Daniel Hoces
  - Department of Bioengineering, Stanford University, Stanford, CA, USA
  - Department of Genetics, Stanford University, Stanford, CA, USA
- Shirley M Sun
  - Center for Computational and Genomic Medicine, Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - Cancer Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Reem Ghinnagow
  - Department of Radiation Oncology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Stephen D Carro
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Hoang Anh T Phan
  - Center for Computational and Genomic Medicine, Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Sagar Gupta
  - Center for Computational and Genomic Medicine, Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Wyatt Blackson
  - Department of Chemical Engineering, Stanford University, Stanford, CA, USA
- Hyejin Lee
  - Department of Bioengineering, Stanford University, Stanford, CA, USA
- Christian A Choe
  - Department of Bioengineering, Stanford University, Stanford, CA, USA
- Devin Dersh
  - Department of Radiation Oncology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Jingjia Liu
  - Department of Bioengineering, Stanford University, Stanford, CA, USA
- Braxton Bell
  - Department of Chemistry, Stanford University, Stanford, CA, USA
- Hongli Yang
  - Department of Bioengineering, Stanford University, Stanford, CA, USA
- Georgia F Papadaki
  - Center for Computational and Genomic Medicine, Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Michael C Young
  - Center for Computational and Genomic Medicine, Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Emily Zhou
  - Department of Immunology and Microbiology, Scripps Research Institute, La Jolla, CA, USA
- Gina El Nesr
  - Biophysics Program, Stanford University, Stanford, CA, USA
- Kimia Dasteh Goli
  - Center for Computational and Genomic Medicine, Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Laurence C Eisenlohr
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Andy J Minn
  - Department of Radiation Oncology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Rogelio A Hernandez-Lopez
  - Department of Bioengineering, Stanford University, Stanford, CA, USA
  - Department of Genetics, Stanford University, Stanford, CA, USA
  - Stanford Cancer Institute, Stanford University, Stanford, CA, USA
  - Chan-Zuckerberg Biohub, San Francisco, CA, USA
- Joseph G Jardine
  - Department of Immunology and Microbiology, Scripps Research Institute, La Jolla, CA, USA
- Nikolaos G Sgourakis
  - Center for Computational and Genomic Medicine, Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Po-Ssu Huang
  - Department of Chemistry, Stanford University, Stanford, CA, USA
  - Department of Bioengineering, Stanford University, Stanford, CA, USA
  - Biophysics Program, Stanford University, Stanford, CA, USA
|
29
|
Guerri F, Junet V, Farrés J, Daura X. MMPred: a tool to predict peptide mimicry events in MHC class II recognition. Front Genet 2024; 15:1500684. [PMID: 39722794 PMCID: PMC11669352 DOI: 10.3389/fgene.2024.1500684] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2024] [Accepted: 11/25/2024] [Indexed: 12/28/2024] Open
Abstract
We present MMPred, a software tool that integrates epitope prediction and sequence alignment algorithms to streamline the computational analysis of molecular mimicry events in autoimmune diseases. Starting with two protein or peptide sets (e.g., from human and SARS-CoV-2), MMPred facilitates the generation, investigation, and testing of mimicry hypotheses by providing epitope predictions specifically for MHC class II alleles, which are frequently implicated in autoimmunity. However, the tool is easily extendable to MHC class I predictions by incorporating pre-trained models from CNN-PepPred and NetMHCpan. To evaluate MMPred's ability to produce biologically meaningful insights, we conducted a comprehensive assessment involving i) predicting associations between known HLA class II human autoepitopes and microbial-peptide mimicry, ii) interpreting these predictions within a systems biology framework to identify potential functional links between the predicted autoantigens and pathophysiological pathways related to autoimmune diseases, and iii) analyzing illustrative cases in the context of SARS-CoV-2 infection and autoimmunity. MMPred code and user guide are made freely available at https://github.com/ComputBiol-IBB/MMPRED.
Collapse
Affiliation(s)
- Filippo Guerri
- Anaxomics Biotech, Barcelona, Spain
- Institute of Biotechnology and Biomedicine, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Spain
| | - Valentin Junet
- Anaxomics Biotech, Barcelona, Spain
- Institute of Biotechnology and Biomedicine, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Spain
| | | | - Xavier Daura
- Institute of Biotechnology and Biomedicine, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Spain
- Catalan Institution for Research and Advanced Studies (ICREA), Barcelona, Spain
- Centro de Investigación Biomédica en Red de Bioingeniería, Biomateriales y Nanomedicina, Instituto de Salud Carlos III, Cerdanyola del Vallès, Spain
30
Mariani D, Setti A, Castagnetti F, Vitiello E, Stufera Mecarelli L, Di Timoteo G, Giuliani A, D’Angelo A, Santini T, Perego E, Zappone S, Liessi N, Armirotti A, Vicidomini G, Bozzoni I. ALS-associated FUS mutation reshapes the RNA and protein composition of stress granules. Nucleic Acids Res 2024; 52:13269-13289. [PMID: 39494508 PMCID: PMC11602144 DOI: 10.1093/nar/gkae942] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Revised: 10/02/2024] [Accepted: 10/29/2024] [Indexed: 11/05/2024] Open
Abstract
Stress granules (SG) are part of a cellular protection mechanism where untranslated messenger RNAs and RNA-binding proteins are stored upon conditions of cellular stress. Compositional variations due to qualitative or quantitative protein changes can disrupt their functionality and alter their structure. This is the case of different forms of amyotrophic lateral sclerosis (ALS) where a causative link has been proposed between the cytoplasmic de-localization of mutant proteins, such as FUS (Fused in Sarcoma), and the formation of cytotoxic inclusions. Here, we describe the SG transcriptome in neuroblastoma cells and define several features for RNA recruitment in these condensates. We demonstrate that SG dynamics and RNA content are strongly modified by the incorporation of mutant FUS, switching to a more unstructured, AU-rich SG transcriptome. Moreover, we show that mutant FUS, together with its protein interactors and their target RNAs, are responsible for the reshaping of the mutant SG transcriptome with alterations that can be linked to neurodegeneration. Our data describe the molecular differences between physiological and pathological SG in ALS-FUS conditions, showing how FUS mutations impact the RNA and protein composition of these condensates.
Affiliation(s)
- Davide Mariani
- Center for Human Technologies, Istituto Italiano di Tecnologia, Via Enrico Melen 83, 16153, Genoa, Italy
- Department of Biology and Biotechnologies “C. Darwin”, Sapienza University of Rome, Piazzale Aldo Moro 5, 00185, Rome, Italy
- Adriano Setti
- Department of Biology and Biotechnologies “C. Darwin”, Sapienza University of Rome, Piazzale Aldo Moro 5, 00185, Rome, Italy
- Francesco Castagnetti
- Center for Human Technologies, Istituto Italiano di Tecnologia, Via Enrico Melen 83, 16153, Genoa, Italy
- Erika Vitiello
- Center for Human Technologies, Istituto Italiano di Tecnologia, Via Enrico Melen 83, 16153, Genoa, Italy
- Lorenzo Stufera Mecarelli
- Center for Human Technologies, Istituto Italiano di Tecnologia, Via Enrico Melen 83, 16153, Genoa, Italy
- Department of Biology and Biotechnologies “C. Darwin”, Sapienza University of Rome, Piazzale Aldo Moro 5, 00185, Rome, Italy
- Gaia Di Timoteo
- Department of Biology and Biotechnologies “C. Darwin”, Sapienza University of Rome, Piazzale Aldo Moro 5, 00185, Rome, Italy
- Andrea Giuliani
- Department of Biology and Biotechnologies “C. Darwin”, Sapienza University of Rome, Piazzale Aldo Moro 5, 00185, Rome, Italy
- Angelo D’Angelo
- Department of Biology and Biotechnologies “C. Darwin”, Sapienza University of Rome, Piazzale Aldo Moro 5, 00185, Rome, Italy
- Tiziana Santini
- Department of Biology and Biotechnologies “C. Darwin”, Sapienza University of Rome, Piazzale Aldo Moro 5, 00185, Rome, Italy
- Eleonora Perego
- Center for Human Technologies, Istituto Italiano di Tecnologia, Via Enrico Melen 83, 16153, Genoa, Italy
- Sabrina Zappone
- Center for Human Technologies, Istituto Italiano di Tecnologia, Via Enrico Melen 83, 16153, Genoa, Italy
- Nara Liessi
- Analytical Chemistry Lab, Istituto Italiano di Tecnologia, Via Morego 30, 16163, Genoa, Italy
- Andrea Armirotti
- Analytical Chemistry Lab, Istituto Italiano di Tecnologia, Via Morego 30, 16163, Genoa, Italy
- Giuseppe Vicidomini
- Center for Human Technologies, Istituto Italiano di Tecnologia, Via Enrico Melen 83, 16153, Genoa, Italy
- Irene Bozzoni
- Center for Human Technologies, Istituto Italiano di Tecnologia, Via Enrico Melen 83, 16153, Genoa, Italy
- Department of Biology and Biotechnologies “C. Darwin”, Sapienza University of Rome, Piazzale Aldo Moro 5, 00185, Rome, Italy
- Center for Life Nano-& Neuro-Science, Fondazione Istituto Italiano di Tecnologia, Viale Regina Elena 291, 00161, Rome, Italy
31
Gralak AJ, Faltejskova K, Yang AW, Steiner C, Russeil J, Grenningloh N, Inukai S, Demir M, Dainese R, Owen C, Pankevich E, Hughes TR, Kulakovskiy IV, Kribelbauer-Swietek JF, van Mierlo G, Deplancke B. Identification of methylation-sensitive human transcription factors using meSMiLE-seq. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.11.11.619598. [PMID: 39605503 PMCID: PMC11601298 DOI: 10.1101/2024.11.11.619598] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 11/29/2024]
Abstract
Transcription factors (TFs) are key players in eukaryotic gene regulation, but the DNA binding specificity of many TFs remains unknown. Here, we assayed 284 mostly poorly characterized, putative human TFs using selective microfluidics-based ligand enrichment followed by sequencing (SMiLE-seq), revealing 72 new DNA binding motifs. To investigate whether some of the 158 TFs for which we did not find motifs preferably bind epigenetically modified DNA (i.e. methylated CG dinucleotides), we developed methylation-sensitive SMiLE-seq (meSMiLE-seq). This microfluidic assay simultaneously probes the affinity of a protein to methylated and unmethylated DNA, augmenting the capabilities of the original method to infer methylation-aware binding sites. We assayed 114 TFs with meSMiLE-seq and identified DNA-binding models for 48 proteins, including the known methylation-sensitive binding modes for POU5F1 and RFX5. For 11 TFs, binding to methylated DNA was preferred or resulted in the discovery of alternative, methylation-dependent motifs (e.g. PRDM13), while aversion towards methylated sequences was found for 13 TFs (e.g. USF3). Finally, we uncovered a potential role for ZHX2 as a putative binder of Z-DNA, a left-handed helical DNA structure which is adopted more frequently upon CpG methylation. Altogether, our study significantly expands the human TF codebook by identifying DNA binding motifs for 98 TFs, while providing a versatile platform to quantitatively assay the impact of DNA modifications on TF binding.
Affiliation(s)
- Antoni J. Gralak
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Katerina Faltejskova
- Institute of Organic Chemistry and Biochemistry, Czech Academy of Sciences, Prague, Czech Republic
- Computer Science Institute, Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic
- Clemence Steiner
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Julie Russeil
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Nadia Grenningloh
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Sachi Inukai
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Mustafa Demir
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Riccardo Dainese
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Cooper Owen
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Eugenia Pankevich
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Ivan V. Kulakovskiy
- Institute of Protein Research, Russian Academy of Sciences, Pushchino, Russia
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
- Judith F. Kribelbauer-Swietek
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Guido van Mierlo
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Department of Medical BioSciences, Radboud University Medical Center, 6500 HB Nijmegen, The Netherlands
- Bart Deplancke
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
32
Mindel V, Brodsky S, Yung H, Manadre W, Barkai N. Revisiting the model for coactivator recruitment: Med15 can select its target sites independent of promoter-bound transcription factors. Nucleic Acids Res 2024; 52:12093-12111. [PMID: 39187372 PMCID: PMC11551773 DOI: 10.1093/nar/gkae718] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2024] [Revised: 07/08/2024] [Accepted: 08/09/2024] [Indexed: 08/28/2024] Open
Abstract
Activation domains (ADs) within transcription factors (TFs) induce gene expression by recruiting coactivators such as the Mediator complex. Coactivators lack DNA binding domains (DBDs) and are assumed to passively follow their recruiting TFs. This is supported by direct AD-coactivator interactions seen in vitro but has not yet been tested in living cells. To examine that, we targeted two Med15-recruiting ADs to a range of budding yeast promoters through fusion with different DBDs. The DBD-AD fusions localized to hundreds of genomic sites but recruited Med15 and induced transcription in only a subset of bound promoters, characterized by a fuzzy-nucleosome architecture. Direct DBD-Med15 fusions shifted DBD localization towards fuzzy-nucleosome promoters, including promoters devoid of the endogenous Mediator. We propose that Med15, and perhaps other coactivators, possess inherent promoter preference and thus actively contribute to the selection of TF-induced genes.
Affiliation(s)
- Vladimir Mindel
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
- Sagie Brodsky
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
- Hadas Yung
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
- Wajd Manadre
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
- Naama Barkai
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
33
Ndjite GM, Jiang A, Ravel C, Grant M, Jiang X, Hall B. Gut Microbial Utilization of the Alternative Sweetener, D-Allulose, via AlsE. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.11.07.622513. [PMID: 39574671 PMCID: PMC11580995 DOI: 10.1101/2024.11.07.622513] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/01/2024]
Abstract
D-allulose, a rare sugar with emerging potential as a low-calorie sweetener, has garnered attention as an alternative to other commercially available alternative sweeteners, such as sugar alcohols, which often cause severe gastrointestinal discomfort. D-allulose-6-phosphate 3-epimerase (AlsE) is a prokaryotic enzyme that converts D-allulose-6-phosphate into D-fructose-6-phosphate, enabling its use as a carbon source. However, the taxonomic breadth of AlsE across gut bacteria remains poorly understood, hindering insights into the utilization of D-allulose by microbial communities. In this study, we provide experimental evidence showing that Clostridium innocuum is capable of D-allulose metabolism via a homologous AlsE. A bioinformatics search of 85,202 bacterial genomes identified 116 bacterial species with AlsE homologs, suggesting a limited distribution of AlsE in bacteria. Additionally, Escherichia coli contains a copy of alsE, but it does not grow on D-allulose as a sole carbon source unless alsE is heterologously expressed. A metagenomic analysis revealed that 15.8% of the 3,079 healthy adult human metagenomic samples that we analyzed contained alsE, suggesting a limited prevalence of the enzyme in the gut microbiome. These results suggest that the gut microbiome has limited capacity to metabolize D-allulose via alsE, supporting its use as an alternative sweetener with minimal impact on microbial composition and gastrointestinal symptoms. This finding also enables personalized nutrition, allowing diabetic individuals to assess their gut microbiota for alsE and manage glycemic response while reducing gastrointestinal distress.
Affiliation(s)
- Glory Minabou Ndjite
- College of Computer, Mathematical and Natural Sciences, University of Maryland, College Park, Maryland, USA
- Angela Jiang
- College of Computer, Mathematical and Natural Sciences, University of Maryland, College Park, Maryland, USA
- National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
- Charlotte Ravel
- College of Computer, Mathematical and Natural Sciences, University of Maryland, College Park, Maryland, USA
- Maggie Grant
- College of Computer, Mathematical and Natural Sciences, University of Maryland, College Park, Maryland, USA
- Xiaofang Jiang
- National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
- Brantley Hall
- College of Computer, Mathematical and Natural Sciences, University of Maryland, College Park, Maryland, USA
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, College Park, Maryland, USA
34
Ballmer D, Lou HJ, Ishii M, Turk BE, Akiyoshi B. Aurora B controls anaphase onset and error-free chromosome segregation in trypanosomes. J Cell Biol 2024; 223:e202401169. [PMID: 39196069 PMCID: PMC11354203 DOI: 10.1083/jcb.202401169] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Revised: 06/12/2024] [Accepted: 07/25/2024] [Indexed: 08/29/2024] Open
Abstract
Kinetochores form the interface between chromosomes and spindle microtubules and are thus under tight control by a complex regulatory circuitry. The Aurora B kinase plays a central role within this circuitry by destabilizing improper kinetochore-microtubule attachments and relaying the attachment status to the spindle assembly checkpoint. Intriguingly, Aurora B is conserved even in kinetoplastids, a group of early-branching eukaryotes which possess a unique set of kinetochore proteins. It remains unclear how their kinetochores are regulated to ensure faithful chromosome segregation. Here, we show in Trypanosoma brucei that Aurora B activity controls the metaphase-to-anaphase transition through phosphorylation of the divergent Bub1-like protein KKT14. Depletion of KKT14 overrides the metaphase arrest resulting from Aurora B inhibition, while expression of non-phosphorylatable KKT14 delays anaphase onset. Finally, we demonstrate that re-targeting Aurora B to the outer kinetochore suffices to promote mitotic exit but causes extensive chromosome missegregation in anaphase. Our results indicate that Aurora B and KKT14 are involved in an unconventional circuitry controlling cell cycle progression in trypanosomes.
Affiliation(s)
- Daniel Ballmer
- Department of Biochemistry, University of Oxford, Oxford, UK
- The Wellcome Centre for Cell Biology, Institute of Cell Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
- Hua Jane Lou
- Department of Pharmacology, Yale School of Medicine, New Haven, CT, USA
- Midori Ishii
- Department of Biochemistry, University of Oxford, Oxford, UK
- The Wellcome Centre for Cell Biology, Institute of Cell Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
- Benjamin E. Turk
- Department of Pharmacology, Yale School of Medicine, New Haven, CT, USA
- Bungo Akiyoshi
- Department of Biochemistry, University of Oxford, Oxford, UK
- The Wellcome Centre for Cell Biology, Institute of Cell Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
35
Kayrouz CM, Ireland KA, Ying VY, Davis KM, Seyedsayamdost MR. Discovery of the selenium-containing antioxidant ovoselenol derived from convergent evolution. Nat Chem 2024; 16:1868-1875. [PMID: 39143299 DOI: 10.1038/s41557-024-01600-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2024] [Accepted: 07/11/2024] [Indexed: 08/16/2024]
Abstract
Selenium is an essential micronutrient, but its presence in biology has been limited to protein and nucleic acid biopolymers. The recent identification of a biosynthetic pathway for selenium-containing small molecules suggests that there is a larger family of selenometabolites that remains to be discovered. Here we identify a recently evolved branch of abundant and uncharacterized metalloenzymes that we predict are involved in selenometabolite biosynthesis using a bioinformatic search strategy that relies on the mapping of composite active site motifs. Biochemical studies confirm this prediction and show that these enzymes form an unusual C-Se bond onto histidine, thus giving rise to a distinct selenometabolite and potent antioxidant that we have termed ovoselenol. Aside from providing insights into the evolution of this enzyme class and the structural basis of C-Se bond formation, our work offers a blueprint for charting the microbial selenometabolome in the future.
Affiliation(s)
- Chase M Kayrouz
- Department of Chemistry, Princeton University, Princeton, NJ, USA
- Vanessa Y Ying
- Department of Chemistry, Princeton University, Princeton, NJ, USA
- Mohammad R Seyedsayamdost
- Department of Chemistry, Princeton University, Princeton, NJ, USA.
- Department of Molecular Biology, Princeton University, Princeton, NJ, USA.
36
Le TNY, Le CT, Nguyen TA. Determinants of selectivity in the dicing mechanism. Nat Commun 2024; 15:8989. [PMID: 39420173 PMCID: PMC11487123 DOI: 10.1038/s41467-024-53322-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2024] [Accepted: 10/07/2024] [Indexed: 10/19/2024] Open
Abstract
Our research elucidates the cleavage processes of the RNase III enzyme, DICER, which plays a crucial role in the production of small RNAs, such as microRNAs (miRNAs) and small interfering RNAs (siRNAs). Utilizing high-throughput dicing assays, we expose the bipartite pairing rule that dictates the cleavage sites of DICER. Furthermore, we decode the intricate recognition mechanism of the primary YCR motif and identify an analogous secondary YCR motif that influences DICER's cleavage choices. Collectively, our findings clarify the bipartite pairing rule and enhance our understanding of the role of RNA motifs in modulating DICER's cleavage activity, laying the groundwork for future research on their roles in miRNA biogenesis and gene regulation.
Affiliation(s)
- Thi Nhu-Y Le
- Division of Life Science, The Hong Kong University of Science & Technology, Hong Kong, China
- Cong Truc Le
- Division of Life Science, The Hong Kong University of Science & Technology, Hong Kong, China
- Tuan Anh Nguyen
- Division of Life Science, The Hong Kong University of Science & Technology, Hong Kong, China.
37
Liew D, Lim ZW, Yong EH. Machine learning-based prediction of DNA G-quadruplex folding topology with G4ShapePredictor. Sci Rep 2024; 14:24238. [PMID: 39414858 PMCID: PMC11484705 DOI: 10.1038/s41598-024-74826-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2024] [Accepted: 09/30/2024] [Indexed: 10/18/2024] Open
Abstract
Deoxyribonucleic acid (DNA) is able to form non-canonical four-stranded helical structures with diverse folding patterns known as G-quadruplexes (G4s). G4 topologies are classified based on their relative strand orientation following the 5' to 3' phosphate backbone polarity. Broadly, G4 topologies are either parallel (4+0), antiparallel (2+2), or hybrid (3+1). G4s play crucial roles in biological processes such as DNA repair, DNA replication, and transcription, and have thus emerged as biological targets in drug design. While computational models have been developed to predict G4 formation, there is currently no existing model capable of predicting G4 folding topology based on its nucleic acid sequence. Therefore, we introduce G4ShapePredictor (G4SP), an application featuring a collection of multi-classification machine learning models that are trained on a custom G4 dataset combining entries from existing literature and in-house circular dichroism experiments. G4ShapePredictor is designed to accurately predict G4 folding topologies in potassium (K⁺) buffer based on its primary sequence and is able to incorporate a threshold optimization strategy allowing users to maximise precision. Furthermore, we have identified three topological sequence motifs that suggest specific G4 folding topologies of (4+0), (2+2) or (3+1) when utilising the decision-making mechanisms of G4ShapePredictor.
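The threshold idea mentioned in this abstract can be illustrated with a generic sketch. This is not G4ShapePredictor's code or API: the function name, the probability inputs, and the 0.7 cutoff are assumptions chosen only for illustration. It shows a three-class classifier that abstains whenever its most probable topology falls below a chosen confidence, which typically trades coverage for precision.

```python
# Generic confidence-threshold sketch; NOT the G4ShapePredictor implementation.
# The three class labels follow the topologies named in the abstract.
TOPOLOGIES = ("parallel (4+0)", "antiparallel (2+2)", "hybrid (3+1)")


def predict_with_threshold(probs, threshold=0.7):
    """Return the most probable topology, or None if confidence is too low.

    probs     -- three class probabilities in the order of TOPOLOGIES
                 (hypothetical model output)
    threshold -- minimum probability required to emit a prediction
                 (0.7 is an arbitrary illustrative value)
    """
    if len(probs) != len(TOPOLOGIES):
        raise ValueError("expected one probability per topology")
    # Index of the class with the highest probability
    best = max(range(len(probs)), key=lambda i: probs[i])
    # Abstain when the winning class is not confident enough
    return TOPOLOGIES[best] if probs[best] >= threshold else None


confident = predict_with_threshold([0.8, 0.1, 0.1])
uncertain = predict_with_threshold([0.4, 0.35, 0.25])
```

Raising `threshold` makes the sketch abstain more often; whether that improves precision on real G4 data would have to be checked against labelled sequences.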
Affiliation(s)
- Donn Liew
- Division of Physics and Applied Physics, School of Physical and Mathematical Sciences, Nanyang Technological University, 637371, Singapore, Singapore
- Zi Way Lim
- Division of Physics and Applied Physics, School of Physical and Mathematical Sciences, Nanyang Technological University, 637371, Singapore, Singapore
- Ee Hou Yong
- Division of Physics and Applied Physics, School of Physical and Mathematical Sciences, Nanyang Technological University, 637371, Singapore, Singapore.
38
Yue T, Chen SY, Shen WK, Zhang ZY, Cheng L, Guo AY. TCRosetta: An Integrated Analysis and Annotation Platform for T-cell Receptor Sequences. GENOMICS, PROTEOMICS & BIOINFORMATICS 2024; 22:qzae013. [PMID: 39436242 PMCID: PMC11849489 DOI: 10.1093/gpbjnl/qzae013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2023] [Revised: 12/23/2023] [Accepted: 01/08/2024] [Indexed: 10/23/2024]
Abstract
T cells and T-cell receptors (TCRs) are essential components of the adaptive immune system. Characterization of the TCR repertoire offers a promising and highly informative source for understanding the functions of T cells in the immune response and immunotherapy. Although TCR repertoire studies have attracted much attention, there are few online servers available for TCR repertoire analysis, especially for TCR sequence annotation or advanced analyses. Therefore, we developed TCRosetta, a comprehensive online server that integrates analytical methods for TCR repertoire analysis and visualization. TCRosetta combines general feature analysis, large-scale sequence clustering, network construction, peptide-TCR binding prediction, generation probability calculation, and k-mer motif analysis for TCR sequences, making TCR data analysis as simple as possible. The TCRosetta server accepts multiple input data formats and can analyze ~20,000 TCR sequences in less than 3 min. TCRosetta is the most comprehensive web server available for TCR repertoire analysis and is freely available at https://guolab.wchscu.cn/TCRosetta/.
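As a rough illustration of what k-mer motif analysis involves, the sketch below counts overlapping k-mers across a set of amino-acid sequences. It is a minimal example under stated assumptions, not TCRosetta's method: the function name `kmer_counts`, the choice of k = 3, and the three CDR3-like sequences are all hypothetical.

```python
from collections import Counter


def kmer_counts(sequences, k=3):
    """Count every overlapping k-mer across a list of sequences."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    counts = Counter()
    for seq in sequences:
        # Slide a window of width k along the sequence;
        # sequences shorter than k contribute nothing.
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


# Hypothetical CDR3-like sequences, for illustration only
cdr3s = ["CASSLGQGYEQYF", "CASSLGGNTEAFF", "CASSPGQGYEQYF"]
most_common_3mers = kmer_counts(cdr3s, k=3).most_common(5)
```

Frequent k-mers found this way are only candidate motifs; a real analysis would normally compare them against a background repertoire before drawing conclusions.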
Affiliation(s)
- Tao Yue
- Center for Artificial Intelligence Biology, Hubei Bioinformatics & Molecular Imaging Key Laboratory, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Si-Yi Chen
- Center for Artificial Intelligence Biology, Hubei Bioinformatics & Molecular Imaging Key Laboratory, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Wen-Kang Shen
- Center for Artificial Intelligence Biology, Hubei Bioinformatics & Molecular Imaging Key Laboratory, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Zhan-Ye Zhang
- Center for Artificial Intelligence Biology, Hubei Bioinformatics & Molecular Imaging Key Laboratory, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Liming Cheng
- Department of Laboratory Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
- An-Yuan Guo
- Center for Artificial Intelligence Biology, Hubei Bioinformatics & Molecular Imaging Key Laboratory, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Department of Thoracic Surgery, West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu 610041, China
39
Wang Y, Lv H, Teo QW, Lei R, Gopal AB, Ouyang WO, Yeung YH, Tan TJC, Choi D, Shen IR, Chen X, Graham CS, Wu NC. An explainable language model for antibody specificity prediction using curated influenza hemagglutinin antibodies. Immunity 2024; 57:2453-2465.e7. [PMID: 39163866 PMCID: PMC11464180 DOI: 10.1016/j.immuni.2024.07.022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Revised: 04/24/2024] [Accepted: 07/24/2024] [Indexed: 08/22/2024]
Abstract
Despite decades of antibody research, it remains challenging to predict the specificity of an antibody solely based on its sequence. Two major obstacles are the lack of appropriate models and the inaccessibility of datasets for model training. In this study, we curated >5,000 influenza hemagglutinin (HA) antibodies by mining research publications and patents, which revealed many distinct sequence features between antibodies to HA head and stem domains. We then leveraged this dataset to develop a lightweight memory B cell language model (mBLM) for sequence-based antibody specificity prediction. Model explainability analysis showed that mBLM could identify key sequence features of HA stem antibodies. Additionally, by applying mBLM to HA antibodies with unknown epitopes, we discovered and experimentally validated many HA stem antibodies. Overall, this study not only advances our molecular understanding of the antibody response to the influenza virus but also provides a valuable resource for applying deep learning to antibody research.
Affiliation(s)
- Yiquan Wang
- Department of Biochemistry, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Huibin Lv
- Department of Biochemistry, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA; Carl R. Woese Institute for Genomic Biology, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Qi Wen Teo
- Department of Biochemistry, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA; Carl R. Woese Institute for Genomic Biology, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Ruipeng Lei
- Department of Biochemistry, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Akshita B Gopal
- Department of Biochemistry, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Wenhao O Ouyang
- Department of Biochemistry, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Yuen-Hei Yeung
- Department of Biochemistry, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA; Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA; Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong SAR, China
- Timothy J C Tan
- Center for Biophysics and Quantitative Biology, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Danbi Choi
- Department of Biochemistry, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Ivana R Shen
- Department of Biochemistry, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Xin Chen
- Center for Biophysics and Quantitative Biology, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Claire S Graham
- Department of Biochemistry, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Nicholas C Wu
- Department of Biochemistry, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA; Carl R. Woese Institute for Genomic Biology, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA; Center for Biophysics and Quantitative Biology, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA; Carle Illinois College of Medicine, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA.
40
Wilkinson ME, Li D, Gao A, Macrae RK, Zhang F. Phage-triggered reverse transcription assembles a toxic repetitive gene from a noncoding RNA. Science 2024; 386:eadq3977. [PMID: 39208082 DOI: 10.1126/science.adq3977] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2024] [Accepted: 08/08/2024] [Indexed: 09/04/2024]
Abstract
Reverse transcription has frequently been co-opted for cellular functions and in prokaryotes is associated with protection against viral infection, but the underlying mechanisms of defense are generally unknown. Here, we show that in the DRT2 defense system, the reverse transcriptase binds a neighboring pseudoknotted noncoding RNA. Upon bacteriophage infection, a template region of this RNA is reverse transcribed into an array of tandem repeats that reconstitute a promoter and open reading frame, allowing expression of a toxic repetitive protein and an abortive infection response. Biochemical reconstitution of this activity and cryo-electron microscopy provide a molecular basis for repeat synthesis. Gene synthesis from a noncoding RNA is a previously unknown mode of genetic regulation in prokaryotes.
Affiliation(s)
- Max E Wilkinson
- Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- David Li
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Alex Gao
- Department of Biochemistry, Stanford University, Stanford, CA 94305, USA
- Rhiannon K Macrae
- Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Feng Zhang
- Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
41
Chen WC, Zhou J, McCandlish DM. Density estimation for ordinal biological sequences and its applications. Phys Rev E 2024; 110:044408. [PMID: 39562961 PMCID: PMC11605730 DOI: 10.1103/physreve.110.044408] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2024] [Accepted: 10/03/2024] [Indexed: 11/21/2024]
Abstract
Biological sequences do not come at random. Instead, they appear with particular frequencies that reflect properties of the associated system or phenomenon. Knowing how biological sequences are distributed in sequence space is thus a natural first step toward understanding the underlying mechanisms. Here we propose a method for inferring the probability distribution from which a sample of biological sequences was drawn for the case where the sequences are composed of elements that admit a natural ordering. Our method is based on Bayesian field theory, a physics-based machine learning approach, and can be regarded as a nonparametric extension of the traditional maximum entropy estimate. As an example, we use it to analyze the aneuploidy data pertaining to gliomas from The Cancer Genome Atlas project. In addition, we demonstrate two follow-up analyses that can be performed with the resulting probability distribution. One of them is to investigate the associations among the sequence sites. This provides a way to infer the governing biological grammar. The other is to study the global geometry of the probability landscape, which allows us to look at the problem from an evolutionary point of view. It can be seen that this methodology enables us to learn from a sample of sequences about how a biological system or phenomenon in the real world works.
Affiliation(s)
- Wei-Chia Chen
- Department of Physics, National Chung Cheng University, Chiayi 62102, Taiwan, R.O.C
- Juannan Zhou
- Department of Biology, University of Florida, Gainesville, Florida 32611, U.S.A
- David M. McCandlish
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, U.S.A
42
Prince CR, Lin IN, Feaga HA. The evolution and functional significance of the programmed ribosomal frameshift in prfB. bioRxiv 2024:2024.09.24.614795. [PMID: 39386688 PMCID: PMC11463598 DOI: 10.1101/2024.09.24.614795] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/12/2024]
Abstract
Release Factor 2 (RF2) is one of two peptide release factors that terminate translation in bacteria. In Escherichia coli, the gene encoding RF2, prfB, contains an in-frame premature RF2-specific stop codon. Therefore, a programmed ribosomal frameshift is required to translate full-length RF2. Here, we investigate the diversity of prfB frameshifting through bioinformatic analyses of >12,000 genomes. We present evidence that prfB frameshifting autoregulates RF2 levels throughout the bacterial domain since (i) the prfB in-frame stop codon is always TGA or TAA, both of which are recognized by RF2, and never the RF1-specific TAG stop codon, and (ii) species that lack the autoregulatory programmed frameshift likely need higher RF2 levels since, on average, they have significantly higher RF2-specific stop codon usage. Overexpression of prfB without the autoregulatory frameshift motif is toxic to Bacillus subtilis, an organism with intermediate RF2-specific stop codon usage. We did not detect the programmed frameshift in any Actinobacteriota. Consistent with this finding, we observed very low frameshift efficiency at the prfB frameshift motif in the Actinobacterium Mycobacterium smegmatis. Our work provides a more complete picture of the evolution of the RF2 programmed frameshifting motif, and its usage to prevent toxic overexpression of RF2.
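The stop codon usage comparison described in this abstract can be illustrated with a minimal sketch. It assumes that "RF2-specific stop codon usage" means the fraction of coding sequences ending in TGA (TAA being shared with RF1 and TAG being RF1-specific); the function name `stop_codon_usage` and the toy sequences are hypothetical and are not taken from the preprint.

```python
from collections import Counter

STOP_CODONS = ("TAA", "TAG", "TGA")


def stop_codon_usage(coding_sequences):
    """Return the fraction of each stop codon among CDSs that end in one.

    Sequences whose last three bases are not a stop codon are ignored.
    """
    counts = Counter()
    for cds in coding_sequences:
        codon = cds[-3:].upper()
        if codon in STOP_CODONS:
            counts[codon] += 1
    total = sum(counts.values())
    if total == 0:
        return {codon: 0.0 for codon in STOP_CODONS}
    return {codon: counts[codon] / total for codon in STOP_CODONS}


# Toy coding sequences (hypothetical, for illustration only).
toy_cds = ["ATGAAATGA", "ATGCCCTAA", "ATGGGGTGA", "ATGTTTTAG"]
usage = stop_codon_usage(toy_cds)

# Assumption: TGA is treated as the RF2-specific stop codon.
rf2_specific_usage = usage["TGA"]
```

In a real analysis the input would be the annotated coding sequences of each genome, and the per-genome TGA fraction could then be compared between species with and without the frameshift motif.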
Affiliation(s)
- Isabella N. Lin
- Department of Microbiology, Cornell University, Ithaca, NY 14853
- Heather A. Feaga
- Department of Microbiology, Cornell University, Ithaca, NY 14853
43
Tang Z, Somia N, Yu Y, Koo PK. Evaluating the representational power of pre-trained DNA language models for regulatory genomics. bioRxiv 2024:2024.02.29.582810. [PMID: 38464101 PMCID: PMC10925287 DOI: 10.1101/2024.02.29.582810] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2024]
Abstract
The emergence of genomic language models (gLMs) offers an unsupervised approach to learning a wide diversity of cis-regulatory patterns in the non-coding genome without requiring labels of functional activity generated by wet-lab experiments. Previous evaluations have shown that pre-trained gLMs can be leveraged to improve predictive performance across a broad range of regulatory genomics tasks, albeit using relatively simple benchmark datasets and baseline models. Since the gLMs in these studies were tested upon fine-tuning their weights for each downstream task, determining whether gLM representations embody a foundational understanding of cis-regulatory biology remains an open question. Here we evaluate the representational power of pre-trained gLMs to predict and interpret cell-type-specific functional genomics data that span DNA and RNA regulation. Our findings suggest that probing the representations of pre-trained gLMs does not offer substantial advantages over conventional machine learning approaches that use one-hot encoded sequences. This work highlights a major gap with current gLMs, raising potential issues in conventional pre-training strategies for the non-coding genome.
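For readers unfamiliar with the one-hot baseline mentioned in this abstract, the following is a minimal sketch of one-hot encoding a DNA sequence. The function name `one_hot_encode` and the choice to encode ambiguous bases such as N as all-zero rows are assumptions for illustration; the preprint's own preprocessing is not described here.

```python
def one_hot_encode(sequence, alphabet="ACGT"):
    """Encode a DNA sequence as a list of rows, one row per base.

    Each row has one entry per letter of `alphabet`; the entry for the
    observed base is 1 and all others are 0. Bases outside the alphabet
    (for example N) become all-zero rows, which is an assumption here.
    """
    index = {base: i for i, base in enumerate(alphabet)}
    rows = []
    for base in sequence.upper():
        row = [0] * len(alphabet)
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows


encoded = one_hot_encode("ACGTN")
```

The resulting length-by-4 matrix is the kind of input a conventional convolutional or linear model would consume in place of learned gLM embeddings.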
Affiliation(s)
- Ziqi Tang
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, NY, USA
- Nirali Somia
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, NY, USA
- Yiyang Yu
- The Fu Foundation School of Engineering and Applied Science, Columbia University, New York, NY, USA
- Peter K Koo
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, NY, USA
44
Ghafoor H, Asim MN, Ibrahim MA, Dengel A. ProSol-multi: Protein solubility prediction via amino acids multi-level correlation and discriminative distribution. Heliyon 2024; 10:e36041. [PMID: 39281576 PMCID: PMC11401092 DOI: 10.1016/j.heliyon.2024.e36041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2024] [Revised: 08/01/2024] [Accepted: 08/08/2024] [Indexed: 09/18/2024] Open
Abstract
Protein solubility prediction is useful for the careful selection of highly effective candidate proteins for drug development. In recombinant protein synthesis, solubility prediction is valuable for optimizing key protein characteristics, including stability, functionality, and ease of purification. It contains valuable information about potential biomarkers or therapeutic targets and helps in early forecasting of neurodegenerative diseases, cancer, and cardiovascular disorders. Traditional wet-lab experimental protein solubility prediction approaches are error-prone, time-consuming, and costly. Researchers have harnessed Artificial Intelligence approaches to replace experimental approaches with computational predictors. These predictors infer the solubility of proteins by analyzing amino acid distributions in raw protein sequences. There is still a lot of room for the development of robust computational predictors because existing predictors still fail to extract a comprehensive discriminative distribution of amino acids. To more precisely discriminate soluble proteins from insoluble proteins, this paper presents the ProSol-Multi predictor, which makes use of a novel MLCDE encoder and a Random Forest classifier. The MLCDE encoder transforms protein sequences into informative statistical vectors by capturing amino acid multi-level correlation and discriminative distribution within raw protein sequences. The performance of the proposed encoder is evaluated against 56 existing protein sequence encoding methods on a widely used protein solubility prediction benchmark dataset under two different experimental settings, namely intrinsic and extrinsic. Intrinsic evaluation reveals that, among all sequence encoders, the proposed MLCDE encoder manages to generate non-overlapping clusters of the soluble and insoluble classes. In extrinsic evaluation, 10 machine learning classifiers achieve better performance with the proposed MLCDE encoder than with the 56 existing protein sequence encoders. Moreover, across 4 public benchmark datasets, the proposed ProSol-Multi predictor outperforms 20 existing predictors by an average of 3% in accuracy and 2% in MCC and AU-ROC. The ProSol-Multi interactive web application is available at https://sds_genetic_analysis.opendfki.de/ProSol-Multi.
Affiliation(s)
- Hina Ghafoor
- Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany
- German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
- Muhammad Nabeel Asim
- German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
- Muhammad Ali Ibrahim
- Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany
- German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
- Andreas Dengel
- Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany
- German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
45
Eliad B, Schneider N, Ben-Naim Zgayer O, Amichan Y, Glaser F, Erdmann EA, Rajendren S, Hundley HA, Lamm AT. ADBP-1 regulates ADR-2 nuclear localization to control editing substrate selection. Nucleic Acids Res 2024; 52:9501-9518. [PMID: 39036970 PMCID: PMC11381337 DOI: 10.1093/nar/gkae641] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2023] [Revised: 06/05/2024] [Accepted: 07/09/2024] [Indexed: 07/23/2024] Open
Abstract
Adenosine-to-inosine (A-to-I) RNA editing, catalyzed by ADAR enzymes, is a prevalent and conserved RNA modification. While A-to-I RNA editing is essential in mammals, it is not essential in Caenorhabditis elegans, making this organism invaluable for RNA editing research. In C. elegans, ADR-2 is the sole catalytic A-to-I editing enzyme, and ADR-1 is an RNA editing regulator. ADAR localization is well-studied in humans but not well-established in C. elegans. In this study, we examine the cellular and tissue-specific localization of ADR-2. We show that while ADR-2 is present in most cells in the embryo, at later developmental stages, its expression is both tissue- and cell-type-specific. Additionally, both ADARs are mainly in the nucleus. ADR-2 is adjacent to the chromosomes during the cell cycle. We show that the nuclear localization of endogenous ADR-2 depends on ADBP-1, not ADR-1. In adbp-1 mutant worms, ADR-2 is mislocalized, while ADR-1 is not, leading to decreased editing levels and de-novo editing, mostly in exons, suggesting that ADR-2 is also functional in the cytoplasm. In addition, mutated ADBP-1 affects gene expression. Furthermore, we show that ADR-2 targets adenosines with different surrounding nucleotides in exons and introns. Our findings indicate that ADR-2 cellular localization is highly regulated and affects its function.
Affiliation(s)
- Berta Eliad
- Faculty of Biology, Technion- Israel Institute of Technology, Technion City, Haifa 3200003, Israel
- Noa Schneider
- Faculty of Biology, Technion- Israel Institute of Technology, Technion City, Haifa 3200003, Israel
- Orna Ben-Naim Zgayer
- Faculty of Biology, Technion- Israel Institute of Technology, Technion City, Haifa 3200003, Israel
- Yarden Amichan
- Faculty of Biology, Technion- Israel Institute of Technology, Technion City, Haifa 3200003, Israel
- Fabian Glaser
- Technion Center for Structural Biology, Technion Human Health Initiative, Technion, Haifa 32000, Israel
- Emily A Erdmann
- Department of Biology, Indiana University, Bloomington, IN 47405, USA
- Suba Rajendren
- Department of Biology, Indiana University, Bloomington, IN 47405, USA
- Heather A Hundley
- Department of Biology, Indiana University, Bloomington, IN 47405, USA
- Ayelet T Lamm
- Faculty of Biology, Technion- Israel Institute of Technology, Technion City, Haifa 3200003, Israel
46
Augustijn HE, Karapliafis D, Joosten KMM, Rigali S, van Wezel GP, Medema MH. LogoMotif: A Comprehensive Database of Transcription Factor Binding Site Profiles in Actinobacteria. J Mol Biol 2024; 436:168558. [PMID: 38580076 DOI: 10.1016/j.jmb.2024.168558] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2024] [Revised: 03/28/2024] [Accepted: 03/30/2024] [Indexed: 04/07/2024]
Abstract
Actinobacteria undergo a complex multicellular life cycle and produce a wide range of specialized metabolites, including the majority of the antibiotics. These biological processes are controlled by intricate regulatory pathways, and to better understand how they are controlled we need to augment our insights into the transcription factor binding sites. Here, we present LogoMotif (https://logomotif.bioinformatics.nl), an open-source database for characterized and predicted transcription factor binding sites in Actinobacteria, along with their cognate position weight matrices and hidden Markov models. Genome-wide predictions of binding site locations in Streptomyces model organisms are supplied and visualized in interactive regulatory networks. In the web interface, users can freely access, download and investigate the underlying data. With this curated collection of actinobacterial regulatory interactions, LogoMotif serves as a basis for binding site predictions, thus providing users with clues on how to elicit the expression of genes of interest and guide genome mining efforts.
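The position weight matrices mentioned in this abstract can be illustrated with a minimal sketch that builds a log-odds matrix from aligned binding sites and scores a candidate sequence. The function names, the pseudocount of 1, the uniform background, and the toy sites are all assumptions for illustration; this is not LogoMotif's own pipeline.

```python
import math


def position_weight_matrix(sites, alphabet="ACGT", pseudocount=1.0):
    """Build a log2-odds matrix from equal-length aligned binding sites.

    Assumes a uniform background distribution over `alphabet`.
    Returns one dict per position, mapping base -> log2-odds score.
    """
    if not sites:
        raise ValueError("at least one site is required")
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    background = 1.0 / len(alphabet)
    total = len(sites) + pseudocount * len(alphabet)
    matrix = []
    for position in range(length):
        column = [site[position].upper() for site in sites]
        matrix.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in alphabet
        })
    return matrix


def score_sequence(matrix, sequence):
    """Sum the per-position log-odds scores for a sequence of matching length."""
    if len(sequence) != len(matrix):
        raise ValueError("sequence length must match the matrix length")
    return sum(column[base] for column, base in zip(matrix, sequence.upper()))


# Toy aligned sites (hypothetical, for illustration only).
toy_sites = ["ACGT", "ACGT", "ACGA"]
pwm = position_weight_matrix(toy_sites)
consensus_score = score_sequence(pwm, "ACGT")
unrelated_score = score_sequence(pwm, "TTTT")
```

Sliding `score_sequence` along a genome and keeping windows above a threshold is one common way to turn such a matrix into genome-wide binding site predictions.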
Affiliation(s)
- Hannah E Augustijn
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands; Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
- Kristy M M Joosten
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
- Sébastien Rigali
- InBioS - Center for Protein Engineering, University of Liège, Institut de Chimie, B-4000 Liège, Belgium
- Gilles P van Wezel
- Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands.
- Marnix H Medema
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands; Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands.
47
Wang B, Mount S. Latent Dirichlet allocation mixture models for nucleotide sequence analysis. NAR Genom Bioinform 2024; 6:lqae099. [PMID: 39131816 PMCID: PMC11310860 DOI: 10.1093/nargab/lqae099] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2023] [Revised: 06/13/2024] [Accepted: 07/23/2024] [Indexed: 08/13/2024] Open
Abstract
Strings of nucleotides carrying biological information are typically described as sequence motifs represented by weight matrices or consensus sequences. However, many signals in DNA or RNA are recognized by multiple factors in temporal sequence, consist of distinct alternative motifs, or are best described by base composition. Here we apply the latent Dirichlet allocation (LDA) mixture model to nucleotide sequences. Using positions in an alignment of human or Drosophila splice sites as samples, we show that LDA readily identifies motifs, including such elusive cases as the intron branch site. Using whole sequences with positional k-mers as features, LDA can identify sequence subtypes enriched in long vs. short introns. LDA with bulk k-mers can reliably distinguish reading frame and species of origin in coding sequences from humans and Drosophila. We find that LDA is a useful model for describing heterogeneous signals, for assigning individual sequences to subtypes, and for identifying and characterizing sequences that do not fit recognized subtypes. Because LDA topic models are interpretable, they also aid the discovery of new motifs, even those present in a small fraction of samples. In summary, LDA can identify and characterize signals in nucleotide sequences, including candidate regulatory factors involved in biological processes.
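The two feature types named in this abstract, bulk k-mers and positional k-mers, can be sketched as simple counting functions whose output would serve as the "word counts" of a topic model. The function names and the `position:kmer` token format are assumptions for illustration; the paper's exact feature definitions are not given in the abstract.

```python
from collections import Counter


def bulk_kmers(sequence, k):
    """Count every overlapping k-mer in the sequence, ignoring position."""
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def positional_kmers(sequence, k):
    """Count k-mers tagged with their start position, as 'position:kmer' tokens.

    The token format is a hypothetical choice; it only makes sense for
    sequences that share a common alignment, such as splice sites.
    """
    seq = sequence.upper()
    return Counter(f"{i}:{seq[i:i + k]}" for i in range(len(seq) - k + 1))


bulk = bulk_kmers("ACACG", 2)
positional = positional_kmers("ACACG", 2)
```

Stacking such counters over many sequences gives a document-term matrix, which is the input format topic-model implementations such as LDA generally expect.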
Affiliation(s)
- Bixuan Wang
- Dept. of Cell Biology and Molecular Genetics, University of Maryland, College Park, MD 20742, USA
- Stephen M Mount
- Dept. of Cell Biology and Molecular Genetics, University of Maryland, College Park, MD 20742, USA
48
Gul A, Pewe LL, Willems P, Mayer R, Thery F, Asselman C, Aernout I, Verbeke R, Eggermont D, Van Moortel L, Upton E, Zhang Y, Boucher K, Miret-Casals L, Demol H, De Smedt SC, Lentacker I, Radoshevich L, Harty JT, Impens F. Immunopeptidomics Mapping of Listeria monocytogenes T Cell Epitopes in Mice. Mol Cell Proteomics 2024; 23:100829. [PMID: 39147027 PMCID: PMC11414675 DOI: 10.1016/j.mcpro.2024.100829] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2024] [Revised: 07/21/2024] [Accepted: 08/12/2024] [Indexed: 08/17/2024] Open
Abstract
Listeria monocytogenes is a foodborne intracellular bacterial model pathogen. Protective immunity against Listeria depends on an effective CD8+ T cell response, but very few T cell epitopes are known in mice, a common animal infection model for listeriosis. To identify epitopes, we screened for Listeria immunopeptides presented in the spleen of infected mice by mass spectrometry-based immunopeptidomics. We mapped more than 6000 mouse self-peptides presented on MHC class I molecules, including 12 high-confidence Listeria peptides from 12 different bacterial proteins. Bacterial immunopeptides with confirmed fragmentation spectra were further tested for their potential to activate CD8+ T cells, revealing VTYNYINI from the putative cell wall surface anchor family protein LMON_0576 as a novel bona fide peptide epitope. The epitope showed high biological potency in a prime-boost model and can be used as a research tool to probe CD8+ T cell responses in the mouse models of Listeria infection. Together, our results demonstrate the power of immunopeptidomics for bacterial antigen identification.
Affiliation(s)
- Adillah Gul
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Lecia L Pewe
- Department of Pathology, University of Iowa-Carver College of Medicine, Iowa City, Iowa, USA
- Patrick Willems
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent, Belgium; VIB-UGent Center for Plant Systems Biology, VIB, Ghent, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- Rupert Mayer
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent, Belgium; VIB Proteomics Core, VIB, Ghent, Belgium
- Fabien Thery
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Caroline Asselman
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent, Belgium; Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium
- Ilke Aernout
- Ghent Research Group on Nanomedicines, Ghent University, Ghent, Belgium; Cancer Research Institute Ghent (CRIG), Ghent, Belgium
- Rein Verbeke
- Ghent Research Group on Nanomedicines, Ghent University, Ghent, Belgium; Cancer Research Institute Ghent (CRIG), Ghent, Belgium
- Denzel Eggermont
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Laura Van Moortel
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Ellen Upton
- Department of Microbiology and Immunology, University of Iowa-Carver College of Medicine, Iowa City, Iowa, USA; Interdisciplinary Graduate Program in Immunology, University of Iowa, Iowa City, Iowa, USA
- Yifeng Zhang
- Department of Microbiology and Immunology, University of Iowa-Carver College of Medicine, Iowa City, Iowa, USA
- Katie Boucher
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent, Belgium; VIB Proteomics Core, VIB, Ghent, Belgium
- Laia Miret-Casals
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Hans Demol
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent, Belgium; VIB Proteomics Core, VIB, Ghent, Belgium
- Stefaan C De Smedt
- Ghent Research Group on Nanomedicines, Ghent University, Ghent, Belgium; Cancer Research Institute Ghent (CRIG), Ghent, Belgium
- Ine Lentacker
- Ghent Research Group on Nanomedicines, Ghent University, Ghent, Belgium; Cancer Research Institute Ghent (CRIG), Ghent, Belgium
- Lilliana Radoshevich
- Department of Microbiology and Immunology, University of Iowa-Carver College of Medicine, Iowa City, Iowa, USA; Interdisciplinary Graduate Program in Immunology, University of Iowa, Iowa City, Iowa, USA; Department of Immunology and Genomic Medicine, National Jewish Health, Denver, Colorado, USA.
- John T Harty
- Department of Pathology, University of Iowa-Carver College of Medicine, Iowa City, Iowa, USA; Interdisciplinary Graduate Program in Immunology, University of Iowa, Iowa City, Iowa, USA.
- Francis Impens
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent, Belgium; VIB Proteomics Core, VIB, Ghent, Belgium.
49
Collesano L, Łuksza M, Lässig M. Energy landscapes of peptide-MHC binding. PLoS Comput Biol 2024; 20:e1012380. [PMID: 39226310 PMCID: PMC11398667 DOI: 10.1371/journal.pcbi.1012380] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2024] [Revised: 09/13/2024] [Accepted: 07/31/2024] [Indexed: 09/05/2024] Open
Abstract
Molecules of the Major Histocompatibility Complex (MHC) present short protein fragments on the cell surface, an important step in T cell immune recognition. MHC-I molecules process peptides from intracellular proteins; MHC-II molecules act in antigen-presenting cells and present peptides derived from extracellular proteins. Here we show that the sequence-dependent energy landscapes of MHC-peptide binding encode class-specific nonlinearities (epistasis). MHC-I has a smooth landscape with global epistasis; the binding energy is a simple deformation of an underlying linear trait. This form of epistasis enhances the discrimination between strong-binding peptides. In contrast, MHC-II has a rugged landscape with idiosyncratic epistasis: binding depends on detailed amino acid combinations at multiple positions of the peptide sequence. The form of epistasis affects the learning of energy landscapes from training data. For MHC-I, a low-complexity problem, we derive a simple matrix model of binding energies that outperforms current models trained by machine learning. For MHC-II, higher complexity prevents learning by simple regression methods. Epistasis also affects the energy and fitness effects of mutations in antigen-derived peptides (epitopes). In MHC-I, large-effect mutations occur predominantly in anchor positions of strong-binding epitopes. In MHC-II, large effects depend on the background epitope sequence but are broadly distributed over the epitope, generating a bigger target for escape mutations due to loss of presentation. Together, our analysis shows how an energy landscape of protein-protein binding constrains the target of escape mutations from T cell immunity, linking the complexity of the molecular interactions to the dynamics of adaptive immune response.
Affiliation(s)
- Laura Collesano
- Institute for Biological Physics, University of Cologne, Cologne, Germany
- Marta Łuksza
- Tisch Cancer Institute, Departments of Oncological Sciences and Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Michael Lässig
- Institute for Biological Physics, University of Cologne, Cologne, Germany
50
Shrestha P, Kandel J, Tayara H, Chong KT. Post-translational modification prediction via prompt-based fine-tuning of a GPT-2 model. Nat Commun 2024; 15:6699. [PMID: 39107330 PMCID: PMC11303401 DOI: 10.1038/s41467-024-51071-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2024] [Accepted: 07/29/2024] [Indexed: 08/10/2024] Open
Abstract
Post-translational modifications (PTMs) are pivotal in modulating protein functions and influencing cellular processes like signaling, localization, and degradation. The complexity of these biological interactions necessitates efficient predictive methodologies. In this work, we introduce PTMGPT2, an interpretable protein language model that utilizes prompt-based fine-tuning to improve its accuracy in precisely predicting PTMs. Drawing inspiration from recent advancements in GPT-based architectures, PTMGPT2 adopts unsupervised learning to identify PTMs. It utilizes a custom prompt to guide the model through the subtle linguistic patterns encoded in amino acid sequences, generating tokens indicative of PTM sites. To provide interpretability, we visualize attention profiles from the model's final decoder layer to elucidate sequence motifs essential for molecular recognition and analyze the effects of mutations at or near PTM sites to offer deeper insights into protein functionality. Comparative assessments reveal that PTMGPT2 outperforms existing methods across 19 PTM types, underscoring its potential in identifying disease associations and drug targets.
Affiliation(s)
- Palistha Shrestha
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju, Jeollabuk-do, Republic of Korea
- Jeevan Kandel
- Graduate School of Integrated Energy-AI, Jeonbuk National University, Jeonju, Jeollabuk-do, Republic of Korea
- Hilal Tayara
- School of International Engineering and Science, Jeonbuk National University, Jeonju, Jeollabuk-do, Republic of Korea.
- Kil To Chong
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju, Jeollabuk-do, Republic of Korea.
- Advances Electronics and Information Research Center, Jeonbuk National University, Jeonju, Jeollabuk-do, Republic of Korea.