1
Arribas YA, Baudon B, Rotival M, Suárez G, Bonté PE, Casas V, Roubert A, Klein P, Bonnin E, Mchich B, Legoix P, Baulande S, Sadacca B, Diharce J, Waterfall JJ, Etchebest C, Carrascal M, Goudot C, Quintana-Murci L, Burbage M, Merlotti A, Amigorena S. Transposable element exonization generates a reservoir of evolving and functional protein isoforms. Cell 2024:S0092-8674(24)01328-X. [PMID: 39667937 DOI: 10.1016/j.cell.2024.11.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2023] [Revised: 05/26/2024] [Accepted: 11/11/2024] [Indexed: 12/14/2024]
Abstract
Alternative splicing enhances protein diversity in different ways, including through exonization of transposable elements (TEs). Recent transcriptomic analyses identified thousands of unannotated spliced transcripts with exonizing TEs, but their contribution to the proteome and biological relevance remains unclear. Here, we use transcriptome assembly, ribosome profiling, and proteomics to describe a population of 1,227 unannotated TE exonizing isoforms generated by mRNA splicing and recurrent in human populations. Despite being shorter and lowly expressed, these isoforms are shared between individuals and efficiently translated. Functional analyses show stable expression, specific cellular localization, and, in some cases, modified functions. Exonized TEs are rich in ancient genes, whereas the involved splice sites are recent and can be evolutionarily conserved. In addition, exonized TEs contribute to the secondary structure of the emerging isoforms, supporting their functional relevance. We conclude that TE-spliced isoforms represent a diversity reservoir of functional proteins on which natural selection can act.
Affiliation(s)
- Yago A Arribas
  - Institut Curie, PSL University, Inserm U932, Immunity and Cancer, 75005 Paris, France
- Blandine Baudon
  - Institut Curie, PSL University, Inserm U932, Immunity and Cancer, 75005 Paris, France
- Maxime Rotival
  - Institut Pasteur, Université Paris Cité, CNRS UMR2000, Human Evolutionary Genetics Unit, 75015 Paris, France
- Guadalupe Suárez
  - Institut Curie, PSL University, Inserm U932, Immunity and Cancer, 75005 Paris, France
- Pierre-Emmanuel Bonté
  - Institut Curie, PSL University, Inserm U932, Immunity and Cancer, 75005 Paris, France
- Vanessa Casas
  - Biological and Environmental Proteomics, Institut d'Investigacions Biomèdiques de Barcelona-CSIC, IDIBAPS, Roselló 161, 6a planta, 08036 Barcelona, Spain
- Apollinaire Roubert
  - Institut Curie, PSL University, Inserm U932, Immunity and Cancer, 75005 Paris, France
- Paul Klein
  - INSERM U830, PSL Research University, Institute Curie Research Center, Paris, France; Department of Translational Research, PSL Research University, Institut Curie Research Center, Paris, France
- Elisa Bonnin
  - Institut Curie, PSL University, Inserm U932, Immunity and Cancer, 75005 Paris, France
- Basma Mchich
  - Université Paris Cité and Université de la Réunion and Université des Antilles, INSERM, BIGR, DSIMB UMR_S1134, 74014 Paris, France
- Patricia Legoix
  - Institut Curie, Centre de Recherche, Genomics of Excellence Platform, PSL Research University, Paris Cedex 05, France
- Sylvain Baulande
  - Institut Curie, Centre de Recherche, Genomics of Excellence Platform, PSL Research University, Paris Cedex 05, France
- Benjamin Sadacca
  - Institut Curie, PSL University, Inserm U932, Immunity and Cancer, 75005 Paris, France; INSERM U830, PSL Research University, Institute Curie Research Center, Paris, France; Department of Translational Research, PSL Research University, Institut Curie Research Center, Paris, France
- Julien Diharce
  - Université Paris Cité and Université de la Réunion and Université des Antilles, INSERM, BIGR, DSIMB UMR_S1134, 74014 Paris, France
- Joshua J Waterfall
  - INSERM U830, PSL Research University, Institute Curie Research Center, Paris, France; Department of Translational Research, PSL Research University, Institut Curie Research Center, Paris, France
- Catherine Etchebest
  - Université Paris Cité and Université de la Réunion and Université des Antilles, INSERM, BIGR, DSIMB UMR_S1134, 74014 Paris, France
- Montserrat Carrascal
  - Biological and Environmental Proteomics, Institut d'Investigacions Biomèdiques de Barcelona-CSIC, IDIBAPS, Roselló 161, 6a planta, 08036 Barcelona, Spain
- Christel Goudot
  - Institut Curie, PSL University, Inserm U932, Immunity and Cancer, 75005 Paris, France
- Lluís Quintana-Murci
  - Institut Pasteur, Université Paris Cité, CNRS UMR2000, Human Evolutionary Genetics Unit, 75015 Paris, France; Chair Human Genomics and Evolution, Collège de France, 75005 Paris, France
- Marianne Burbage
  - Institut Curie, PSL University, Inserm U932, Immunity and Cancer, 75005 Paris, France
- Antonela Merlotti
  - Institut Curie, PSL University, Inserm U932, Immunity and Cancer, 75005 Paris, France
- Sebastian Amigorena
  - Institut Curie, PSL University, Inserm U932, Immunity and Cancer, 75005 Paris, France.
2
de Brevern AG. Special Issue: "Molecular Dynamics Simulations and Structural Analysis of Protein Domains". Int J Mol Sci 2024; 25:10793. [PMID: 39409122 PMCID: PMC11477144 DOI: 10.3390/ijms251910793] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2024] [Revised: 09/30/2024] [Accepted: 10/07/2024] [Indexed: 10/20/2024] Open
Abstract
The 3D protein structure is the basis for all their biological functions [...].
Affiliation(s)
- Alexandre G. de Brevern
  - DSIMB Bioinformatics Team, BIGR, INSERM, Université Paris Cité, F-75015 Paris, France; Tel.: +33-1-4449-3000
  - DSIMB Bioinformatics Team, BIGR, INSERM, Université de la Réunion, F-97715 Saint Denis, France
3
Bayarsaikhan B, Zsidó BZ, Börzsei R, Hetényi C. Efficient Refinement of Complex Structures of Flexible Histone Peptides Using Post-Docking Molecular Dynamics Protocols. Int J Mol Sci 2024; 25:5945. [PMID: 38892133 PMCID: PMC11172440 DOI: 10.3390/ijms25115945] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2024] [Revised: 05/26/2024] [Accepted: 05/27/2024] [Indexed: 06/21/2024] Open
Abstract
Histones are keys to many epigenetic events and their complexes have therapeutic and diagnostic importance. The determination of the structures of histone complexes is fundamental in the design of new drugs. Computational molecular docking is widely used for the prediction of target-ligand complexes. Large, linear peptides like the tail regions of histones are challenging ligands for docking due to their large conformational flexibility, extensive hydration, and weak interactions with the shallow binding pockets of their reader proteins. Thus, fast docking methods often fail to produce complex structures of such peptide ligands at a level appropriate for drug design. To address this challenge, and improve the structural quality of the docked complexes, post-docking refinement has been applied using various molecular dynamics (MD) approaches. However, a final consensus has not been reached on the desired MD refinement protocol. In this present study, MD refinement strategies were systematically explored on a set of problematic complexes of histone peptide ligands with relatively large errors in their docked geometries. Six protocols were compared that differ in their MD simulation parameters. In all cases, pre-MD hydration of the complex interface regions was applied to avoid the unwanted presence of empty cavities. The best-performing protocol achieved a median of 32% improvement over the docked structures in terms of the change in root mean squared deviations from the experimental references. The influence of structural factors and explicit hydration on the performance of post-docking MD refinements are also discussed to help with their implementation in future methods and applications.
Affiliation(s)
- Bayartsetseg Bayarsaikhan
  - Pharmacoinformatics Unit, Department of Pharmacology and Pharmacotherapy, Medical School, University of Pécs, Szigeti út 12, H-7624 Pécs, Hungary
- Balázs Zoltán Zsidó
  - Pharmacoinformatics Unit, Department of Pharmacology and Pharmacotherapy, Medical School, University of Pécs, Szigeti út 12, H-7624 Pécs, Hungary
- Rita Börzsei
  - Pharmacoinformatics Unit, Department of Pharmacology and Pharmacotherapy, Medical School, University of Pécs, Szigeti út 12, H-7624 Pécs, Hungary
- Csaba Hetényi
  - Pharmacoinformatics Unit, Department of Pharmacology and Pharmacotherapy, Medical School, University of Pécs, Szigeti út 12, H-7624 Pécs, Hungary
  - National Laboratory for Drug Research and Development, Magyar tudósok krt. 2, H-1117 Budapest, Hungary
4
Meuskens I, Kristiansen PE, Bardiaux B, Koynarev VR, Hatlem D, Prydz K, Lund R, Izadi-Pruneyre N, Linke D. A poly-proline II helix in YadA from Yersinia enterocolitica serotype O:9 facilitates heparin binding through electrostatic interactions. FEBS J 2024; 291:761-777. [PMID: 37953437 DOI: 10.1111/febs.17001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Revised: 10/25/2023] [Accepted: 11/09/2023] [Indexed: 11/14/2023]
Abstract
Poly-proline II helices are secondary structure motifs frequently found in ligand-binding sites. They exhibit increased flexibility and solvent exposure compared to the strongly hydrogen-bonded α-helices or β-strands and can therefore easily be misinterpreted as completely unstructured regions with an extremely high rotational freedom. Here, we show that the adhesin YadA of Yersinia enterocolitica serotype O:9 contains a poly-proline II helix interaction motif in the N-terminal region. The motif is involved in the interaction of YadAO:9 with heparin, a host glycosaminoglycan. We show that the basic residues within the N-terminal motif of YadA are required for electrostatic interactions with the sulfate groups of heparin. Biophysical methods including CD spectroscopy, solution-state NMR and SAXS all independently support the presence of a poly-proline helix allowing YadAO:9 binding to the rigid heparin. Lastly, we show that host cells deficient in sulfation of heparin and heparan sulfate are not targeted by YadAO:9 -mediated adhesion. We speculate that the YadAO:9 -heparin interaction plays an important and highly strain-specific role in the pathogenicity of Yersinia enterocolitica serotype O:9.
Affiliation(s)
- Ina Meuskens
  - Department of Biosciences, University of Oslo, Norway
- Benjamin Bardiaux
  - Structural Bioinformatics Unit, CNRS UMR3528, Institut Pasteur, Université de Paris-Cité, France
- Daniel Hatlem
  - Department of Biosciences, University of Oslo, Norway
- Reidar Lund
  - Department of Chemistry, University of Oslo, Norway
- Nadia Izadi-Pruneyre
  - Bacterial Transmembrane Systems Unit, CNRS UMR3528, Institut Pasteur, Université de Paris-Cité, France
- Dirk Linke
  - Department of Biosciences, University of Oslo, Norway
5
Versini R, Sritharan S, Aykac Fas B, Tubiana T, Aimeur SZ, Henri J, Erard M, Nüsse O, Andreani J, Baaden M, Fuchs P, Galochkina T, Chatzigoulas A, Cournia Z, Santuz H, Sacquin-Mora S, Taly A. A Perspective on the Prospective Use of AI in Protein Structure Prediction. J Chem Inf Model 2024; 64:26-41. [PMID: 38124369 DOI: 10.1021/acs.jcim.3c01361] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2023]
Abstract
AlphaFold2 (AF2) and RoseTTaFold (RF) have revolutionized structural biology, serving as highly reliable and effective methods for predicting protein structures. This article explores their impact and limitations, focusing on their integration into experimental pipelines and their application in diverse protein classes, including membrane proteins, intrinsically disordered proteins (IDPs), and oligomers. In experimental pipelines, AF2 models help X-ray crystallography in resolving the phase problem, while complementarity with mass spectrometry and NMR data enhances structure determination and protein flexibility prediction. Predicting the structure of membrane proteins remains challenging for both AF2 and RF due to difficulties in capturing conformational ensembles and interactions with the membrane. Improvements in incorporating membrane-specific features and predicting the structural effect of mutations are crucial. For intrinsically disordered proteins, AF2's confidence score (pLDDT) serves as a competitive disorder predictor, but integrative approaches including molecular dynamics (MD) simulations or hydrophobic cluster analyses are advocated for accurate dynamics representation. AF2 and RF show promising results for oligomeric models, outperforming traditional docking methods, with AlphaFold-Multimer showing improved performance. However, some caveats remain in particular for membrane proteins. Real-life examples demonstrate AF2's predictive capabilities in unknown protein structures, but models should be evaluated for their agreement with experimental data. Furthermore, AF2 models can be used complementarily with MD simulations. In this Perspective, we propose a "wish list" for improving deep-learning-based protein folding prediction models, including using experimental data as constraints and modifying models with binding partners or post-translational modifications. Additionally, a meta-tool for ranking and suggesting composite models is suggested, driving future advancements in this rapidly evolving field.
Affiliation(s)
- Raphaelle Versini
  - Laboratoire de Biochimie Théorique, CNRS (UPR9080), Université Paris Cité, F-75005 Paris, France
- Sujith Sritharan
  - Laboratoire de Biochimie Théorique, CNRS (UPR9080), Université Paris Cité, F-75005 Paris, France
- Burcu Aykac Fas
  - Laboratoire de Biochimie Théorique, CNRS (UPR9080), Université Paris Cité, F-75005 Paris, France
- Thibault Tubiana
  - Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Sana Zineb Aimeur
  - Université Paris-Saclay, CNRS, Institut de Chimie Physique, 91405 Orsay, France
- Julien Henri
  - Sorbonne Université, CNRS, Laboratoire de Biologie Computationnelle et Quantitative UMR 7238, Institut de Biologie Paris-Seine, 4 Place Jussieu, F-75005 Paris, France
- Marie Erard
  - Université Paris-Saclay, CNRS, Institut de Chimie Physique, 91405 Orsay, France
- Oliver Nüsse
  - Université Paris-Saclay, CNRS, Institut de Chimie Physique, 91405 Orsay, France
- Jessica Andreani
  - Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Marc Baaden
  - Laboratoire de Biochimie Théorique, CNRS (UPR9080), Université Paris Cité, F-75005 Paris, France
- Patrick Fuchs
  - Sorbonne Université, École Normale Supérieure, PSL University, CNRS, Laboratoire des Biomolécules, LBM, 75005 Paris, France
  - Université de Paris, UFR Sciences du Vivant, 75013 Paris, France
- Tatiana Galochkina
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75014 Paris, France
- Alexios Chatzigoulas
  - Biomedical Research Foundation, Academy of Athens, 11527 Athens, Greece
  - Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, 15784 Athens, Greece
- Zoe Cournia
  - Biomedical Research Foundation, Academy of Athens, 11527 Athens, Greece
  - Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, 15784 Athens, Greece
- Hubert Santuz
  - Laboratoire de Biochimie Théorique, CNRS (UPR9080), Université Paris Cité, F-75005 Paris, France
- Sophie Sacquin-Mora
  - Laboratoire de Biochimie Théorique, CNRS (UPR9080), Université Paris Cité, F-75005 Paris, France
- Antoine Taly
  - Laboratoire de Biochimie Théorique, CNRS (UPR9080), Université Paris Cité, F-75005 Paris, France
6
Cadet F, Saavedra E, Syren PO, Gontero B. Editorial: Machine learning, epistasis, and protein engineering: From sequence-structure-function relationships to regulation of metabolic pathways. Front Mol Biosci 2022; 9:1098289. [PMID: 36533069 PMCID: PMC9755881 DOI: 10.3389/fmolb.2022.1098289] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/14/2022] [Accepted: 11/25/2022] [Indexed: 08/29/2024] Open
Affiliation(s)
- Frederic Cadet
  - Laboratory of Excellence LABEX GR, DSIMB, Inserm UMR S1134, University of Paris City and University of Reunion, Paris, France
  - PEACCEL, Artificial Intelligence Department, Paris, France
- Emma Saavedra
  - Department of Biochemistry, Instituto Nacional de Cardiología Ignacio Chávez, Mexico City, Mexico
- Per-Olof Syren
  - Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology, and Health, KTH Royal Institute of Technology, Stockholm, Sweden
  - Department of Fibre and Polymer Technology, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH Royal Institute of Technology, Stockholm, Sweden
- Brigitte Gontero
  - Aix Marseille University, CNRS, UMR7281 Bioénergétique et Ingénierie des Protéines, Marseille, France