1. Madigan V, Zhang Y, Raghavan R, Wilkinson ME, Faure G, Puccio E, Segel M, Lash B, Macrae RK, Zhang F. Human paraneoplastic antigen Ma2 (PNMA2) forms icosahedral capsids that can be engineered for mRNA delivery. Proc Natl Acad Sci U S A 2024; 121:e2307812120. [PMID: 38437549; PMCID: PMC10945824; DOI: 10.1073/pnas.2307812120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2023] [Accepted: 10/20/2023] [Indexed: 03/06/2024]
Abstract
A number of endogenous genes in the human genome encode retroviral gag-like proteins, which were domesticated from ancient retroelements. The paraneoplastic Ma antigen (PNMA) family members encode a gag-like capsid domain, but their ability to assemble as capsids and traffic between cells remains mostly uncharacterized. Here, we systematically investigate human PNMA proteins and find that a number of PNMAs are secreted by human cells. We determine that PNMA2 forms icosahedral capsids efficiently but does not naturally encapsidate nucleic acids. We resolve the cryoelectron microscopy (cryo-EM) structure of PNMA2 and leverage the structure to design engineered PNMA2 (ePNMA2) particles with RNA packaging abilities. Recombinantly purified ePNMA2 proteins package mRNA molecules into icosahedral capsids and can function as delivery vehicles in mammalian cell lines, demonstrating the potential for engineered endogenous capsids as a nucleic acid therapy delivery modality.
Affiliation(s)
- Victoria Madigan
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142
- McGovern Institute for Brain Research at Massachusetts Institute of Technology, Cambridge, MA 02139
- Department of Brain and Cognitive Science, Massachusetts Institute of Technology, Cambridge, MA 02139
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139
- HHMI, Cambridge, MA 02139
- Yugang Zhang
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142
- McGovern Institute for Brain Research at Massachusetts Institute of Technology, Cambridge, MA 02139
- Department of Brain and Cognitive Science, Massachusetts Institute of Technology, Cambridge, MA 02139
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139
- HHMI, Cambridge, MA 02139
- Rumya Raghavan
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142
- McGovern Institute for Brain Research at Massachusetts Institute of Technology, Cambridge, MA 02139
- Department of Brain and Cognitive Science, Massachusetts Institute of Technology, Cambridge, MA 02139
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139
- HHMI, Cambridge, MA 02139
- Max E. Wilkinson
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142
- McGovern Institute for Brain Research at Massachusetts Institute of Technology, Cambridge, MA 02139
- Department of Brain and Cognitive Science, Massachusetts Institute of Technology, Cambridge, MA 02139
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139
- HHMI, Cambridge, MA 02139
- Guilhem Faure
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142
- McGovern Institute for Brain Research at Massachusetts Institute of Technology, Cambridge, MA 02139
- Department of Brain and Cognitive Science, Massachusetts Institute of Technology, Cambridge, MA 02139
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139
- HHMI, Cambridge, MA 02139
- Elena Puccio
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142
- McGovern Institute for Brain Research at Massachusetts Institute of Technology, Cambridge, MA 02139
- Department of Brain and Cognitive Science, Massachusetts Institute of Technology, Cambridge, MA 02139
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139
- HHMI, Cambridge, MA 02139
- Michael Segel
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142
- McGovern Institute for Brain Research at Massachusetts Institute of Technology, Cambridge, MA 02139
- Department of Brain and Cognitive Science, Massachusetts Institute of Technology, Cambridge, MA 02139
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139
- HHMI, Cambridge, MA 02139
- Blake Lash
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142
- McGovern Institute for Brain Research at Massachusetts Institute of Technology, Cambridge, MA 02139
- Department of Brain and Cognitive Science, Massachusetts Institute of Technology, Cambridge, MA 02139
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139
- HHMI, Cambridge, MA 02139
- Rhiannon K. Macrae
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142
- McGovern Institute for Brain Research at Massachusetts Institute of Technology, Cambridge, MA 02139
- Department of Brain and Cognitive Science, Massachusetts Institute of Technology, Cambridge, MA 02139
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139
- HHMI, Cambridge, MA 02139
- Feng Zhang
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142
- McGovern Institute for Brain Research at Massachusetts Institute of Technology, Cambridge, MA 02139
- Department of Brain and Cognitive Science, Massachusetts Institute of Technology, Cambridge, MA 02139
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139
- HHMI, Cambridge, MA 02139

2. Bruley A, Bitard-Feildel T, Callebaut I, Duprat E. A sequence-based foldability score combined with AlphaFold2 predictions to disentangle the protein order/disorder continuum. Proteins 2023; 91:466-484. [PMID: 36306150; DOI: 10.1002/prot.26441] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 08/01/2022] [Revised: 10/14/2022] [Accepted: 10/18/2022] [Indexed: 11/11/2022]
Abstract
Order and disorder govern protein functions, but there is a great diversity in disorder, from regions that are, and stay, fully disordered to conditional order. This diversity is still difficult to decipher even though it is encoded in the amino acid sequences. Here, we developed an analytic Python package, named pyHCA, to estimate the foldability of a protein segment from its amino acid sequence alone, based on a measure of its density in regular secondary structures associated with hydrophobic clusters, as defined by the hydrophobic cluster analysis (HCA) approach. The tool was designed by optimizing the separation between foldable segments from databases of disorder (DisProt) and order (SCOPe [soluble domains] and OPM [transmembrane domains]). It allows one to specify the ratio between order, embodied by regular secondary structures (either participating in the hydrophobic core of well-folded 3D structures or conditionally formed in intrinsically disordered regions), and disorder. We illustrated the relevance of pyHCA with several examples and applied it to the sequences of the proteomes of 21 species ranging from prokaryotes and archaea to unicellular and multicellular eukaryotes, for which structure models are provided in the AlphaFold protein structure database. Cases of low-confidence scores related to disorder were distinguished from those of sequences that we identified as foldable but that are still excluded from accurate modeling by AlphaFold2 due to a lack of sequence homologs or to compositional biases. Overall, our approach is complementary to AlphaFold2, providing guides to map structural innovations through evolutionary processes, at proteome and gene scales.
Affiliation(s)
- Apolline Bruley
- Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France
- Tristan Bitard-Feildel
- Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France
- Isabelle Callebaut
- Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France
- Elodie Duprat
- Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France

3. Digging into the 3D Structure Predictions of AlphaFold2 with Low Confidence: Disorder and Beyond. Biomolecules 2022; 12:biom12101467. [PMID: 36291675; PMCID: PMC9599455; DOI: 10.3390/biom12101467] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 09/14/2022] [Revised: 10/04/2022] [Accepted: 10/05/2022] [Indexed: 01/12/2023]
Abstract
AlphaFold2 (AF2) has created a breakthrough in biology by providing three-dimensional structure models for whole-proteome sequences, with unprecedented levels of accuracy. In addition, the AF2 pLDDT score, related to the model confidence, has been shown to provide a good measure of residue-wise disorder. Here, we combined AF2 predictions with pyHCA, a tool we previously developed to identify foldable segments and estimate their order/disorder ratio, from a single protein sequence. We focused our analysis on the AF2 predictions available for 21 reference proteomes (AFDB v1), in particular on their long foldable segments (>30 amino acids) that exhibit characteristics of soluble domains, as estimated by pyHCA. Among these segments, we provided a global analysis of those with very low pLDDT values along their entire length and compared their characteristics to those of segments with very high pLDDT values. We highlighted cases containing conditional order, as well as cases that could form well-folded structures but escape the AF2 prediction due to a shallow multiple sequence alignment and/or undocumented structure or fold. AF2 and pyHCA can therefore be advantageously combined to unravel cryptic structural features in whole proteomes and to refine predictions for different flavors of disorder.
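The segment selection described in this abstract can be pictured with a short Python sketch. It is illustrative only and is not the authors' code: the function name `classify_segment` is hypothetical, the cut-offs 50 and 90 are assumed from the "very low" and "very high" confidence bands used by the AlphaFold database, and the segment boundaries are assumed to come from a separate foldable-segment predictor such as pyHCA.

```python
VERY_LOW = 50.0   # assumption: AFDB "very low" confidence band (pLDDT < 50)
VERY_HIGH = 90.0  # assumption: AFDB "very high" confidence band (pLDDT > 90)
MIN_LENGTH = 30   # the abstract keeps foldable segments longer than 30 residues


def classify_segment(plddt, start, end):
    """Classify the residues plddt[start:end] of one foldable segment.

    plddt is a list of per-residue pLDDT values; start and end are
    0-based, end-exclusive boundaries supplied by the caller.
    """
    values = plddt[start:end]
    if len(values) <= MIN_LENGTH:
        return "too_short"
    if max(values) < VERY_LOW:
        return "very_low"    # low confidence along the entire length
    if min(values) > VERY_HIGH:
        return "very_high"   # high confidence along the entire length
    return "mixed"


# Toy profile: 35 low-confidence residues, 40 high-confidence, 10 intermediate.
profile = [40.0] * 35 + [95.0] * 40 + [70.0] * 10
labels = [classify_segment(profile, s, e) for s, e in [(0, 35), (35, 75), (0, 75), (75, 85)]]
```

Requiring every residue to fall in the band mirrors the phrase "along their entire length"; a mean-based criterion would be a looser alternative.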

4. New Genomic Signals Underlying the Emergence of Human Proto-Genes. Genes (Basel) 2022; 13:genes13020284. [PMID: 35205330; PMCID: PMC8871994; DOI: 10.3390/genes13020284] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/17/2021] [Revised: 01/20/2022] [Accepted: 01/24/2022] [Indexed: 12/04/2022]
Abstract
De novo genes are novel genes which emerge from non-coding DNA. Until now, little has been known about the properties of de novo genes in relation to their age and mechanisms of emergence. In this study, we investigate four related properties: introns, upstream regulatory motifs, 5′ untranslated regions (UTRs) and protein domains, in 23,135 human proto-genes. We found that proto-genes contain introns, whose number and position correlate with the genomic position of proto-gene emergence. The origin of these introns is debated, as our results suggest that 41% of proto-genes might have captured existing introns, and 13.7% of them do not splice the ORF. We show that proto-genes which emerged via overprinting tend to be more enriched in core promoter motifs, while intergenic and intronic genes are more enriched in enhancers, even if the TATA motif is most commonly found upstream in these genes. Intergenic and intronic 5′ UTRs of proto-genes have a lower potential to stabilise mRNA structures than exonic proto-genes and established human genes. Finally, we confirm that proteins expressed by proto-genes gain new putative domains with age. Overall, we find that regulatory motifs inducing transcription and translation of previously non-coding sequences may facilitate proto-gene emergence. Our study demonstrates that introns, 5′ UTRs, and domains have specific properties in proto-genes. We also emphasize that the genomic positions of de novo genes strongly impact these properties.

5. Papadopoulos C, Chevrollier N, Lopes A. Exploring the Peptide Potential of Genomes. Methods Mol Biol 2022; 2405:63-82. [PMID: 35298808; DOI: 10.1007/978-1-0716-1855-4_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Recent studies attribute a central role to the noncoding genome in the emergence of novel genes. The widespread transcription of noncoding regions and the pervasive translation of the resulting RNAs offer organisms a vast reservoir of novel peptides. Although the majority of these peptides are expected to be deleterious or neutral, and thereby to be degraded right away or short-lived in evolutionary history, some of them can confer an advantage to the organism. The latter can be further subjected to natural selection and be established as novel genes. In any case, characterizing the structural properties of these pervasively translated peptides is crucial for understanding (1) their impact on the cell and (2) how some of these peptides, derived from presumed noncoding regions, can give rise to structured and functional de novo proteins. Therefore, we present a protocol that aims to explore the potential of a genome to produce novel peptides. It consists of annotating all the open reading frames (ORFs) of a genome (i.e., coding and noncoding ones) and characterizing the fold potential and other structural properties of their corresponding potential peptides. Here, we apply our protocol to a small genome and show how to apply it to very large genomes. Finally, we present a case study which aims to probe the fold potential of a set of 721 translated ORFs in mouse lncRNAs, identified with ribosome profiling experiments. Interestingly, we show that the distribution of their fold potential is different from that of the nontranslated lncRNAs and more generally from the other noncoding ORFs of the mouse.
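As a rough picture of the ORF annotation step, the sketch below enumerates stop-to-stop ORFs in all six reading frames of a DNA string. It is a minimal illustration, not the protocol's own tooling: the function names, the stop-to-stop ORF definition, the standard-genetic-code stop codons and the `min_codons` threshold are all assumptions made for this example.

```python
STOPS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def stop_to_stop_orfs(seq, min_codons=20):
    """Yield (strand, frame, start, end) for each stop-to-stop ORF.

    start and end are 0-based, end-exclusive coordinates on the scanned
    strand; the ORF excludes both flanking stop codons. Stretches that
    are not closed by a stop codon on both sides are ignored.
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            orf_start = None  # set once the first stop codon is seen
            for pos in range(frame, len(s) - 2, 3):
                if s[pos:pos + 3] in STOPS:
                    if orf_start is not None and (pos - orf_start) // 3 >= min_codons:
                        yield strand, frame, orf_start, pos
                    orf_start = pos + 3


# Toy sequence: one stop, five alanine codons, one stop, one extra base.
toy = "TAA" + "GCT" * 5 + "TAG" + "C"
found = list(stop_to_stop_orfs(toy, min_codons=3))
```

Each reported ORF could then be translated and passed to a fold-potential predictor; that step is left out here because it depends on tools outside the standard library.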
Affiliation(s)
- Chris Papadopoulos
- Institute for Integrative Biology of the Cell (I2BC), Université Paris-Saclay, Gif-sur-Yvette, cedex, France
- Nicolas Chevrollier
- Institute for Integrative Biology of the Cell (I2BC), Université Paris-Saclay, Gif-sur-Yvette, cedex, France
- Anne Lopes
- Institute for Integrative Biology of the Cell (I2BC), Université Paris-Saclay, Gif-sur-Yvette, cedex, France.

6. Papadopoulos C, Callebaut I, Gelly JC, Hatin I, Namy O, Renard M, Lespinet O, Lopes A. Intergenic ORFs as elementary structural modules of de novo gene birth and protein evolution. Genome Res 2021; 31:2303-2315. [PMID: 34810219; PMCID: PMC8647833; DOI: 10.1101/gr.275638.121] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 04/13/2021] [Accepted: 09/23/2021] [Indexed: 01/08/2023]
Abstract
The noncoding genome plays an important role in de novo gene birth and in the emergence of genetic novelty. Nevertheless, how noncoding sequences' properties could promote the birth of novel genes and shape the evolution and the structural diversity of proteins remains unclear. Therefore, by combining different bioinformatic approaches, we characterized the fold potential diversity of the amino acid sequences encoded by all intergenic open reading frames (ORFs) of S. cerevisiae with the aim of (1) exploring whether the structural states' diversity of proteomes is already present in noncoding sequences, and (2) estimating the potential of the noncoding genome to produce novel protein bricks that could either give rise to novel genes or be integrated into pre-existing proteins, thus participating in protein structure diversity and evolution. We showed that amino acid sequences encoded by most yeast intergenic ORFs contain the elementary building blocks of protein structures. Moreover, they encompass the large structural state diversity of canonical proteins, with the majority predicted as foldable. Then, we investigated the early stages of de novo gene birth by reconstructing the ancestral sequences of 70 yeast de novo genes and characterized the sequence and structural properties of intergenic ORFs with a strong translation signal. This enabled us to highlight sequence and structural factors determining de novo gene emergence. Finally, we showed a strong correlation between the fold potential of de novo proteins and one of their ancestral amino acid sequences, reflecting the relationship between the noncoding genome and the protein structure universe.
Affiliation(s)
- Chris Papadopoulos
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Isabelle Callebaut
- Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, 75005 Paris, France
- Jean-Christophe Gelly
- Université de Paris, Biologie Intégrée du Globule Rouge, UMR_S1134, BIGR, INSERM, F-75015 Paris, France
- Laboratoire d'Excellence GR-Ex, 75015 Paris, France
- Institut National de la Transfusion Sanguine, F-75015 Paris, France
- Isabelle Hatin
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Olivier Namy
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Maxime Renard
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Olivier Lespinet
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Anne Lopes
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France

7. Watson AK, Lopez P, Bapteste E. Hundreds of out-of-frame remodelled gene families in the E. coli pangenome. Mol Biol Evol 2021; 39:6430988. [PMID: 34792602; PMCID: PMC8788219; DOI: 10.1093/molbev/msab329] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/01/2023]
Abstract
All genomes include gene families with very limited taxonomic distributions that potentially represent new genes and innovations in protein-coding sequence, raising questions on the origins of such genes. Some of these genes are hypothesized to have formed de novo, from noncoding sequences, and recent work has begun to elucidate the processes by which de novo gene formation can occur. A special case of de novo gene formation, overprinting, describes the origin of new genes from noncoding alternative reading frames of existing open reading frames (ORFs). We argue that additionally, out-of-frame gene fission/fusion events of alternative reading frames of ORFs and out-of-frame lateral gene transfers could contribute to the origin of new gene families. To demonstrate this, we developed an original pattern-search in sequence similarity networks, enhancing the use of these graphs, commonly used to detect in-frame remodeled genes. We applied this approach to gene families in 524 complete genomes of Escherichia coli. We identified 767 gene families whose evolutionary history likely included at least one out-of-frame remodeling event. These genes with out-of-frame components represent ∼2.5% of all genes in the E. coli pangenome, suggesting that alternative reading frames of existing ORFs can contribute to a significant proportion of de novo genes in bacteria.
Affiliation(s)
- Andrew K Watson
- Institut de Systématique, Evolution, Biodiversité (ISYEB), Sorbonne Université, CNRS, Museum National d'Histoire Naturelle, EPHE, Université des Antilles, 7, quai Saint Bernard, Paris, 75005, France
- Philippe Lopez
- Institut de Systématique, Evolution, Biodiversité (ISYEB), Sorbonne Université, CNRS, Museum National d'Histoire Naturelle, EPHE, Université des Antilles, 7, quai Saint Bernard, Paris, 75005, France
- Eric Bapteste
- Institut de Systématique, Evolution, Biodiversité (ISYEB), Sorbonne Université, CNRS, Museum National d'Histoire Naturelle, EPHE, Université des Antilles, 7, quai Saint Bernard, Paris, 75005, France

8. Saito M, Ladha A, Strecker J, Faure G, Neumann E, Altae-Tran H, Macrae RK, Zhang F. Dual modes of CRISPR-associated transposon homing. Cell 2021; 184:2441-2453.e18. [PMID: 33770501; PMCID: PMC8276595; DOI: 10.1016/j.cell.2021.03.006] [Citation(s) in RCA: 64] [Impact Index Per Article: 21.3] [Received: 11/04/2020] [Revised: 01/25/2021] [Accepted: 03/02/2021] [Indexed: 12/23/2022]
Abstract
Tn7-like transposons have co-opted CRISPR systems, including class 1 type I-F, I-B, and class 2 type V-K. Intriguingly, although these CRISPR-associated transposases (CASTs) undergo robust CRISPR RNA (crRNA)-guided transposition, they are almost never found in sites targeted by the crRNAs encoded by the cognate CRISPR array. To understand this paradox, we investigated CAST V-K and I-B systems and found two distinct modes of transposition: (1) crRNA-guided transposition and (2) CRISPR array-independent homing. We show distinct CAST systems utilize different molecular mechanisms to target their homing site. Type V-K CAST systems use a short, delocalized crRNA for RNA-guided homing, whereas type I-B CAST systems, which contain two distinct target selector proteins, use TniQ for RNA-guided DNA transposition and TnsD for homing to an attachment site. These observations illuminate a key step in the life cycle of CAST systems and highlight the diversity of molecular mechanisms mediating transposon homing.
Affiliation(s)
- Makoto Saito
- Howard Hughes Medical Institute, Cambridge, MA 02139, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; McGovern Institute for Brain Research at MIT, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Alim Ladha
- Howard Hughes Medical Institute, Cambridge, MA 02139, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; McGovern Institute for Brain Research at MIT, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Jonathan Strecker
- Howard Hughes Medical Institute, Cambridge, MA 02139, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; McGovern Institute for Brain Research at MIT, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Guilhem Faure
- Howard Hughes Medical Institute, Cambridge, MA 02139, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; McGovern Institute for Brain Research at MIT, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Edwin Neumann
- Howard Hughes Medical Institute, Cambridge, MA 02139, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; McGovern Institute for Brain Research at MIT, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Han Altae-Tran
- Howard Hughes Medical Institute, Cambridge, MA 02139, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; McGovern Institute for Brain Research at MIT, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Rhiannon K Macrae
- Howard Hughes Medical Institute, Cambridge, MA 02139, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; McGovern Institute for Brain Research at MIT, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Feng Zhang
- Howard Hughes Medical Institute, Cambridge, MA 02139, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; McGovern Institute for Brain Research at MIT, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.

9. ODiNPred: comprehensive prediction of protein order and disorder. Sci Rep 2020; 10:14780. [PMID: 32901090; PMCID: PMC7479119; DOI: 10.1038/s41598-020-71716-1] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Received: 04/11/2020] [Accepted: 08/10/2020] [Indexed: 12/13/2022]
Abstract
Structural disorder is widespread in eukaryotic proteins and is vital for their function in diverse biological processes. It is therefore highly desirable to be able to predict the degree of order and disorder from amino acid sequence. It is, however, notoriously difficult to predict the degree of local flexibility within structured domains and the presence and nuances of localized rigidity within intrinsically disordered regions. To identify such instances, we used the CheZOD database, which encompasses accurate, balanced, and continuous-valued quantification of protein (dis)order at amino acid resolution based on NMR chemical shifts. To computationally forecast the spectrum of protein disorder in the most comprehensive manner possible, we constructed the sequence-based protein order/disorder predictor ODiNPred, trained on an expanded version of CheZOD. ODiNPred applies a deep neural network comprising 157 unique sequence features to 1325 protein sequences together with the experimental NMR chemical shift data. Cross-validation for 117 protein sequences shows that ODiNPred better predicts the continuous variation in order along the protein sequence, suggesting that contemporary predictors are limited by the quality of training data. The inclusion of evolutionary features reduces the performance gap between ODiNPred and its peers, but analysis shows that it retains greater accuracy for the more challenging prediction of intermediate disorder.

10. Marichal L, Klein G, Armengaud J, Boulard Y, Chédin S, Labarre J, Pin S, Renault JP, Aude JC. Protein Corona Composition of Silica Nanoparticles in Complex Media: Nanoparticle Size does not Matter. Nanomaterials (Basel, Switzerland) 2020; 10:E240. [PMID: 32013169; PMCID: PMC7075126; DOI: 10.3390/nano10020240] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 12/18/2019] [Revised: 01/17/2020] [Accepted: 01/22/2020] [Indexed: 12/30/2022]
Abstract
Biomolecules, and particularly proteins, bind on nanoparticle (NP) surfaces to form the so-called protein corona. It is accepted that the corona drives the biological distribution and toxicity of NPs. Here, the corona composition and structure were studied using silica nanoparticles (SiNPs) of different sizes interacting with soluble yeast protein extracts. Adsorption isotherms showed that the amount of adsorbed proteins varied greatly with NP size, with large NPs having more adsorbed proteins per surface unit. The protein corona composition was studied using a large-scale label-free proteomic approach, combined with statistical and regression analyses. Most of the proteins adsorbed on the NPs were the same, regardless of the size of the NPs. To go further, the protein physicochemical parameters relevant for the adsorption were studied: electrostatic interactions and disordered regions are the main driving forces for the adsorption on SiNPs, but polypeptide sequence length seems to be an important factor as well. This article demonstrates that curvature effects exhibited using model proteins are not determining factors for the corona composition on SiNPs when dealing with complex biological media.
Affiliation(s)
- Laurent Marichal
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France; (G.K.); (Y.B.); (S.C.); (J.L.)
- Université Paris-Saclay, CEA, CNRS, NIMBE, Laboratoire Interdisciplinaire sur l’Organisation Nanométrique et Supramoléculaire, 91191 Gif-sur-Yvette, France; (S.P.); (J.-P.R.)
- Géraldine Klein
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France; (G.K.); (Y.B.); (S.C.); (J.L.)
- Université Paris-Saclay, CEA, CNRS, NIMBE, Laboratoire Interdisciplinaire sur l’Organisation Nanométrique et Supramoléculaire, 91191 Gif-sur-Yvette, France; (S.P.); (J.-P.R.)
- UMR Procédés Alimentaires et Microbiologiques, Equipe VAlMiS (Vin, Aliment, Microbiologie, Stress), Institut Universitaire de la Vigne et du Vin, AgroSup Dijon, Université de Bourgogne Franche-Comté, rue Claude Ladrey, BP 27877, 21000 Dijon, France
- Jean Armengaud
- Laboratoire Innovations technologiques pour la Détection et le Diagnostic (Li2D), Service de Pharmacologie et Immunoanalyse (SPI), CEA, INRA, 30207 Bagnols-sur-Cèze, France;
- Yves Boulard
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France; (G.K.); (Y.B.); (S.C.); (J.L.)
- Stéphane Chédin
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France; (G.K.); (Y.B.); (S.C.); (J.L.)
- Jean Labarre
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France; (G.K.); (Y.B.); (S.C.); (J.L.)
- Serge Pin
- Université Paris-Saclay, CEA, CNRS, NIMBE, Laboratoire Interdisciplinaire sur l’Organisation Nanométrique et Supramoléculaire, 91191 Gif-sur-Yvette, France; (S.P.); (J.-P.R.)
- Jean-Philippe Renault
- Université Paris-Saclay, CEA, CNRS, NIMBE, Laboratoire Interdisciplinaire sur l’Organisation Nanométrique et Supramoléculaire, 91191 Gif-sur-Yvette, France; (S.P.); (J.-P.R.)
- Jean-Christophe Aude
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France; (G.K.); (Y.B.); (S.C.); (J.L.)
|
11
|
Lamiable A, Bitard-Feildel T, Rebehmed J, Quintus F, Schoentgen F, Mornon JP, Callebaut I. A topology-based investigation of protein interaction sites using Hydrophobic Cluster Analysis. Biochimie 2019; 167:68-80. [PMID: 31525399 DOI: 10.1016/j.biochi.2019.09.009] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 04/08/2019] [Accepted: 09/11/2019] [Indexed: 01/20/2023]
Abstract
Hydrophobic clusters, as defined by Hydrophobic Cluster Analysis (HCA), are conditioned binary patterns, made of hydrophobic and non-hydrophobic positions, whose limits fit well those of regular secondary structures. They have proved useful for predicting secondary structures in proteins from a single amino acid sequence alone and have made it possible to assess, in a comprehensive way, the leading role of binary patterns in secondary structure preference towards a particular state. Here, we considered the available experimental 3D structures of protein globular domains to enlarge our previously reported hydrophobic cluster database (HCDB), almost doubling the number of hydrophobic cluster species (each species being defined by a unique binary pattern) that represent the most frequent structural bricks encountered within protein globular domains. We then used this updated HCDB to show that the hydrophobic amino acids of discordant clusters, i.e. those less abundant clusters for which the observed secondary structure is in disagreement with the binary pattern preference of the species to which they belong, are more exposed to solvent and are more involved in protein interfaces than the hydrophobic amino acids of concordant clusters. As amino acid composition differs between concordant/discordant clusters, considering binary patterns may be used to gain novel insights into key features of protein globular domain cores and surfaces. It can also provide useful information on possible conformational plasticity, including disorder to order transitions.
Affiliation(s)
- Alexis Lamiable
- Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, 75005, Paris, France
- Tristan Bitard-Feildel
- Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, 75005, Paris, France
- Joseph Rebehmed
- Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, 75005, Paris, France; Lebanese American University, Department of Computer Science and Mathematics, Beirut, Lebanon
- Flavien Quintus
- Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, 75005, Paris, France
- Françoise Schoentgen
- Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, 75005, Paris, France
- Jean-Paul Mornon
- Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, 75005, Paris, France
- Isabelle Callebaut
- Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, 75005, Paris, France.
|
12
|
Xie C, Bekpen C, Künzel S, Keshavarz M, Krebs-Wheaton R, Skrabar N, Ullrich KK, Tautz D. A de novo evolved gene in the house mouse regulates female pregnancy cycles. eLife 2019; 8:44392. [PMID: 31436535 PMCID: PMC6760900 DOI: 10.7554/elife.44392] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 12/13/2018] [Accepted: 08/21/2019] [Indexed: 12/16/2022] Open
Abstract
The de novo emergence of new genes has been well documented through genomic analyses. However, a functional analysis, especially of very young protein-coding genes, is still largely lacking. Here, we identify a set of house mouse-specific protein-coding genes and assess their translation by ribosome profiling and mass spectrometry data. We functionally analyze one of them, Gm13030, which is specifically expressed in females in the oviduct. The interruption of the reading frame affects the transcriptional network in the oviducts at a specific stage of the estrous cycle. This includes the upregulation of Dcpp genes, which are known to stimulate the growth of preimplantation embryos. As a consequence, knockout females have their second litters after shorter times and have a higher infanticide rate. Given that Gm13030 shows no signs of positive selection, our findings support the hypothesis that a de novo evolved gene can directly adopt a function without much sequence adaptation.
Different species have specific genes that set them apart from other species. Yet exactly how these species-specific genes originate is not fully known. The traditional view is that existing old genes are duplicated to make a ‘spare’ copy, which can change through mutations into a new gene with a new role gradually over time. Despite there being lots of evidence supporting this theory, not all new genes found in recent years can be traced back to older genes. This led to an alternative view – that recently evolved genes can also appear ‘de novo’, and come from regions of random DNA sequences that did not previously code for a protein. So far, the possibility of genes forming de novo during evolution has largely been supported by comparing and analyzing the genomes of related species. However, very little is known about the biological role these de novo genes play. Now, Xie et al. have generated a list of recently evolved de novo mouse genes, and carried out a detailed analysis of one de novo gene expressed in females at the time when embryos implant into the uterus wall. To study the role of this gene, Xie et al. created a strain of knock-out mice that have a defunct version of the protein coded by the gene. Loss of this protein caused female mice to have their second litter after a shorter period of time and increased the likelihood that female mice would terminate their newborn pups. This suggests that this newly discovered de novo gene is involved in regulating the female reproductive cycles of mice. Further analysis showed that this de novo gene counteracts the action of an older gene that promotes the implantation of embryos. This gene has therefore likely evolved due to the benefit it offers mothers, as it protects them from experiencing the increased physiological stress caused by a premature second pregnancy. These findings support the idea that genes which have evolved de novo can have an essential biological purpose despite coming from random DNA sequences. This establishes that de novo evolution of genes is the second major mechanism of how new genes with significant biological roles can form in the genome.
Affiliation(s)
- Chen Xie
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Cemalettin Bekpen
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Sven Künzel
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Maryam Keshavarz
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Rebecca Krebs-Wheaton
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Neva Skrabar
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Kristian Karsten Ullrich
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Diethard Tautz
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
|
13
|
Kleppe AS, Bornberg-Bauer E. Robustness by intrinsically disordered C-termini and translational readthrough. Nucleic Acids Res 2019; 46:10184-10194. [PMID: 30247639 PMCID: PMC6365619 DOI: 10.1093/nar/gky778] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 04/18/2018] [Accepted: 09/20/2018] [Indexed: 12/20/2022] Open
Abstract
During protein synthesis genetic instructions are passed from DNA via mRNA to the ribosome to assemble a protein chain. Occasionally, stop codons in the mRNA are bypassed and translation continues into the untranslated region (3′-UTR). This process, called translational readthrough (TR), yields a protein chain that becomes longer than would be predicted from the DNA sequence alone. Protein sequences vary in propensity for translational errors, which may yield evolutionary constraints by limiting evolutionary paths. Here we investigated TR in Saccharomyces cerevisiae by analysing ribosome profiling data. We clustered proteins as either prone or non-prone to TR, and conducted comparative analyses. We find that a relatively high frequency (5%) of genes undergo TR, including ribosomal subunit proteins. Our main finding is that proteins undergoing TR are highly expressed and have a higher proportion of intrinsically disordered C-termini. We suggest that highly expressed proteins may compensate for the deleterious effects of TR by having intrinsically disordered C-termini, which may provide conformational flexibility but without distorting native function. Moreover, we discuss whether minimizing deleterious effects of TR is also enabling exploration of the phenotypic landscape of protein isoforms.
Affiliation(s)
- April Snofrid Kleppe
- Institute of Biodiversity and Evolution, University of Münster, Hüfferstr. 1, 48151 Münster, Germany
- Erich Bornberg-Bauer
- Institute of Biodiversity and Evolution, University of Münster, Hüfferstr. 1, 48151 Münster, Germany
|
14
|
Faure G, Jézéquel K, Roisné-Hamelin F, Bitard-Feildel T, Lamiable A, Marcand S, Callebaut I. Discovery and Evolution of New Domains in Yeast Heterochromatin Factor Sir4 and Its Partner Esc1. Genome Biol Evol 2019; 11:572-585. [PMID: 30668669 PMCID: PMC6394760 DOI: 10.1093/gbe/evz010] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Accepted: 01/20/2019] [Indexed: 12/22/2022] Open
Abstract
Sir4 is a core component of heterochromatin found in yeasts of the Saccharomycetaceae family, whose general hallmark is to harbor a three-loci mating-type system with two silent loci. However, a large part of the Sir4 amino acid sequences has remained unexplored, belonging to the dark proteome. Here, we analyzed the phylogenetic profile of yet undescribed foldable regions present in Sir4 as well as in Esc1, a Sir4-interacting perinuclear anchoring protein. Within Sir4, we identified a new conserved motif (TOC) adjacent to the N-terminal KU-binding motif. We also found that the Esc1-interacting region of Sir4 is a Dbf4-related H-BRCT domain, only present in species possessing the HO endonuclease and in Kluyveromyces lactis. In addition, we found new motifs within Esc1 including a motif (Esc1-F) that is unique to species where Sir4 possesses an H-BRCT domain. Mutagenesis of conserved amino acids of the Sir4 H-BRCT domain, known to play a critical role in the Dbf4 function, shows that the function of this domain is separable from the essential role of Sir4 in transcriptional silencing and the protection from HO-induced cutting in Saccharomyces cerevisiae. In the more distant methylotrophic clade of yeasts, which often harbor a two-loci mating-type system with one silent locus, we also found a yet undescribed H-BRCT domain in a distinct protein, the ISWI2 chromatin-remodeling factor subunit Itc1. This study provides new insights on yeast heterochromatin evolution and emphasizes the interest of using sensitive methods of sequence analysis for identifying hitherto ignored functional regions within the dark proteome.
Affiliation(s)
- Guilhem Faure
- Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, IRD, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France; National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD
- Kévin Jézéquel
- Institut de Biologie François Jacob, IRCM/SIGRR/LTR, INSERM U1274, Université Paris-Saclay, CEA Paris-Saclay, Paris, France; National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD
- Florian Roisné-Hamelin
- Institut de Biologie François Jacob, IRCM/SIGRR/LTR, INSERM U1274, Université Paris-Saclay, CEA Paris-Saclay, Paris, France; National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD
- Tristan Bitard-Feildel
- Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, IRD, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France
- Alexis Lamiable
- Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, IRD, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France
- Stéphane Marcand
- Institut de Biologie François Jacob, IRCM/SIGRR/LTR, INSERM U1274, Université Paris-Saclay, CEA Paris-Saclay, Paris, France; Sorbonne Université, UMR CNRS 7238, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), Paris, France
- Isabelle Callebaut
- Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, IRD, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France; Sorbonne Université, UMR CNRS 7238, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), Paris, France
|
15
|
Bitard‐Feildel T, Lamiable A, Mornon J, Callebaut I. Order in Disorder as Observed by the "Hydrophobic Cluster Analysis" of Protein Sequences. Proteomics 2018; 18:e1800054. [PMID: 30299594 PMCID: PMC7168002 DOI: 10.1002/pmic.201800054] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 05/31/2018] [Revised: 08/29/2018] [Indexed: 12/17/2022]
Abstract
Hydrophobic cluster analysis (HCA) is an original approach for protein sequence analysis, which provides access to the foldable repertoire of the protein universe, including yet unannotated protein segments ("dark proteome"). Foldable segments correspond to ordered regions, as well as to intrinsically disordered regions (IDRs) undergoing disorder to order transitions. In this review, how HCA can be used to give insight into this last category of foldable segments is illustrated, with examples matching known 3D structures. After reviewing the HCA principles, examples of short foldable segments are given, which often contain short linear motifs, typically matching hydrophobic clusters. These segments become ordered upon contact with partners, with secondary structure preferences generally corresponding to those observed in the 3D structures within the complexes. Such small foldable segments are sometimes larger than the segments of known 3D structures, including flanking hydrophobic clusters that may be critical for interaction specificity or regulation, as well as intervening sequences allowing fuzziness. Cases of larger conditionally disordered domains are also presented, with lower density in hydrophobic clusters than well-folded globular domains or with exposed hydrophobic patches, which are stabilized by interaction with partners.
Affiliation(s)
- Tristan Bitard‐Feildel
- Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie (IMPMC), Institut de recherche pour le développement (IRD), UMR CNRS 7590, Muséum National d'Histoire Naturelle, Sorbonne Université, 75005 Paris, France
- Laboratoire de Biologie Computationnelle et Quantitative (LCQB), Institute of Biology Paris‐Seine (IBPS), Centre national de la recherche scientifique (CNRS), Sorbonne Université, 75005 Paris, France
- Alexis Lamiable
- Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie (IMPMC), Institut de recherche pour le développement (IRD), UMR CNRS 7590, Muséum National d'Histoire Naturelle, Sorbonne Université, 75005 Paris, France
- Jean‐Paul Mornon
- Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie (IMPMC), Institut de recherche pour le développement (IRD), UMR CNRS 7590, Muséum National d'Histoire Naturelle, Sorbonne Université, 75005 Paris, France
- Isabelle Callebaut
- Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie (IMPMC), Institut de recherche pour le développement (IRD), UMR CNRS 7590, Muséum National d'Histoire Naturelle, Sorbonne Université, 75005 Paris, France
|
16
|
Domain architecture of BAF250a reveals the ARID and ARM-repeat domains with implication in function and assembly of the BAF remodeling complex. PLoS One 2018; 13:e0205267. [PMID: 30307988 PMCID: PMC6181354 DOI: 10.1371/journal.pone.0205267] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 03/01/2018] [Accepted: 09/02/2018] [Indexed: 12/24/2022] Open
Abstract
BAF250a and BAF250b are subunits of the SWI/SNF chromatin-remodeling complex that recruit the complex to chromatin allowing transcriptional activation of several genes. Despite being the central subunits of the SWI/SNF complex, the structural and functional annotation of BAF250a/b remains poorly understood. BAF250a (a protein of nearly 2200 residues) harbors an N-terminal DNA binding ARID (~110 residues) and a C-terminal folded region (~250 residues) of unknown structure and function, recently annotated as BAF250_C. Using hydrophobic core analysis, fold prediction and comparative modeling, here we have defined a domain boundary and associated a β-catenin like ARM-repeat fold with the C-terminus of BAF250a that encompasses BAF250_C. The N-terminal DNA-binding ARID is found in diverse domain combinations in proteins imparting unique functions. We used a comparative sequence analysis based approach to study the ARIDs from diverse domain contexts and identified conserved residue positions that are important to preserve its core structure. Supporting this, mutation of one such conserved residue valine, at position 1067, to glycine, resulted in destabilization, loss of structural integrity and DNA binding affinity of ARID. Additionally, we identified a set of conserved and surface-exposed residues unique to the ARID when it co-occurs with the ARM repeat containing BAF250_C in BAF250a. Several of these residues are found mutated in somatic cancers. We predict that these residues in BAF250a may play important roles in mediating protein-DNA and protein-protein interactions in the BAF complex.
|
17
|
Incipient de novo genes can evolve from frozen accidents that escaped rapid transcript turnover. Nat Ecol Evol 2018; 2:1626-1632. [DOI: 10.1038/s41559-018-0639-7] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Received: 07/24/2017] [Accepted: 07/09/2018] [Indexed: 11/08/2022]
|
18
|
Klasberg S, Bitard-Feildel T, Callebaut I, Bornberg-Bauer E. Origins and structural properties of novel and de novo protein domains during insect evolution. FEBS J 2018; 285:2605-2625. [PMID: 29802682 DOI: 10.1111/febs.14504] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 06/25/2017] [Revised: 04/12/2018] [Accepted: 05/11/2018] [Indexed: 12/11/2022]
Abstract
Over long time scales, protein evolution is characterized by modular rearrangements of protein domains. Such rearrangements are mainly caused by gene duplication, fusion and terminal losses. To better understand domain emergence mechanisms we investigated 32 insect genomes covering a speciation gradient ranging from ~ 2 to ~ 390 mya. We use established domain models and foldable domains delineated by hydrophobic cluster analysis (HCA), which does not require homologous sequences, to also identify domains which have likely arisen de novo, that is, from previously noncoding DNA. Our results indicate that most novel domains emerge terminally as they originate from ORF extensions while fewer arise in middle arrangements, resulting from exonization of intronic or intergenic regions. Many novel domains rapidly migrate between terminal or middle positions and single- and multidomain arrangements. Young domains, such as most HCA-defined domains, are under strong selection pressure as they show signals of purifying selection. De novo domains, linked to ancient domains or defined by HCA, have higher degrees of intrinsic disorder and disorder-to-order transition upon binding than ancient domains. However, the corresponding DNA sequences of the novel domains of de novo origins could only rarely be found in sister genomes. We conclude that novel domains are often recruited by other proteins and undergo important structural modifications shortly after their emergence, but evolve too fast to be characterized by cross-species comparisons alone.
Affiliation(s)
- Steffen Klasberg
- Institute for Evolution and Biodiversity, Westfalian Wilhelms University Muenster, Germany
- Tristan Bitard-Feildel
- Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), Paris, France
- Isabelle Callebaut
- Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, IRD, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France
- Erich Bornberg-Bauer
- Institute for Evolution and Biodiversity, Westfalian Wilhelms University Muenster, Germany
|
19
|
Gubala AM, Schmitz JF, Kearns MJ, Vinh TT, Bornberg-Bauer E, Wolfner MF, Findlay GD. The Goddard and Saturn Genes Are Essential for Drosophila Male Fertility and May Have Arisen De Novo. Mol Biol Evol 2017; 34:1066-1082. [PMID: 28104747 DOI: 10.1093/molbev/msx057] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Indexed: 12/15/2022] Open
Abstract
New genes arise through a variety of mechanisms, including the duplication of existing genes and the de novo birth of genes from noncoding DNA sequences. While there are numerous examples of duplicated genes with important functional roles, the functions of de novo genes remain largely unexplored. Many newly evolved genes are expressed in the male reproductive tract, suggesting that these evolutionary innovations may provide advantages to males experiencing sexual selection. Using testis-specific RNA interference, we screened 11 putative de novo genes in Drosophila melanogaster for effects on male fertility and identified two, goddard and saturn, that are essential for spermatogenesis and sperm function. Goddard knockdown (KD) males fail to produce mature sperm, while saturn KD males produce few sperm, and these function inefficiently once transferred to females. Consistent with a de novo origin, both genes are identifiable only in Drosophila and are predicted to encode proteins with no sequence similarity to any annotated protein. However, since high levels of divergence prevented the unambiguous identification of the noncoding sequences from which each gene arose, we consider goddard and saturn to be putative de novo genes. Within Drosophila, both genes have been lost in certain lineages, but show conserved, male-specific patterns of expression in the species in which they are found. Goddard is consistently found in single-copy and evolves under purifying selection. In contrast, saturn has diversified through gene duplication and positive selection. These data suggest that de novo genes can acquire essential roles in male reproduction.
Affiliation(s)
- Anna M Gubala
- Department of Biology, College of the Holy Cross, Worcester, MA
- Jonathan F Schmitz
- Evolutionary Bioinformatics Group, Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
- Tery T Vinh
- Department of Biology, College of the Holy Cross, Worcester, MA
- Erich Bornberg-Bauer
- Evolutionary Bioinformatics Group, Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
- Mariana F Wolfner
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY
- Geoffrey D Findlay
- Department of Biology, College of the Holy Cross, Worcester, MA; Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY
|
20
|
Faure G, Ogurtsov AY, Shabalina SA, Koonin EV. Adaptation of mRNA structure to control protein folding. RNA Biol 2017; 14:1649-1654. [PMID: 28722509 DOI: 10.1080/15476286.2017.1349047] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Indexed: 10/19/2022] Open
Abstract
Comparison of mRNA and protein structures shows that highly structured mRNAs typically encode compact protein domains, suggesting that mRNA structure controls protein folding. This function is apparently performed by distinct structural elements in the mRNA, which implies 'fine tuning' of mRNA structure under selection for optimal protein folding. We find that, during evolution, changes in the mRNA folding energy follow amino acid replacements, reinforcing the notion of an intimate connection between the structures of an mRNA and the protein it encodes, and the double encoding of protein sequence and folding in the mRNA.
Affiliation(s)
- Guilhem Faure
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Aleksey Y Ogurtsov
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Svetlana A Shabalina
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
|
21
|
Exploring the dark foldable proteome by considering hydrophobic amino acids topology. Sci Rep 2017; 7:41425. [PMID: 28134276 PMCID: PMC5278394 DOI: 10.1038/srep41425] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 09/12/2016] [Accepted: 12/19/2016] [Indexed: 12/18/2022] Open
Abstract
The protein universe corresponds to the set of all proteins found in all organisms. A way to explore it is by taking into account the domain content of the proteins. However, some parts of sequences and many entire sequences remain un-annotated despite a converging number of domain families. The un-annotated part of the protein universe is referred to as the dark proteome and remains poorly characterized. In this study, we quantify the amount of foldable domains within the dark proteome by using the hydrophobic cluster analysis methodology. These un-annotated foldable domains were grouped using a combination of remote homology searches and domain annotations, leading us to define different levels of darkness. The dark foldable domains were analyzed to understand what makes them different from domains stored in databases and thus difficult to annotate. The un-annotated domains of the dark proteome universe display specific features relative to database domains: shorter length, non-canonical content and particular topology in hydrophobic residues, higher propensity for disorder, and a higher energy. These features make them hard to relate to known families. Based on these observations, we emphasize that domain annotation methodologies can still be improved to fully apprehend and decipher the molecular evolution of the protein universe.
|
22
|
Abstract
Proteins are the workhorses of the cell and, over billions of years, they have evolved an amazing plethora of extremely diverse and versatile structures with equally diverse functions. Evolutionary emergence of new proteins and transitions between existing ones are believed to be rare or even impossible. However, recent advances in comparative genomics have repeatedly identified some 10%-30% of all genes as lacking any detectable similarity to existing proteins. Even after careful scrutiny, some of those orphan genes contain protein coding reading frames with detectable transcription and translation. Thus some proteins seem to have emerged from previously non-coding 'dark genomic matter'. These 'de novo' proteins tend to be disordered, fast evolving, weakly expressed but also rapidly assuming novel and physiologically important functions. Here we review mechanisms by which 'de novo' proteins might be created, under which circumstances they may become fixed and why they are elusive. We propose a 'grow slow and moult' model in which first a reading frame is extended, coding for an initially disordered and non-globular appendage which, over time, becomes more structured and may also become associated with other proteins.
|
23
|
Klasberg S, Bitard-Feildel T, Mallet L. Computational Identification of Novel Genes: Current and Future Perspectives. Bioinform Biol Insights 2016; 10:121-31. [PMID: 27493475 PMCID: PMC4970615 DOI: 10.4137/bbi.s39950] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 04/15/2016] [Revised: 05/31/2016] [Accepted: 06/05/2016] [Indexed: 12/31/2022] Open
Abstract
While it has long been thought that all genomic novelties are derived from the existing material, many genes lacking homology to known genes were found in recent genome projects. Some of these novel genes were proposed to have evolved de novo, ie, out of noncoding sequences, whereas some have been shown to follow a duplication and divergence process. Their discovery called for an extension of the historical hypotheses about gene origination. Besides the theoretical breakthrough, increasing evidence accumulated that novel genes play important roles in evolutionary processes, including adaptation and speciation events. Different techniques are available to identify genes and classify them as novel. Their classification as novel is usually based on their similarity to known genes, or lack thereof, detected by comparative genomics or against databases. Computational approaches are further prime methods that can be based on existing models or leveraging biological evidences from experiments. Identification of novel genes remains however a challenging task. With the constant software and technologies updates, no gold standard, and no available benchmark, evaluation and characterization of genomic novelty is a vibrant field. In this review, the classical and state-of-the-art tools for gene prediction are introduced. The current methods for novel gene detection are presented; the methodological strategies and their limits are discussed along with perspective approaches for further studies.
Affiliation(s)
- Steffen Klasberg
  - Institute for Evolution and Biodiversity, Westfalian Wilhelms University Muenster, Huefferstrasse 1, Muenster, Germany
- Tristan Bitard-Feildel
  - Institute for Evolution and Biodiversity, Westfalian Wilhelms University Muenster, Huefferstrasse 1, Muenster, Germany
- Ludovic Mallet
  - Institute for Evolution and Biodiversity, Westfalian Wilhelms University Muenster, Huefferstrasse 1, Muenster, Germany
|
24
|
Faure G, Ogurtsov AY, Shabalina SA, Koonin EV. Role of mRNA structure in the control of protein folding. Nucleic Acids Res 2016; 44:10898-10911. [PMID: 27466388 PMCID: PMC5159526 DOI: 10.1093/nar/gkw671] [Citation(s) in RCA: 70] [Impact Index Per Article: 8.8] [Received: 05/03/2016] [Revised: 07/12/2016] [Accepted: 07/14/2016] [Indexed: 11/13/2022] Open
Abstract
Specific structures in mRNA modulate translation rate and thus can affect protein folding. Using the protein structures from two eukaryotes and three prokaryotes, we explore the connections between protein compactness, inferred from solvent accessibility, and mRNA structure, inferred from mRNA folding energy (ΔG). In both prokaryotes and eukaryotes, the ΔG value of the most stable 30-nucleotide segment of the mRNA (ΔGmin) strongly, positively correlates with protein solvent accessibility. Thus, mRNAs containing exceptionally stable secondary structure elements typically encode compact proteins. The correlations between ΔG and protein compactness are much more pronounced in predicted ordered parts of proteins compared to the predicted disordered parts, indicative of an important role of mRNA secondary structure elements in the control of protein folding. Additionally, ΔG correlates with the mRNA length and the evolutionary rate of synonymous positions. The correlations are partially independent and were used to construct multiple regression models which explain about half of the variance of protein solvent accessibility. These findings suggest a model in which the mRNA structure, particularly exceptionally stable RNA structural elements, acts as a gauge of protein co-translational folding by reducing ribosome speed when the nascent peptide needs time to form and optimize the core structure.
Affiliation(s)
- Guilhem Faure
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Aleksey Y Ogurtsov
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Svetlana A Shabalina
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Eugene V Koonin
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
|
25
|
Ghouzam Y, Postic G, Guerin PE, de Brevern AG, Gelly JC. ORION: a web server for protein fold recognition and structure prediction using evolutionary hybrid profiles. Sci Rep 2016; 6:28268. [PMID: 27319297 PMCID: PMC4913311 DOI: 10.1038/srep28268] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Received: 03/05/2016] [Accepted: 06/01/2016] [Indexed: 11/09/2022] Open
Abstract
Protein structure prediction based on comparative modeling is the most efficient way to produce structural models when it can be performed. ORION is a dedicated web server based on a new strategy that performs this task. ORION identifies suitable templates using an original profile-profile approach that combines sequence and structure evolution information. Structure evolution information is encoded into profiles using structural features, such as solvent accessibility and local conformation (described with Protein Blocks), which give an accurate description of the local protein structure. ORION has recently been improved, increasing the quality of its results by 5%. The ORION web server accepts a single protein sequence as input and searches for homologous protein structures within minutes. Various databases such as PDB, SCOP and HOMSTRAD can be mined to find an appropriate structural template. For the modeling step, a protein 3D structure can be directly obtained from the selected template by MODELLER and displayed with global and local quality model estimation measures. The sequence and the predicted structure of 4 examples from the CAMEO server and a recent CASP11 target from the 'Hard' category (T0818-D1) are shown as pertinent examples. Our web server is accessible at http://www.dsimb.inserm.fr/ORION/.
Affiliation(s)
- Yassine Ghouzam
  - INSERM, U 1134, DSIMB, F-75739 Paris, France
  - Univ. Paris Diderot, Sorbonne Paris Cité, UMR_S 1134, F-75739 Paris, France
  - Institut National de la Transfusion Sanguine (INTS), F-75739 Paris, France
  - Laboratoire d’Excellence GR-Ex, F-75739 Paris, France
- Guillaume Postic
  - INSERM, U 1134, DSIMB, F-75739 Paris, France
  - Univ. Paris Diderot, Sorbonne Paris Cité, UMR_S 1134, F-75739 Paris, France
  - Institut National de la Transfusion Sanguine (INTS), F-75739 Paris, France
  - Laboratoire d’Excellence GR-Ex, F-75739 Paris, France
- Pierre-Edouard Guerin
  - INSERM, U 1134, DSIMB, F-75739 Paris, France
  - Univ. Paris Diderot, Sorbonne Paris Cité, UMR_S 1134, F-75739 Paris, France
  - Institut National de la Transfusion Sanguine (INTS), F-75739 Paris, France
  - Laboratoire d’Excellence GR-Ex, F-75739 Paris, France
- Alexandre G. de Brevern
  - INSERM, U 1134, DSIMB, F-75739 Paris, France
  - Univ. Paris Diderot, Sorbonne Paris Cité, UMR_S 1134, F-75739 Paris, France
  - Institut National de la Transfusion Sanguine (INTS), F-75739 Paris, France
  - Laboratoire d’Excellence GR-Ex, F-75739 Paris, France
- Jean-Christophe Gelly
  - INSERM, U 1134, DSIMB, F-75739 Paris, France
  - Univ. Paris Diderot, Sorbonne Paris Cité, UMR_S 1134, F-75739 Paris, France
  - Institut National de la Transfusion Sanguine (INTS), F-75739 Paris, France
  - Laboratoire d’Excellence GR-Ex, F-75739 Paris, France
|
26
|
Rebehmed J, Quintus F, Mornon JP, Callebaut I. The respective roles of polar/nonpolar binary patterns and amino acid composition in protein regular secondary structures explored exhaustively using hydrophobic cluster analysis. Proteins 2016; 84:624-38. [DOI: 10.1002/prot.25012] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 10/12/2015] [Revised: 02/01/2016] [Accepted: 02/03/2016] [Indexed: 11/06/2022]
Affiliation(s)
- Joseph Rebehmed
  - CNRS UMR7590; Sorbonne Universités, Université Pierre Et Marie Curie-Paris6 - MNHN - IRD - IUC; Paris France
- Flavien Quintus
  - CNRS UMR7590; Sorbonne Universités, Université Pierre Et Marie Curie-Paris6 - MNHN - IRD - IUC; Paris France
- Jean-Paul Mornon
  - CNRS UMR7590; Sorbonne Universités, Université Pierre Et Marie Curie-Paris6 - MNHN - IRD - IUC; Paris France
- Isabelle Callebaut
  - CNRS UMR7590; Sorbonne Universités, Université Pierre Et Marie Curie-Paris6 - MNHN - IRD - IUC; Paris France
|
27
|
Lam SD, Dawson NL, Das S, Sillitoe I, Ashford P, Lee D, Lehtinen S, Orengo CA, Lees JG. Gene3D: expanding the utility of domain assignments. Nucleic Acids Res 2016; 44:D404-9. [PMID: 26578585 PMCID: PMC4702871 DOI: 10.1093/nar/gkv1231] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.0] [Received: 09/25/2015] [Revised: 10/29/2015] [Accepted: 10/30/2015] [Indexed: 12/21/2022] Open
Abstract
Gene3D (http://gene3d.biochem.ucl.ac.uk) is a database of domain annotations of Ensembl and UniProtKB protein sequences. Domains are predicted using a library of profile HMMs representing 2737 CATH superfamilies. Gene3D has previously featured in the Database issue of NAR and here we report updates to the website and database. The current Gene3D (v14) release has expanded its domain assignments to ∼20,000 cellular genomes and over 43 million unique protein sequences, more than doubling the number of protein sequences since our last publication. Amongst other updates, we have improved our Functional Family annotation method. We have also improved the quality and coverage of our 3D homology modelling pipeline of predicted CATH domains. Additionally, the structural models have been expanded to include an extra model organism (Drosophila melanogaster). We also document a number of additional visualization tools in the Gene3D website.
Affiliation(s)
- Su Datt Lam
  - Institute of Structural and Molecular Biology, Division of Biosciences, University College London, Gower Street, London, WC1E 6BT, UK
- Natalie L Dawson
  - Institute of Structural and Molecular Biology, Division of Biosciences, University College London, Gower Street, London, WC1E 6BT, UK
- Sayoni Das
  - Institute of Structural and Molecular Biology, Division of Biosciences, University College London, Gower Street, London, WC1E 6BT, UK
- Ian Sillitoe
  - Institute of Structural and Molecular Biology, Division of Biosciences, University College London, Gower Street, London, WC1E 6BT, UK
- Paul Ashford
  - Institute of Structural and Molecular Biology, Division of Biosciences, University College London, Gower Street, London, WC1E 6BT, UK
- David Lee
  - Institute of Structural and Molecular Biology, Division of Biosciences, University College London, Gower Street, London, WC1E 6BT, UK
- Sonja Lehtinen
  - Institute of Structural and Molecular Biology, Division of Biosciences, University College London, Gower Street, London, WC1E 6BT, UK
  - Department of Infectious Disease Epidemiology, Imperial College, St Mary's Campus, Norfolk Place, London W2 1PG, UK
- Christine A Orengo
  - Institute of Structural and Molecular Biology, Division of Biosciences, University College London, Gower Street, London, WC1E 6BT, UK
- Jonathan G Lees
  - Institute of Structural and Molecular Biology, Division of Biosciences, University College London, Gower Street, London, WC1E 6BT, UK
|
28
|
Bitard-Feildel T, Heberlein M, Bornberg-Bauer E, Callebaut I. Detection of orphan domains in Drosophila using "hydrophobic cluster analysis". Biochimie 2015; 119:244-53. [PMID: 25736992 DOI: 10.1016/j.biochi.2015.02.019] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Received: 12/08/2014] [Accepted: 02/20/2015] [Indexed: 11/30/2022]
Abstract
INTRODUCTION: Comparative genomics has become an important strategy in life science research. While many genes, and the proteins they code for, can be well characterized by assigning orthologs, a significant number of proteins or domains remain obscure "orphans". Some orphans are overlooked by current computational methods because they diverged rapidly; others emerged relatively recently (de novo). Recent research has demonstrated the importance of orphans, and of de novo proteins and domains, for the development of new phenotypic traits and adaptation. New approaches for detecting novel domains are thus of paramount importance. RESULTS: The hydrophobic cluster analysis (HCA) method delineates globular-like domains from the information in a protein sequence and thereby bypasses some limitations of established methods that rely on conserved sequence similarity. In this study, HCA is tested for orphan domain detection on 12 Drosophila genomes. After their detection, the orphan domains are classified into two categories, depending on their presence or absence in distantly related species. The two categories show significantly different physico-chemical properties when compared to previously characterized domains from the Pfam database. The newly detected domains have a higher degree of intrinsic disorder and a particular hydrophobic cluster composition. The older the domains are, the more similar their hydrophobic cluster content is to the cluster content of Pfam domains. The results suggest that, over time, newly created domains acquire a canonical set of hydrophobic clusters but conserve some features of intrinsically disordered regions. CONCLUSION: Our results agree with previous findings on orphan domains and suggest that the physico-chemical properties of domains change over evolutionarily long time scales. The presented HCA-based method is able to detect domains with unusual properties without relying on prior knowledge, such as the availability of homologs. Therefore, the method has large potential for complementing existing strategies to annotate genomes, and for better understanding how molecular features emerge.
Affiliation(s)
- Tristan Bitard-Feildel
  - Institute for Evolution and Biodiversity, Westfalian Wilhelms University Muenster, Huefferstrasse 1, D-48149, Germany
- Magdalena Heberlein
  - Institute for Evolution and Biodiversity, Westfalian Wilhelms University Muenster, Huefferstrasse 1, D-48149, Germany
- Erich Bornberg-Bauer
  - Institute for Evolution and Biodiversity, Westfalian Wilhelms University Muenster, Huefferstrasse 1, D-48149, Germany
- Isabelle Callebaut
  - IMPMC, Sorbonne Universités - UMR CNRS 7590, UPMC Univ Paris 06, Muséum National d'Histoire Naturelle, IRD UMR 206, IUC 4 Place Jussieu, F-75005 Paris, France
|
29
|
Expanding the SRI domain family: a common scaffold for binding the phosphorylated C-terminal domain of RNA polymerase II. FEBS Lett 2014; 588:4431-7. [PMID: 25448681 DOI: 10.1016/j.febslet.2014.10.014] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 07/11/2014] [Revised: 10/07/2014] [Accepted: 10/12/2014] [Indexed: 11/21/2022]
Abstract
The SRI domain is a small three-helix domain originally discovered near the C-terminus of both histone methyltransferase SETD2 and helicase RECQL5. The SRI domain binds to the C-terminal repeat domain of the largest subunit of RNA polymerase II, allowing SETD2 and RECQL5 to regulate various mechanisms associated with RNA transcription. Using original tools to detect common patterns in distantly related sequences, we have identified SRI domains in several additional proteins, most of which are involved in RNA metabolism. Combining sequence analysis with structural prediction, we show that this domain family is more diverse than previously thought and we predict critical structural and functional features.
|