1
Nallathambi P, Umamaheswari C, Reddy B, Aarthy B, Javed M, Ravikumar P, Watpade S, Kashyap PL, Boopalakrishnan G, Kumar S, Sharma A, Kumar A. Deciphering the Genomic Landscape and Virulence Mechanisms of the Wheat Powdery Mildew Pathogen Blumeria graminis f. sp. tritici Wtn1: Insights from Integrated Genome Assembly and Conidial Transcriptomics. J Fungi (Basel) 2024; 10:267. [PMID: 38667938 PMCID: PMC11051031 DOI: 10.3390/jof10040267] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Revised: 03/16/2024] [Accepted: 03/19/2024] [Indexed: 04/28/2024] Open
Abstract
A high-quality genome sequence from an Indian isolate of Blumeria graminis f. sp. tritici Wtn1, a persistent threat in wheat farming, was obtained using a hybrid method. The assembly of over 9.24 million DNA-sequence reads resulted in 93 contigs, totaling a 140.61 Mb genome size, potentially encoding 8480 genes. Notably, more than 73.80% of the genome, spanning approximately 102.14 Mb, comprises retro-elements, LTR elements, and P elements, influencing evolution and adaptation significantly. The phylogenomic analysis placed B. graminis f. sp. tritici Wtn1 in a distinct monocot-infecting clade. A total of 583 tRNA anticodon sequences were identified from the whole genome of the native virulent strain B. graminis f. sp. tritici, which comprises distinct genome features with high counts of tRNA anticodons for leucine (70), cysteine (61), alanine (58), and arginine (45), with only two stop codons (Opal and Ochre) present and the absence of the Amber stop codon. Comparative InterProScan analysis unveiled "shared and unique" proteins in B. graminis f. sp. tritici Wtn1. Identified were 7707 protein-encoding genes, annotated to different categories such as 805 effectors, 156 CAZymes, 6102 orthologous proteins, and 3180 distinct protein families (PFAMs). Among the effectors, genes like Avra10, Avrk1, Bcg-7, BEC1005, CSEP0105, CSEP0162, BEC1016, BEC1040, and HopI1 closely linked to pathogenesis and virulence were recognized. Transcriptome analysis highlighted abundant proteins associated with RNA processing and modification, post-translational modification, protein turnover, chaperones, and signal transduction. Examining the Environmental Information Processing Pathways in B. graminis f. sp. tritici Wtn1 revealed 393 genes across 33 signal transduction pathways. The key pathways included yeast MAPK signaling (53 genes), mTOR signaling (38 genes), PI3K-Akt signaling (23 genes), and AMPK signaling (21 genes). 
Additionally, pathways such as FoxO, phosphatidylinositol, the two-component system, and Ras signaling showed significant gene representation, each with 15-16 genes, along with key SNPs and InDels on specific chromosomes, highlighting their relevance to environmental responses and pathotype evolution. The SNP and InDel analysis yielded about 3.56 million variants, including 3.45 million SNPs, 5050 insertions, and 5651 deletions, across the whole genome of B. graminis f. sp. tritici Wtn1. These comprehensive genome and transcriptome datasets serve as crucial resources for understanding the pathogenicity, virulence effectors, retro-elements, and evolutionary origins of B. graminis f. sp. tritici Wtn1, aiding the development of robust strategies for the effective management of wheat powdery mildew.
Affiliation(s)
- Perumal Nallathambi
- ICAR-Indian Agricultural Research Institute, Regional Station, Wellington 643231, Tamil Nadu, India; (P.N.); (C.U.); (B.A.); (P.R.)
- Chandrasekaran Umamaheswari
- ICAR-Indian Agricultural Research Institute, Regional Station, Wellington 643231, Tamil Nadu, India; (P.N.); (C.U.); (B.A.); (P.R.)
- Bhaskar Reddy
- ICAR-Indian Agricultural Research Institute, Pusa Campus, New Delhi 110012, Delhi, India; (M.J.); (G.B.)
- Balakrishnan Aarthy
- ICAR-Indian Agricultural Research Institute, Regional Station, Wellington 643231, Tamil Nadu, India; (P.N.); (C.U.); (B.A.); (P.R.)
- Mohammed Javed
- ICAR-Indian Agricultural Research Institute, Pusa Campus, New Delhi 110012, Delhi, India; (M.J.); (G.B.)
- Priya Ravikumar
- ICAR-Indian Agricultural Research Institute, Regional Station, Wellington 643231, Tamil Nadu, India; (P.N.); (C.U.); (B.A.); (P.R.)
- Santosh Watpade
- ICAR-Indian Agricultural Research Institute, Regional Station, Shimla 171004, Himachal Pradesh, India
- Prem Lal Kashyap
- ICAR-Indian Institute of Wheat and Barley Research, Karnal 132001, Haryana, India; (P.L.K.); (S.K.); (A.S.)
- Sudheer Kumar
- ICAR-Indian Institute of Wheat and Barley Research, Karnal 132001, Haryana, India; (P.L.K.); (S.K.); (A.S.)
- Anju Sharma
- ICAR-Indian Institute of Wheat and Barley Research, Karnal 132001, Haryana, India; (P.L.K.); (S.K.); (A.S.)
- Aundy Kumar
- ICAR-Indian Agricultural Research Institute, Pusa Campus, New Delhi 110012, Delhi, India; (M.J.); (G.B.)
2
Spirov AV, Myasnikova EM. Problem of Domain/Building Block Preservation in the Evolution of Biological Macromolecules and Evolutionary Computation. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:1345-1362. [PMID: 35594219 DOI: 10.1109/tcbb.2022.3175908] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Structurally and functionally isolated domains in biological macromolecular evolution, both natural and artificial, are largely similar to "schemata", building blocks (BBs), in evolutionary computation (EC). The problem of preserving in subsequent evolutionary searches the already found domains / BBs is well known and quite relevant in biology as well as in EC. Both biology and EC are seeing parallel and independent development of several approaches to identifying and preserving previously identified domains / BBs. First, we notice the similarity of DNA shuffling methods in synthetic biology and multi-parent recombination algorithms in EC. Furthermore, approaches to computer identification of domains in proteins that are being developed in biology can be aligned with BB identification methods in EC. Finally, approaches to chimeric protein libraries optimization in biology can be compared to evolutionary search methods based on probabilistic models in EC. We propose to validate the prospects of mutual exchange of ideas and transfer of algorithms and approaches between evolutionary systems biology and EC in these three principal directions. A crucial aim of this transfer is the design of new advanced experimental techniques capable of solving more complex problems of in vitro evolution.
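The analogy drawn in this abstract between DNA shuffling and multi-parent recombination can be illustrated with a minimal sketch. It is not taken from the paper: the function name `multi_parent_crossover`, the fixed `block_size`, and the toy parents are all assumptions made for illustration. Each fixed-size block of the child is copied intact from one randomly chosen parent, so a building block that fits inside a block boundary is never split.

```python
import random


def multi_parent_crossover(parents, block_size, rng):
    """Build one child by copying each fixed-size block from a randomly chosen parent.

    Hypothetical illustration: fixed block boundaries stand in for the
    domain / building-block boundaries discussed in the abstract.
    """
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("all parents must have the same length")
    child = []
    for start in range(0, length, block_size):
        donor = rng.choice(parents)  # any parent may donate this block
        child.extend(donor[start:start + block_size])
    return child


# Three toy parents, each made of a single repeated symbol so donors are visible.
parents = [[0] * 8, [1] * 8, [2] * 8]
child = multi_parent_crossover(parents, block_size=4, rng=random.Random(0))
print(child)
```

In laboratory DNA shuffling, crossover points depend on sequence homology rather than on fixed offsets, so the fixed block size here is a deliberate simplification.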
3
Zhou H, Hwarari D, Ma H, Xu H, Yang L, Luo Y. Genomic survey of TCP transcription factors in plants: Phylogenomics, evolution and their biology. Front Genet 2022; 13:1060546. [PMID: 36437962 PMCID: PMC9682074 DOI: 10.3389/fgene.2022.1060546] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/03/2022] [Accepted: 10/27/2022] [Indexed: 09/29/2023] Open
Abstract
The TEOSINTE BRANCHED1 (TB1), CYCLOIDEA (CYC), and PROLIFERATING CELL NUCLEAR ANTIGEN FACTORS (PCF1 and PCF2) proteins, abbreviated as TCP transcription factors, carry a conserved basic-helix-loop-helix (bHLH) structure related to DNA-binding functions. The evolutionary history of the TCP genes shows their presence in early land plants. In this paper, we present a comparative discussion of current knowledge of the TCP transcription factors (TFs) in lower and higher plants: their evolutionary history based on the phylogenetics of 849 TCP proteins from 37 plant species, duplication events, and biochemical roles in some plant species. Phylogenetic investigations confirmed the classification of TCP TFs into Class I (the PCF1/2) and Class II (the C-clade) factors; the Class II factors were further divided into the CIN and CYC/TB1 subclades. Tracing the evolution of the TCP factors revealed an absence of the CYC/TB1 subclade in lower plants and an independent evolution of the CYC/TB1 subclade in both eudicot and monocot species. Of the duplication events analyzed, 54% were dispersed duplications, and we concluded that dispersed duplication events contributed to the expansion of the TCP gene family. Analysis of the functional roles of the TCP factors confirmed their involvement in various biochemical processes, mainly promoting cell proliferation in leaves for Class I TCPs and cell division during plant development for Class II TCP factors. Apart from growth and development, the TCP factors were also shown to regulate hormonal and stress response pathways. Although this paper does not exhaust the present knowledge of the TCP transcription factors, it provides a base for further exploration of the gene family.
Affiliation(s)
- Haiying Zhou
- Jiangsu Key Laboratory for Eco-Agricultural Biotechnology Around Hongze Lake, Jiangsu Collaborative Innovation Center of Regional Modern Agriculture and Environmental Protection, Huaiyin Normal University, Huai’an, China
- Delight Hwarari
- College of Biology and the Environment, Nanjing Forestry University, Nanjing, China
- Hongyu Ma
- College of Plant Protection, Nanjing Agricultural University, Nanjing, China
- Haibin Xu
- College of Biology and the Environment, Nanjing Forestry University, Nanjing, China
- Liming Yang
- College of Biology and the Environment, Nanjing Forestry University, Nanjing, China
- Yuming Luo
- Jiangsu Key Laboratory for Eco-Agricultural Biotechnology Around Hongze Lake, Jiangsu Collaborative Innovation Center of Regional Modern Agriculture and Environmental Protection, Huaiyin Normal University, Huai’an, China
4
Eicholt LA, Aubel M, Berk K, Bornberg‐Bauer E, Lange A. Heterologous expression of naturally evolved putative de novo proteins with chaperones. Protein Sci 2022; 31:e4371. [PMID: 35900020 PMCID: PMC9278007 DOI: 10.1002/pro.4371] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 02/27/2022] [Revised: 05/03/2022] [Accepted: 05/14/2022] [Indexed: 11/23/2022]
Abstract
Over the past decade, evidence has accumulated that new protein‐coding genes can emerge de novo from previously non‐coding DNA. Most studies have focused on large scale computational predictions of de novo protein‐coding genes across a wide range of organisms. In contrast, experimental data concerning the folding and function of de novo proteins are scarce. This might be due to difficulties in handling de novo proteins in vitro, as most are short and predicted to be disordered. Here, we propose a guideline for the effective expression of eukaryotic de novo proteins in Escherichia coli. We used 11 sequences from Drosophila melanogaster and 10 from Homo sapiens, that are predicted de novo proteins from former studies, for heterologous expression. The candidate de novo proteins have varying secondary structure and disorder content. Using multiple combinations of purification tags, E. coli expression strains, and chaperone systems, we were able to increase the number of solubly expressed putative de novo proteins from 30% to 62%. Our findings indicate that the best combination for expressing putative de novo proteins in E. coli is a GST‐tag with T7 Express cells and co‐expressed chaperones. We found that, overall, proteins with higher predicted disorder were easier to express.
Affiliation(s)
- Lars A. Eicholt
- Institute for Evolution and Biodiversity, University of Muenster, Münster, Germany
- Margaux Aubel
- Institute for Evolution and Biodiversity, University of Muenster, Münster, Germany
- Katrin Berk
- Institute for Evolution and Biodiversity, University of Muenster, Münster, Germany
- Erich Bornberg‐Bauer
- Institute for Evolution and Biodiversity, University of Muenster, Münster, Germany
- Max Planck‐Institute for Biology Tuebingen, Tübingen, Germany
- Andreas Lange
- Institute for Evolution and Biodiversity, University of Muenster, Münster, Germany
5
Hatano T, Palani S, Papatziamou D, Salzer R, Souza DP, Tamarit D, Makwana M, Potter A, Haig A, Xu W, Townsend D, Rochester D, Bellini D, Hussain HMA, Ettema TJG, Löwe J, Baum B, Robinson NP, Balasubramanian M. Asgard archaea shed light on the evolutionary origins of the eukaryotic ubiquitin-ESCRT machinery. Nat Commun 2022; 13:3398. [PMID: 35697693 PMCID: PMC9192718 DOI: 10.1038/s41467-022-30656-2] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 12/04/2021] [Accepted: 05/10/2022] [Indexed: 11/23/2022] Open
Abstract
The ESCRT machinery, comprising multiple proteins and subcomplexes, is crucial for membrane remodelling in eukaryotic cells, in processes that include ubiquitin-mediated multivesicular body formation, membrane repair, cytokinetic abscission, and virus exit from host cells. This ESCRT system appears to have simpler, ancient origins, since many archaeal species possess homologues of ESCRT-III and Vps4, the components that execute the final membrane scission reaction, where they have been shown to play roles in cytokinesis, extracellular vesicle formation and viral egress. Remarkably, metagenome assemblies of Asgard archaea, the closest known living relatives of eukaryotes, were recently shown to encode homologues of the entire cascade involved in ubiquitin-mediated membrane remodelling, including ubiquitin itself, components of the ESCRT-I and ESCRT-II subcomplexes, and ESCRT-III and Vps4. Here, we explore the phylogeny, structure, and biochemistry of Asgard homologues of the ESCRT machinery and the associated ubiquitylation system. We provide evidence for the ESCRT-I and ESCRT-II subcomplexes being involved in ubiquitin-directed recruitment of ESCRT-III, as they are in eukaryotes. Taken together, our analyses suggest a pre-eukaryotic origin for the ubiquitin-coupled ESCRT system and a likely path of ESCRT evolution via a series of gene duplication and diversification events.
Grants
- MC_U105184326 Medical Research Council
- MC_UP_1201/27 Medical Research Council
- 203276/Z/16/Z Wellcome Trust
- Wellcome Trust
- WT101885MA Wellcome Trust
- Wellcome Trust (Wellcome)
- Leverhulme Trust
- Svenska Forskningsrådet Formas (Swedish Research Council Formas)
- Above funding attributed to the authors as follows (from paper acknowledgements): Computational analysis was facilitated by resources provided by the Swedish National Infrastructure for Computing (SNIC) at the Uppsala Multidisciplinary Center for Advanced Computational Science (UPPMAX), partially funded by the Swedish Research Council through grant agreement no. 2018-05973. We thank the Warwick Proteomics RTP for mass spectrometry. MKB was supported by the Wellcome Trust (WT101885MA) and the European Research Council (ERC-2014-ADG No. 671083). Work by the NR laboratory was supported by start-up funds from the Division of Biomedical and Life Sciences (BLS, Lancaster University) and a Leverhulme Research Project Grant (RPG-2019-297). NR would like to thank Johanna Syrjanen for performing trial expressions of the Odinarchaeota ESCRT proteins, and Joseph Maman for helpful discussion regarding the SEC-MALS. NR, WX and AP would like to thank Charley Lai and Siu-Kei Yau for assistance with initial Odinarchaeota ESCRT protein purifications. DPS and BB would like to thank Chris Johnson at the MRC LMB Biophysics facility for performing the SEC-MALS assay on Heimdallarchaeotal Vps22. TH, HH, MB, RS, JL, D Tamarit, TE, DPS and BB received support from a Wellcome Trust collaborative award (203276/Z/16/Z). BB and DPS were supported by the MRC. D Tamarit was supported by the Swedish Research Council (International Postdoc grant 2018-06609).
Affiliation(s)
- Tomoyuki Hatano
- Centre for Mechanochemical Cell Biology, Division of Biomedical Sciences, Warwick Medical School, University of Warwick, Coventry, CV4 7AL, UK
- Saravanan Palani
- Centre for Mechanochemical Cell Biology, Division of Biomedical Sciences, Warwick Medical School, University of Warwick, Coventry, CV4 7AL, UK
- Department of Biochemistry, Indian Institute of Science, Bangalore, India
- Dimitra Papatziamou
- Division of Biomedical and Life Sciences, Faculty of Health and Medicine, Lancaster University, Lancaster, LA1 4YG, UK
- Ralf Salzer
- MRC Laboratory of Molecular Biology, Cambridge, CB2 0QH, UK
- Diorge P Souza
- MRC Laboratory of Molecular Biology, Cambridge, CB2 0QH, UK
- Daniel Tamarit
- Laboratory of Microbiology, Wageningen University, 6708 WE, Wageningen, The Netherlands
- Department of Aquatic Sciences and Assessment, Swedish University of Agricultural Sciences, SE-75007, Uppsala, Sweden
- Mehul Makwana
- Division of Biomedical and Life Sciences, Faculty of Health and Medicine, Lancaster University, Lancaster, LA1 4YG, UK
- Antonia Potter
- Division of Biomedical and Life Sciences, Faculty of Health and Medicine, Lancaster University, Lancaster, LA1 4YG, UK
- Alexandra Haig
- Division of Biomedical and Life Sciences, Faculty of Health and Medicine, Lancaster University, Lancaster, LA1 4YG, UK
- Wenjue Xu
- Division of Biomedical and Life Sciences, Faculty of Health and Medicine, Lancaster University, Lancaster, LA1 4YG, UK
- David Townsend
- Department of Chemistry, Lancaster University, Lancaster, LA1 4YB, UK
- David Rochester
- Department of Chemistry, Lancaster University, Lancaster, LA1 4YB, UK
- Dom Bellini
- MRC Laboratory of Molecular Biology, Cambridge, CB2 0QH, UK
- Hamdi M A Hussain
- Centre for Mechanochemical Cell Biology, Division of Biomedical Sciences, Warwick Medical School, University of Warwick, Coventry, CV4 7AL, UK
- Thijs J G Ettema
- Laboratory of Microbiology, Wageningen University, 6708 WE, Wageningen, The Netherlands
- Jan Löwe
- MRC Laboratory of Molecular Biology, Cambridge, CB2 0QH, UK
- Buzz Baum
- MRC Laboratory of Molecular Biology, Cambridge, CB2 0QH, UK
- Nicholas P Robinson
- Division of Biomedical and Life Sciences, Faculty of Health and Medicine, Lancaster University, Lancaster, LA1 4YG, UK
- Mohan Balasubramanian
- Centre for Mechanochemical Cell Biology, Division of Biomedical Sciences, Warwick Medical School, University of Warwick, Coventry, CV4 7AL, UK
6
Martyn JE, Gomez-Valero L, Buchrieser C. The evolution and role of eukaryotic-like domains in environmental intracellular bacteria: the battle with a eukaryotic cell. FEMS Microbiol Rev 2022; 46:6529235. [DOI: 10.1093/femsre/fuac012] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/01/2021] [Revised: 02/09/2022] [Accepted: 02/14/2022] [Indexed: 11/14/2022] Open
Abstract
Intracellular pathogens that are able to thrive in different environments, such as Legionella spp., which preferentially live in protozoa in aquatic environments, or environmental Chlamydiae, which replicate either within protozoa or a range of animals, possess a plethora of cellular biology tools to influence their eukaryotic host. The host manipulation tools that evolved in the interaction with protozoa confer on these bacteria the capacity to also infect phylogenetically distinct eukaryotic cells, such as macrophages, and thus they can also be human pathogens. To manipulate the host cell, bacteria use protein secretion systems and molecular effectors. Although these molecular effectors are encoded in bacteria, they are expressed and function in a eukaryotic context, often mimicking or inhibiting eukaryotic proteins. Indeed, many of these effectors have eukaryotic-like domains. In this review we propose that the main pathways environmental intracellular bacteria need to subvert in order to establish the host eukaryotic cell as a replication niche are chromatin remodelling, ubiquitination signalling, and modulation of protein-protein interactions via tandem repeat domains. We then provide mechanistic insight into how these proteins might have evolved as molecular weapons. Finally, we highlight that in environmental intracellular bacteria the number of eukaryotic-like domains and proteins is considerably higher than in intracellular bacteria specialised to an isolated niche, such as obligate intracellular human pathogens. As mimics of eukaryotic proteins are critical components of host-pathogen interactions, this distribution of eukaryotic-like domains suggests that the environment has selected them.
Affiliation(s)
- Jessica E Martyn
- Institut Pasteur, Biologie des Bactéries Intracellulaires and CNRS UMR 3525, Paris, France
- Laura Gomez-Valero
- Institut Pasteur, Biologie des Bactéries Intracellulaires and CNRS UMR 3525, Paris, France
- Carmen Buchrieser
- Institut Pasteur, Biologie des Bactéries Intracellulaires and CNRS UMR 3525, Paris, France
7
New Genomic Signals Underlying the Emergence of Human Proto-Genes. Genes (Basel) 2022; 13:284. [PMID: 35205330 PMCID: PMC8871994 DOI: 10.3390/genes13020284] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/17/2021] [Revised: 01/20/2022] [Accepted: 01/24/2022] [Indexed: 12/04/2022] Open
Abstract
De novo genes are novel genes which emerge from non-coding DNA. Until now, little has been known about de novo genes’ properties in relation to their age and mechanisms of emergence. In this study, we investigate four related properties: introns, upstream regulatory motifs, 5′ untranslated regions (UTRs) and protein domains, in 23,135 human proto-genes. We found that proto-genes contain introns, whose number and position correlate with the genomic position of proto-gene emergence. The origin of these introns is debated, as our results suggest that 41% of proto-genes might have captured existing introns, and 13.7% of them do not splice the ORF. We show that proto-genes which emerged via overprinting tend to be more enriched in core promoter motifs, while intergenic and intronic genes are more enriched in enhancers, even if the TATA motif is most commonly found upstream in these genes. Intergenic and intronic 5′ UTRs of proto-genes have a lower potential to stabilise mRNA structures than exonic proto-genes and established human genes. Finally, we confirm that proteins expressed by proto-genes gain new putative domains with age. Overall, we find that regulatory motifs inducing transcription and translation of previously non-coding sequences may facilitate proto-gene emergence. Our study demonstrates that introns, 5′ UTRs, and domains have specific properties in proto-genes. We also emphasize that the genomic positions of de novo genes strongly impact these properties.
8
Coyote-Maestas W, Nedrud D, Suma A, He Y, Matreyek KA, Fowler DM, Carnevale V, Myers CL, Schmidt D. Probing ion channel functional architecture and domain recombination compatibility by massively parallel domain insertion profiling. Nat Commun 2021; 12:7114. [PMID: 34880224 PMCID: PMC8654947 DOI: 10.1038/s41467-021-27342-0] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 05/28/2021] [Accepted: 11/16/2021] [Indexed: 11/10/2022] Open
Abstract
Protein domains are the basic units of protein structure and function. Comparative analysis of genomes and proteomes showed that domain recombination is a main driver of multidomain protein functional diversification, and some of the constraining genomic mechanisms are known. Much less is known about biophysical mechanisms that determine whether protein domains can be combined into viable protein folds. Here, we use massively parallel insertional mutagenesis to determine compatibility of over 300,000 domain recombination variants of the Inward Rectifier K+ channel Kir2.1 with channel surface expression. Our data suggest that genomic and biophysical mechanisms acted in concert to favor gain of large, structured domains at protein termini during ion channel evolution. We use machine learning to build a quantitative biophysical model of domain compatibility in Kir2.1 that allows us to derive rudimentary rules for designing domain insertion variants that fold and traffic to the cell surface. Positional Kir2.1 responses to motif insertion cluster into distinct groups that correspond to contiguous structural regions of the channel with distinct biophysical properties tuned towards providing either folding stability or gating transitions. This suggests that insertional profiling is a high-throughput method to annotate function of ion channel structural regions.
Affiliation(s)
- Willow Coyote-Maestas
- Department of Biochemistry, Molecular Biology & Biophysics, University of Minnesota, Minneapolis, MN 55455 USA
- David Nedrud
- Department of Biochemistry, Molecular Biology & Biophysics, University of Minnesota, Minneapolis, MN 55455 USA
- Antonio Suma
- Department of Chemistry, Temple University, Philadelphia, PA 19122 USA
- Yungui He
- Department of Genetics, Cell Biology & Development, University of Minnesota, Minneapolis, MN 55455 USA
- Kenneth A. Matreyek
- Department of Pathology, Case Western Reserve University School of Medicine, Cleveland, OH 44106 USA
- Douglas M. Fowler
- Department of Genome Sciences, University of Washington, Seattle, WA 98115 USA
- Department of Bioengineering, University of Washington, Seattle, WA 98115 USA
- Vincenzo Carnevale
- Department of Chemistry, Temple University, Philadelphia, PA 19122 USA
- Chad L. Myers
- Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55455 USA
- Daniel Schmidt
- Department of Genetics, Cell Biology & Development, University of Minnesota, Minneapolis, MN 55455 USA
9
Papadopoulos C, Callebaut I, Gelly JC, Hatin I, Namy O, Renard M, Lespinet O, Lopes A. Intergenic ORFs as elementary structural modules of de novo gene birth and protein evolution. Genome Res 2021; 31:2303-2315. [PMID: 34810219 PMCID: PMC8647833 DOI: 10.1101/gr.275638.121] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 04/13/2021] [Accepted: 09/23/2021] [Indexed: 01/08/2023]
Abstract
The noncoding genome plays an important role in de novo gene birth and in the emergence of genetic novelty. Nevertheless, how noncoding sequences' properties could promote the birth of novel genes and shape the evolution and the structural diversity of proteins remains unclear. Therefore, by combining different bioinformatic approaches, we characterized the fold potential diversity of the amino acid sequences encoded by all intergenic open reading frames (ORFs) of S. cerevisiae with the aim of (1) exploring whether the structural states' diversity of proteomes is already present in noncoding sequences, and (2) estimating the potential of the noncoding genome to produce novel protein bricks that could either give rise to novel genes or be integrated into pre-existing proteins, thus participating in protein structure diversity and evolution. We showed that amino acid sequences encoded by most yeast intergenic ORFs contain the elementary building blocks of protein structures. Moreover, they encompass the large structural state diversity of canonical proteins, with the majority predicted as foldable. Then, we investigated the early stages of de novo gene birth by reconstructing the ancestral sequences of 70 yeast de novo genes and characterized the sequence and structural properties of intergenic ORFs with a strong translation signal. This enabled us to highlight sequence and structural factors determining de novo gene emergence. Finally, we showed a strong correlation between the fold potential of de novo proteins and one of their ancestral amino acid sequences, reflecting the relationship between the noncoding genome and the protein structure universe.
Affiliation(s)
- Chris Papadopoulos
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Isabelle Callebaut
- Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, 75005 Paris, France
- Jean-Christophe Gelly
- Université de Paris, Biologie Intégrée du Globule Rouge, UMR_S1134, BIGR, INSERM, F-75015 Paris, France
- Laboratoire d'Excellence GR-Ex, 75015 Paris, France
- Institut National de la Transfusion Sanguine, F-75015 Paris, France
- Isabelle Hatin
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Olivier Namy
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Maxime Renard
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Olivier Lespinet
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Anne Lopes
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
10
Reddy B, Kumar A, Mehta S, Sheoran N, Chinnusamy V, Prakash G. Hybrid de novo genome-reassembly reveals new insights on pathways and pathogenicity determinants in rice blast pathogen Magnaporthe oryzae RMg_Dl. Sci Rep 2021; 11:22922. [PMID: 34824307 PMCID: PMC8616942 DOI: 10.1038/s41598-021-01980-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/17/2021] [Accepted: 11/01/2021] [Indexed: 01/20/2023] Open
Abstract
Blast disease incited by Magnaporthe oryzae is a major threat to sustaining rice production in all rice-growing nations. The pathogen is widely distributed in all rice paddies and displays rapid aerial transmission and seed-borne latent infection. In order to understand the genetic variability, host specificity, and molecular basis of the pathogenicity-associated traits, the whole genome of rice-infecting Magnaporthe oryzae (strain RMg_Dl) was sequenced using the Illumina and PacBio (RSII compatible) platforms. The high-throughput hybrid assembly of short and long reads resulted in a total of 375 scaffolds with a genome size of 42.43 Mb. Furthermore, comparative genome analysis revealed 99% average nucleotide identity (ANI) with other M. oryzae genomes, 83% against M. grisea, and 73% against M. poae genomes. Gene calling identified 10,553 genes with 10,539 protein-coding sequences. Among the detected transposable elements, the LTR/Gypsy and LINE types showed high occurrence. InterProScan analysis of the predicted protein sequences revealed that 97% of protein families (PFAM), 98% of superfamilies, and 95% of CDD entries were shared between RMg_Dl and the reference 70-15 genome. Additionally, 550 CAZymes with high GH family content/distribution and cell wall degrading enzymes (CWDE) such as endoglucanase, beta-glucosidase, and pectate lyase were also deciphered in RMg_Dl. Virulence factor determination revealed 51 different VFs in the genome. Genes related to biochemical pathways such as starch and sucrose metabolism and mTOR, cAMP, and MAPK signaling were identified in the genome. A total of 49,065 SNPs, 3267 insertions, and 3611 deletions were detected, and the majority of these variants were located in downstream and upstream regions.
Taken together, the generated information will be useful for developing specific markers for diagnosis, pathogen surveillance and tracking, molecular taxonomy, and species delineation, which will ultimately help devise improved management strategies for blast disease.
Affiliation(s)
- Bhaskar Reddy
  - Division of Plant Pathology, ICAR-Indian Agricultural Research Institute, New Delhi, 110012, India
- Aundy Kumar
  - Division of Plant Pathology, ICAR-Indian Agricultural Research Institute, New Delhi, 110012, India
- Sahil Mehta
  - Crop Improvement Group, International Centre for Genetic Engineering and Biotechnology, New Delhi, 110067, India
- Neelam Sheoran
  - Division of Plant Pathology, ICAR-Indian Agricultural Research Institute, New Delhi, 110012, India
- Viswanathan Chinnusamy
  - Division of Plant Physiology, ICAR-Indian Agricultural Research Institute, New Delhi, 110012, India
- Ganesan Prakash
  - Division of Plant Pathology, ICAR-Indian Agricultural Research Institute, New Delhi, 110012, India
|
11
|
Lindenburg LH, Pantelejevs T, Gielen F, Zuazua-Villar P, Butz M, Rees E, Kaminski CF, Downs JA, Hyvönen M, Hollfelder F. Improved RAD51 binders through motif shuffling based on the modularity of BRC repeats. Proc Natl Acad Sci U S A 2021; 118:e2017708118. [PMID: 34772801 PMCID: PMC8727024 DOI: 10.1073/pnas.2017708118] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Accepted: 08/10/2021] [Indexed: 01/20/2023] Open
Abstract
Exchanges of protein sequence modules support leaps in function unavailable through point mutations during evolution. Here we study the role of the two RAD51-interacting modules within the eight binding BRC repeats of BRCA2. We created 64 chimeric repeats by shuffling these modules and measured their binding to RAD51. We found that certain shuffled module combinations were stronger binders than any of the module combinations in the natural repeats. Surprisingly, the contribution from the two modules was poorly correlated with affinities of natural repeats, with a weak BRC8 repeat containing the most effective N-terminal module. The binding of the strongest chimera, BRC8-2, to RAD51 was improved by -2.4 kcal/mol compared to the strongest natural repeat, BRC4. A crystal structure of the RAD51:BRC8-2 complex shows an improved interface fit and an extended β-hairpin in this repeat. BRC8-2 was shown to function in human cells, preventing the formation of nuclear RAD51 foci after ionizing radiation.
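To give a feel for the size of the reported free-energy improvement, the sketch below converts a change in binding free energy into the implied fold change in dissociation constant, using the standard relation ΔΔG = RT ln(Kd,variant / Kd,reference). The temperature of 298.15 K and the function name `kd_fold_change` are assumptions made for illustration; the abstract does not state the measurement temperature.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 1.987204e-3


def kd_fold_change(ddg_kcal_per_mol: float, temperature_k: float = 298.15) -> float:
    """Return Kd_reference / Kd_variant implied by a binding ddG.

    ddG is defined as dG_variant - dG_reference, so a negative value
    (tighter binding) gives a fold change greater than 1.
    The default temperature is an assumption, not taken from the abstract.
    """
    return math.exp(-ddg_kcal_per_mol / (R_KCAL * temperature_k))


if __name__ == "__main__":
    fold = kd_fold_change(-2.4)
    print(f"Implied tightening of Kd: about {fold:.0f}-fold")
```

Under these assumptions, -2.4 kcal/mol corresponds to a Kd several tens of times lower than that of the reference repeat. The exact figure depends on the temperature used.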
Affiliation(s)
- Laurens H Lindenburg
  - Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, United Kingdom
- Teodors Pantelejevs
  - Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, United Kingdom
- Fabrice Gielen
  - Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, United Kingdom
  - Living Systems Institute, University of Exeter, Exeter EX4 4QD, United Kingdom
- Pedro Zuazua-Villar
  - Division of Cancer Biology, The Institute of Cancer Research, London SW3 6JB, United Kingdom
- Maren Butz
  - Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, United Kingdom
- Eric Rees
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge CB3 0AS, United Kingdom
- Clemens F Kaminski
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge CB3 0AS, United Kingdom
- Jessica A Downs
  - Division of Cancer Biology, The Institute of Cancer Research, London SW3 6JB, United Kingdom
- Marko Hyvönen
  - Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, United Kingdom
- Florian Hollfelder
  - Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, United Kingdom
|
12
|
Gilchrist CLM, Chooi YH. Synthaser: a CD-Search enabled Python toolkit for analysing domain architecture of fungal secondary metabolite megasynth(et)ases. Fungal Biol Biotechnol 2021; 8:13. [PMID: 34763725 PMCID: PMC8582187 DOI: 10.1186/s40694-021-00120-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 08/30/2021] [Accepted: 10/29/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Fungi are prolific producers of secondary metabolites (SMs), which are bioactive small molecules with important applications in medicine, agriculture and other industries. The backbones of a large proportion of fungal SMs are generated through the action of large, multi-domain megasynth(et)ases such as polyketide synthases (PKSs) and nonribosomal peptide synthetases (NRPSs). The structure of these backbones is determined by the domain architecture of the corresponding megasynth(et)ase, and thus accurate annotation and classification of these architectures is an important step in linking SMs to their biosynthetic origins in the genome.
RESULTS: Here we report synthaser, a Python package leveraging the NCBI's conserved domain search tool for remote prediction and classification of fungal megasynth(et)ase domain architectures. Synthaser is capable of batch sequence analysis, and produces rich textual output and interactive visualisations which allow for quick assessment of the megasynth(et)ase diversity of a fungal genome. Synthaser uses a hierarchical rule-based classification system, which can be extensively customised by the user through a web application (http://gamcil.github.io/synthaser). We show that synthaser provides more accurate domain architecture predictions than comparable tools which rely on curated profile hidden Markov model (pHMM)-based approaches; the utilisation of the NCBI conserved domain database also allows for significantly greater flexibility compared to pHMM approaches. In addition, we demonstrate how synthaser can be applied to large scale genome mining pipelines through the construction of an Aspergillus PKS similarity network.
CONCLUSIONS: Synthaser is an easy to use tool that represents a significant upgrade to previous domain architecture analysis tools. It is freely available under an MIT license from PyPI (https://pypi.org/project/synthaser) and GitHub (https://github.com/gamcil/synthaser).
Affiliation(s)
- Cameron L M Gilchrist
  - School of Molecular Sciences, The University of Western Australia, 35 Stirling Hwy, Crawley, 6009, Australia
- Yit-Heng Chooi
  - School of Molecular Sciences, The University of Western Australia, 35 Stirling Hwy, Crawley, 6009, Australia
|
13
|
Gomes T, Martin-Malpartida P, Ruiz L, Aragón E, Cordeiro TN, Macias MJ. Conformational landscape of multidomain SMAD proteins. Comput Struct Biotechnol J 2021; 19:5210-5224. [PMID: 34630939 PMCID: PMC8479633 DOI: 10.1016/j.csbj.2021.09.009] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 06/25/2021] [Revised: 09/08/2021] [Accepted: 09/09/2021] [Indexed: 12/21/2022] Open
Abstract
SMAD transcription factors, the main effectors of the TGFβ (transforming growth factor β) network, have a mixed architecture of globular domains and flexible linkers. Such a complicated architecture precluded the description of their full-length (FL) structure for many years. In this study, we unravel the structures of SMAD4 and SMAD2 proteins through an integrative approach combining Small-angle X-ray scattering, Nuclear Magnetic Resonance spectroscopy, X-ray, and computational modeling. We show that both proteins populate ensembles of conformations, with the globular domains tethered by disordered and flexible linkers, which defines a new dimension of regulation. The flexibility of the linkers facilitates DNA and protein binding and modulates the protein structure. Yet, SMAD4FL is monomeric, whereas SMAD2FL is in different monomer-dimer-trimer states, driven by interactions of the MH2 domains. Dimers are present regardless of the SMAD2FL activation state and concentration. Finally, we propose that SMAD2FL dimers are key building blocks for the quaternary structures of SMAD complexes.
Affiliation(s)
- Tiago Gomes
  - Institute for Research in Biomedicine, The Barcelona Institute of Science and Technology, Baldiri Reixac, 10, Barcelona 08028, Spain
- Pau Martin-Malpartida
  - Institute for Research in Biomedicine, The Barcelona Institute of Science and Technology, Baldiri Reixac, 10, Barcelona 08028, Spain
- Lidia Ruiz
  - Institute for Research in Biomedicine, The Barcelona Institute of Science and Technology, Baldiri Reixac, 10, Barcelona 08028, Spain
- Eric Aragón
  - Institute for Research in Biomedicine, The Barcelona Institute of Science and Technology, Baldiri Reixac, 10, Barcelona 08028, Spain
- Tiago N. Cordeiro
  - Instituto de Tecnologia Química e Biológica António Xavier (ITQB), Universidade NOVA de Lisboa, Av. da República, 2780-157 Oeiras, Portugal
- Maria J. Macias
  - Institute for Research in Biomedicine, The Barcelona Institute of Science and Technology, Baldiri Reixac, 10, Barcelona 08028, Spain
  - ICREA, Passeig Lluís Companys 23, Barcelona 08010, Spain
|
14
|
Carmi G, Gorohovski A, Frenkel-Morgenstern M. EvoProDom: Evolutionary modeling of protein families by assessing translocations of protein domains. FEBS Open Bio 2021; 11:2507-2524. [PMID: 34196123 PMCID: PMC8409312 DOI: 10.1002/2211-5463.13245] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2021] [Revised: 06/22/2021] [Accepted: 06/30/2021] [Indexed: 11/29/2022] Open
Abstract
Here, we introduce a novel ‘evolution of protein domains’ (EvoProDom) model for describing the evolution of proteins based on the ‘mix and merge’ of protein domains. We assembled and integrated genomic and proteomic data comprising protein domain content and orthologous proteins from 109 organisms. In EvoProDom, we characterized evolutionary events, particularly translocations, as reciprocal exchanges of protein domains between orthologous proteins in different organisms. We showed that protein domains that translocate with high frequency are generated by transcripts enriched in trans‐splicing events, that is, the generation of novel transcripts from the fusion of two distinct genes. In EvoProDom, we describe a general method to collate orthologous protein annotation from KEGG, and protein domain content from protein sequences, using tools such as KoFamKOALA and Pfam. To summarize, EvoProDom presents a novel model for protein evolution based on the ‘mix and merge’ of protein domains rather than DNA‐based evolution models. This confers the advantage of considering chromosomal alterations as drivers of protein evolutionary events.
Affiliation(s)
- Gon Carmi
  - Cancer Genomics and BioComputing of Complex Diseases Lab, The Azrieli Faculty of Medicine, Bar-Ilan University, 8 Henrietta Szold St, Safed, 13195, Israel
- Alessandro Gorohovski
  - Cancer Genomics and BioComputing of Complex Diseases Lab, The Azrieli Faculty of Medicine, Bar-Ilan University, 8 Henrietta Szold St, Safed, 13195, Israel
- Milana Frenkel-Morgenstern
  - Cancer Genomics and BioComputing of Complex Diseases Lab, The Azrieli Faculty of Medicine, Bar-Ilan University, 8 Henrietta Szold St, Safed, 13195, Israel
|
15
|
Dieci G. Removing quote marks from the RNA polymerase II CTD 'code'. Biosystems 2021; 207:104468. [PMID: 34216714 DOI: 10.1016/j.biosystems.2021.104468] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/11/2021] [Revised: 06/24/2021] [Accepted: 06/27/2021] [Indexed: 11/27/2022]
Abstract
In eukaryotes, RNA polymerase II (Pol II) is responsible for the synthesis of all mRNAs and myriads of short and long untranslated RNAs, whose fabrication involves close spatiotemporal coordination between transcription, RNA processing and chromatin modification. Crucial for such coordination is an unusual C-terminal domain (CTD) of the Pol II largest subunit, made of tandem repetitions (26 in yeast, 52 in chordates) of the heptapeptide with the consensus sequence YSPTSPS. Although largely unstructured and with poor sequence content, the Pol II CTD derives its extraordinary functional versatility from the fact that each amino acid in the heptapeptide can be posttranslationally modified, and that different combinations of CTD covalent marks are specifically recognized by different protein binding partners. These features have led to the proposal of a Pol II CTD code, but this expression is generally used by authors with some caution, revealed by the frequent use of quote marks for the word 'code'. Based on the theoretical framework of code biology, it is argued here that the Pol II CTD modification system meets the requirements of a true organic code, where different CTD modification states represent organic signs whose organic meanings are biological reactions contributing to the many facets of RNA biogenesis in coordination with RNA synthesis by Pol II. Importantly, the Pol II CTD code is instantiated by adaptor proteins possessing at least two distinct domains, one of which is devoted to specific recognition of CTD modification profiles. Furthermore, code rules can be altered by experimental interchange of CTD recognition domains of different adaptor proteins, a fact arguing in favor of the arbitrariness, and thus bona fide character, of the Pol II CTD code.
Since the growing family of CTD adaptors includes RNA binding proteins and histone modification complexes, the Pol II CTD code is by its nature integrated with other organic codes, in particular the splicing code and the histone code. These issues will be discussed taking into account fascinating developments in Pol II CTD research, like the discovery of novel modifications at non-consensus sites, the recently recognized CTD physicochemical properties favoring liquid-liquid phase separation, and the discovery that the Pol II CTD, originated before the divergence of most extant eukaryotic taxa, has expanded and diversified with developmental complexity in animals and plants.
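The tandem-repeat organisation described in this abstract can be illustrated with a small sketch that counts exact matches to the consensus heptapeptide YSPTSPS in a protein sequence and reports the longest uninterrupted run. The function names and the toy sequence are hypothetical and not taken from the article. Real CTDs contain many non-consensus heptads that this exact-match approach deliberately ignores.

```python
import re

# Consensus heptapeptide of the Pol II CTD, as given in the abstract.
CONSENSUS = "YSPTSPS"


def count_consensus_heptads(seq: str) -> int:
    """Count non-overlapping exact matches to the consensus heptad."""
    return len(re.findall(CONSENSUS, seq.upper()))


def longest_tandem_run(seq: str) -> int:
    """Return the number of heptads in the longest uninterrupted consensus run."""
    runs = re.findall(f"(?:{CONSENSUS})+", seq.upper())
    return max((len(run) // len(CONSENSUS) for run in runs), default=0)


if __name__ == "__main__":
    # Toy sequence (illustrative only): three consensus heptads, one
    # degenerate heptad (YSPTSPK), then two more consensus heptads.
    toy = "MKT" + CONSENSUS * 3 + "YSPTSPK" + CONSENSUS * 2 + "AA"
    print(count_consensus_heptads(toy))  # 5
    print(longest_tandem_run(toy))       # 3
```

Exact matching keeps the example short. A position-tolerant pattern would be needed to capture the degenerate repeats found in actual CTD sequences.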
Affiliation(s)
- Giorgio Dieci
  - Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parco Area delle Scienze 23/A, 43124, Parma, Italy
|
16
|
Lange A, Patel PH, Heames B, Damry AM, Saenger T, Jackson CJ, Findlay GD, Bornberg-Bauer E. Structural and functional characterization of a putative de novo gene in Drosophila. Nat Commun 2021; 12:1667. [PMID: 33712569 PMCID: PMC7954818 DOI: 10.1038/s41467-021-21667-6] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 01/09/2020] [Accepted: 02/03/2021] [Indexed: 11/26/2022] Open
Abstract
Comparative genomic studies have repeatedly shown that new protein-coding genes can emerge de novo from noncoding DNA. Still unknown is how and when the structures of encoded de novo proteins emerge and evolve. Combining biochemical, genetic and evolutionary analyses, we elucidate the function and structure of goddard, a gene which appears to have evolved de novo at least 50 million years ago within the Drosophila genus. Previous studies found that goddard is required for male fertility. Here, we show that Goddard protein localizes to elongating sperm axonemes and that in its absence, elongated spermatids fail to undergo individualization. Combining modelling, NMR and circular dichroism (CD) data, we show that Goddard protein contains a large central α-helix, but is otherwise partially disordered. We find similar results for Goddard's orthologs from divergent fly species and their reconstructed ancestral sequences. Accordingly, Goddard's structure appears to have been maintained with only minor changes over millions of years.
Affiliation(s)
- Andreas Lange
  - Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
- Prajal H Patel
  - Department of Biology, College of the Holy Cross, Worcester, MA, USA
- Brennen Heames
  - Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
- Adam M Damry
  - Research School of Chemistry, ANU College of Science, Canberra, Australia
- Thorsten Saenger
  - Department of Pediatric Kidney, Liver and Metabolic Diseases, Hannover Medical School, Hannover, Germany
- Colin J Jackson
  - Research School of Chemistry, ANU College of Science, Canberra, Australia
- Erich Bornberg-Bauer
  - Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
|
17
|
Han X, Guo J, Pang E, Song H, Lin K. Ab Initio Construction and Evolutionary Analysis of Protein-Coding Gene Families with Partially Homologous Relationships: Closely Related Drosophila Genomes as a Case Study. Genome Biol Evol 2021; 12:185-202. [PMID: 32108239 PMCID: PMC7144356 DOI: 10.1093/gbe/evaa041] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Accepted: 02/18/2020] [Indexed: 01/05/2023] Open
Abstract
How have genes evolved within a well-known genome phylogeny? Many protein-coding genes should have evolved as a whole at the gene level, and some should have evolved partly through fragments at the subgene level. To comprehensively explore such complex homologous relationships and better understand gene family evolution, here, with de novo-identified modules, the subgene units which could consecutively cover proteins within a set of closely related species, we applied a new phylogeny-based approach that considers evolutionary models with partial homology to classify all protein-coding genes in nine Drosophila genomes. Compared with two other popular methods for gene family construction, our approach improved practical gene family classifications with a more reasonable view of homology and provided a much more complete landscape of gene family evolution at the gene and subgene levels. In the case study, we found that most expanded gene families might have evolved mainly through module rearrangements rather than gene duplications and mainly generated single-module genes through partial gene duplication, suggesting that there might be pervasive subgene rearrangement in the evolution of protein-coding gene families. The use of a phylogeny-based approach with partial homology to classify and analyze protein-coding gene families may provide us with a more comprehensive landscape depicting how genes evolve within a well-known genome phylogeny.
Affiliation(s)
- Xia Han
  - State Key Laboratory of Earth Surface Processes and Resource Ecology, Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, China
- Jindan Guo
  - State Key Laboratory of Earth Surface Processes and Resource Ecology, Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, China
- Erli Pang
  - State Key Laboratory of Earth Surface Processes and Resource Ecology, Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, China
- Hongtao Song
  - State Key Laboratory of Earth Surface Processes and Resource Ecology, Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, China
- Kui Lin
  - State Key Laboratory of Earth Surface Processes and Resource Ecology, Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, China
|
18
|
James JE, Willis SM, Nelson PG, Weibel C, Kosinski LJ, Masel J. Universal and taxon-specific trends in protein sequences as a function of age. eLife 2021; 10:e57347. [PMID: 33416492 PMCID: PMC7819706 DOI: 10.7554/elife.57347] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 03/28/2020] [Accepted: 01/05/2021] [Indexed: 01/12/2023] Open
Abstract
Extant protein-coding sequences span a huge range of ages, from those that emerged only recently to those present in the last universal common ancestor. Because evolution has had less time to act on young sequences, there might be 'phylostratigraphy' trends in any properties that evolve slowly with age. A long-term reduction in hydrophobicity and hydrophobic clustering was found in previous, taxonomically restricted studies. Here we perform integrated phylostratigraphy across 435 fully sequenced species, using sensitive HMM methods to detect protein domain homology. We find that the reduction in hydrophobic clustering is universal across lineages. However, only young animal domains have a tendency to have higher structural disorder. Among ancient domains, trends in amino acid composition reflect the order of recruitment into the genetic code, suggesting that the composition of the contemporary descendants of ancient sequences reflects amino acid availability during the earliest stages of life, when these sequences first emerged.
Affiliation(s)
- Jennifer E James
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, United States
- Sara M Willis
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, United States
- Paul G Nelson
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, United States
- Catherine Weibel
  - Department of Physics, University of Arizona, Tucson, United States
  - Department of Mathematics, University of Arizona, Tucson, United States
- Luke J Kosinski
  - Department of Molecular and Cellular Biology, University of Arizona, Tucson, United States
- Joanna Masel
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, United States
|
19
|
Gumerov VM, Zhulin IB. TREND: a platform for exploring protein function in prokaryotes based on phylogenetic, domain architecture and gene neighborhood analyses. Nucleic Acids Res 2020; 48:W72-W76. [PMID: 32282909 PMCID: PMC7319448 DOI: 10.1093/nar/gkaa243] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 02/12/2020] [Revised: 03/16/2020] [Accepted: 04/01/2020] [Indexed: 01/16/2023] Open
Abstract
Key steps in a computational study of protein function involve analysis of (i) relationships between homologous proteins, (ii) protein domain architecture and (iii) the gene neighborhoods in which the corresponding proteins are encoded. Each of these steps requires a separate computational task and set of tools. Currently, in order to relate protein features and gene neighborhood information to phylogeny, researchers need to prepare all the necessary data and combine them by hand, which is time-consuming and error-prone. Here, we present a new platform, TREND (tree-based exploration of neighborhoods and domains), which can perform all the necessary steps in an automated fashion and put the derived information into phylogenomic context, thus making evolution-based protein function analysis more efficient. A rich set of adjustable components allows users to run the computational steps specific to their task. TREND is freely available at http://trend.zhulinlab.org.
Affiliation(s)
- Vadim M Gumerov
  - Department of Microbiology and Translational Data Analytics Institute, The Ohio State University, Columbus, OH, USA
- Igor B Zhulin
  - Department of Microbiology and Translational Data Analytics Institute, The Ohio State University, Columbus, OH, USA
|
20
|
Liu B, Leng L, Sun X, Wang Y, Ma J, Zhu Y. ECMPride: prediction of human extracellular matrix proteins based on the ideal dataset using hybrid features with domain evidence. PeerJ 2020; 8:e9066. [PMID: 32377454 PMCID: PMC7195829 DOI: 10.7717/peerj.9066] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/10/2019] [Accepted: 04/05/2020] [Indexed: 01/28/2023] Open
Abstract
Extracellular matrix (ECM) proteins play an essential role in various biological processes in multicellular organisms, and their abnormal regulation can lead to many diseases. For large-scale ECM protein identification, especially through proteomic-based techniques, a theoretical reference database of ECM proteins is required. In this study, based on the experimentally verified ECM datasets and by the integration of protein domain features and a machine learning model, we developed ECMPride, a flexible and scalable tool for predicting ECM proteins. ECMPride achieved excellent performance in predicting ECM proteins, with appropriate balanced accuracy and sensitivity, and the performance of ECMPride was shown to be superior to the previously developed tool. A new theoretical dataset of human ECM components was also established by applying ECMPride to all human entries in the SwissProt database, containing a significant number of putative ECM proteins as well as the abundant biological annotations. This dataset might serve as a valuable reference resource for ECM protein identification.
Affiliation(s)
- Binghui Liu
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing, China
- Ling Leng
  - Department of Central Laboratory, Peking Union Medical College Hospital, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing, China
- Xuer Sun
  - Tissue Engineering Lab, Institute of Health Service and Transfusion Medicine, Beijing, China
- Yunfang Wang
  - Tissue Engineering Lab, Institute of Health Service and Transfusion Medicine, Beijing, China
- Jie Ma
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing, China
- Yunping Zhu
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing, China
  - Basic Medical School, Anhui Medical University, Anhui, China
|
21
|
Bohnert S, Antelo L, Grünewald C, Yemelin A, Andresen K, Jacob S. Rapid adaptation of signaling networks in the fungal pathogen Magnaporthe oryzae. BMC Genomics 2019; 20:763. [PMID: 31640564 PMCID: PMC6805500 DOI: 10.1186/s12864-019-6113-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 03/19/2019] [Accepted: 09/20/2019] [Indexed: 11/10/2022] Open
Abstract
Background: One fundamental question in biology is how the evolution of eukaryotic signaling networks has taken place. “Loss of function” (lof) mutants of components of the high osmolarity glycerol (HOG) signaling pathway in the filamentous fungus Magnaporthe oryzae are viable, but impaired in osmoregulation.
Results: After long-term cultivation under high osmolarity, stable individuals with reestablished osmoregulation capacity arise independently from each of the mutants with an inactivated HOG pathway. This phenomenon is extremely reproducible and occurs only in osmosensitive mutants related to the HOG pathway, not in other osmosensitive Magnaporthe mutants. The major compatible solute produced by these adapted strains to cope with high osmolarity is glycerol, whereas it is arabitol in the wildtype strain. Genome and transcriptome analysis resulted in candidate genes related to glycerol metabolism, perhaps responsible for an epigenetically induced reestablishment of osmoregulation, since these genes do not show structural variations within the coding or promoter sequences.
Conclusion: This is the first report of a stable adaptation in eukaryotes by producing different metabolites and opens a door for the scientific community, since the HOG pathway is worked on intensively in many eukaryotic model organisms.
Affiliation(s)
- Stefan Bohnert
  - Institut für Biotechnologie und Wirkstoff-Forschung gGmbH (IBWF), Erwin-Schrödinger-Str. 56, D-67663, Kaiserslautern, Germany
- Luis Antelo
  - Institut für Biotechnologie und Wirkstoff-Forschung gGmbH (IBWF), Erwin-Schrödinger-Str. 56, D-67663, Kaiserslautern, Germany
- Christiane Grünewald
  - Johannes Gutenberg-University Mainz, Mikrobiologie und Weinforschung am Institut für Molekulare Physiologie, Johann-Joachim-Becherweg 15, D-55128, Mainz, Germany
- Alexander Yemelin
  - Institut für Biotechnologie und Wirkstoff-Forschung gGmbH (IBWF), Erwin-Schrödinger-Str. 56, D-67663, Kaiserslautern, Germany
- Karsten Andresen
  - Johannes Gutenberg-University Mainz, Mikrobiologie und Weinforschung am Institut für Molekulare Physiologie, Johann-Joachim-Becherweg 15, D-55128, Mainz, Germany
- Stefan Jacob
  - Institut für Biotechnologie und Wirkstoff-Forschung gGmbH (IBWF), Erwin-Schrödinger-Str. 56, D-67663, Kaiserslautern, Germany
  - Johannes Gutenberg-University Mainz, Mikrobiologie und Weinforschung am Institut für Molekulare Physiologie, Johann-Joachim-Becherweg 15, D-55128, Mainz, Germany
|
22
|
Subirana JA, Messeguer X. Satellites in the prokaryote world. BMC Evol Biol 2019; 19:181. [PMID: 31533616 PMCID: PMC6749651 DOI: 10.1186/s12862-019-1504-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 01/08/2019] [Accepted: 08/28/2019] [Indexed: 11/10/2022] Open
Abstract
Background: Satellites or tandem repeats are very abundant in many eukaryotic genomes. Occasionally they have been reported to be present in some prokaryotes, but to our knowledge there is no general comparative study on their occurrence. For this reason we present here an overview of the distribution and properties of satellites in a set of representative species. Our results provide novel insights into the evolutionary relationship between eukaryotes, Archaea and Bacteria.
Results: We have searched for all possible satellites present in the NCBI reference group of genomes in Archaea (142 species) and in Bacteria (119 species), detecting 2735 satellites in Archaea and 1067 in Bacteria. We have found that the distribution of satellites is very variable in different organisms. The archaeal Methanosarcina class stands out for the large number of satellites in their genomes. Satellites from a few species have similar characteristics to those in eukaryotes, but most species have very few satellites: only 21 species in Archaea and 18 in Bacteria have more than 4 satellites/Mb. The distribution of satellites in these species is reminiscent of what is found in eukaryotes, but we find two significant differences: most satellites have a short length and many of them correspond to segments of genes coding for amino acid repeats. Transposition of non-coding satellites throughout the genome occurs rarely: only in the bacterium Leptospira interrogans and the archaeon Methanocella conradii have we detected families of transposed satellites with long repeats.
Conclusions: Our results demonstrate that the presence of satellites in the genome is not an exclusive feature of eukaryotes. We have described a few prokaryotes which do contain satellites. We present a discussion on their eventual evolutionary significance.
Electronic supplementary material The online version of this article (10.1186/s12862-019-1504-2) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Juan A Subirana
  - Department of Computer Science, Universitat Politècnica de Catalunya, Jordi Girona 1-3, 08034, Barcelona, Spain
- Xavier Messeguer
  - Department of Computer Science, Universitat Politècnica de Catalunya, Jordi Girona 1-3, 08034, Barcelona, Spain

23
Rodrigues JV, Ogbunugafor CB, Hartl DL, Shakhnovich EI. Chimeric dihydrofolate reductases display properties of modularity and biophysical diversity. Protein Sci 2019; 28:1359-1367. [PMID: 31095809 DOI: 10.1002/pro.3646] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 03/15/2019] [Accepted: 05/13/2019] [Indexed: 01/12/2023]
Abstract
While reverse genetics and functional genomics have long affirmed the role of individual mutations in determining protein function, there have been fewer studies addressing how large-scale changes in protein sequences, such as in entire modular segments, influence protein function and evolution. Given how recombination can reassort protein sequences, these types of changes may play an underappreciated role in how novel protein functions evolve in nature. Such studies could aid our understanding of whether certain organismal phenotypes related to protein function (such as growth in the presence or absence of an antibiotic) are robust with respect to the identity of certain modular segments. In this study, we combine molecular genetics with biochemical and biophysical methods to gain a better understanding of protein modularity in dihydrofolate reductase (DHFR), an enzyme target of antibiotics also widely used as a model for protein evolution. We replace an integral α-helical segment of Escherichia coli DHFR with segments from a number of different organisms (many nonmicrobial) and examine how these chimeric enzymes affect organismal phenotypes (e.g., resistance to an antibiotic) as well as biophysical properties of the enzyme (e.g., thermostability). We find that organismal phenotypes and enzyme properties are highly sensitive to the identity of DHFR modules, and that this chimeric approach can create enzymes with diverse biophysical characteristics.
Affiliation(s)
- João V Rodrigues
  - Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts
- C Brandon Ogbunugafor
  - Department of Ecology and Evolutionary Biology, Brown University, Providence, Rhode Island
- Daniel L Hartl
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts
- Eugene I Shakhnovich
  - Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts

24
Ratcliffe LE, Asiedu EK, Pickett CJ, Warburton MA, Izzi SA, Meedel TH. The Ciona myogenic regulatory factor functions as a typical MRF but possesses a novel N-terminus that is essential for activity. Dev Biol 2019; 448:210-225. [PMID: 30365920 PMCID: PMC6478573 DOI: 10.1016/j.ydbio.2018.10.010] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 05/11/2018] [Revised: 08/28/2018] [Accepted: 10/16/2018] [Indexed: 11/26/2022]
Abstract
Electroporation-based assays were used to test whether the myogenic regulatory factor (MRF) of Ciona intestinalis (CiMRF) interferes with endogenous developmental programs, and to evaluate the importance of its unusual N-terminus for muscle development. We found that CiMRF suppresses both notochord and endoderm development when it is expressed in these tissues by a mechanism that may involve activation of muscle-specific microRNAs. Because these results add to a large body of evidence demonstrating the exceptionally high degree of functional conservation among MRFs, we were surprised to discover that non-ascidian MRFs were not myogenic in Ciona unless they formed part of a chimeric protein containing the CiMRF N-terminus. Equally surprising, we found that despite their widely differing primary sequences, the N-termini of MRFs of other ascidian species could form chimeric MRFs that were also myogenic in Ciona. This domain did not rescue the activity of a Brachyury protein whose transcriptional activation domain had been deleted, and so does not appear to constitute such a domain. Our results indicate that ascidians have previously unrecognized and potentially novel requirements for MRF-directed myogenesis. Moreover, they provide the first example of a domain that is essential to the core function of an important family of gene regulatory proteins, one that, to date, has been found in only a single branch of the family.
Affiliation(s)
- Lindsay E Ratcliffe
  - Department of Biology, Rhode Island College, 600 Mt. Pleasant Ave., Providence, RI 02908, USA.
- Emmanuel K Asiedu
  - Department of Biology, Rhode Island College, 600 Mt. Pleasant Ave., Providence, RI 02908, USA.
- C J Pickett
  - Department of Biology, Rhode Island College, 600 Mt. Pleasant Ave., Providence, RI 02908, USA.
- Megan A Warburton
  - Department of Biology, Rhode Island College, 600 Mt. Pleasant Ave., Providence, RI 02908, USA.
- Stephanie A Izzi
  - Department of Biology, Rhode Island College, 600 Mt. Pleasant Ave., Providence, RI 02908, USA.
- Thomas H Meedel
  - Department of Biology, Rhode Island College, 600 Mt. Pleasant Ave., Providence, RI 02908, USA.

25
Sanchez de Groot N, Torrent Burgas M, Ravarani CN, Trusina A, Ventura S, Babu MM. The fitness cost and benefit of phase-separated protein deposits. Mol Syst Biol 2019; 15:e8075. [PMID: 30962358 PMCID: PMC6452874 DOI: 10.15252/msb.20178075] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 01/10/2023] Open
Abstract
Phase separation of soluble proteins into insoluble deposits is associated with numerous diseases. However, protein deposits can also function as membrane-less compartments for many cellular processes. What are the fitness costs and benefits of forming such deposits in different conditions? Using a model protein that phase-separates into deposits, we distinguish and quantify the fitness contribution due to the loss or gain of protein function and deposit formation in yeast. The environmental condition and the cellular demand for the protein function emerge as key determinants of fitness. Protein deposit formation can influence cell-to-cell variation in free protein abundance between individuals of a cell population (i.e., gene expression noise). This results in variable manifestation of protein function and a continuous range of phenotypes in a cell population, favoring survival of some individuals in certain environments. Thus, protein deposit formation by phase separation might be a mechanism to sense protein concentration in cells and to generate phenotypic variability. The selectable phenotypic variability, previously described for prions, could be a general property of proteins that can form phase-separated assemblies and may influence cell fitness.
Affiliation(s)
- Natalia Sanchez de Groot
  - Medical Research Council Laboratory of Molecular Biology, Cambridge, UK
  - Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Marc Torrent Burgas
  - Medical Research Council Laboratory of Molecular Biology, Cambridge, UK
  - Systems Biology of Infection Lab, Department of Biochemistry and Molecular Biology, Universitat Autònoma de Barcelona, Barcelona, Spain
- Ala Trusina
  - Niels Bohr Institute, University of Copenhagen, Copenhagen, Denmark
- Salvador Ventura
  - Institut de Biotecnologia i Biomedicina and Departament de Bioquímica i Biologia Molecular, Universitat Autònoma de Barcelona, Barcelona, Spain
- M Madan Babu
  - Medical Research Council Laboratory of Molecular Biology, Cambridge, UK

26
Rapid evolution of protein diversity by de novo origination in Oryza. Nat Ecol Evol 2019; 3:679-690. [PMID: 30858588 DOI: 10.1038/s41559-019-0822-5] [Citation(s) in RCA: 85] [Impact Index Per Article: 17.0] [Received: 05/08/2018] [Accepted: 01/23/2019] [Indexed: 12/22/2022]
Abstract
New protein-coding genes that arise de novo from non-coding DNA sequences contribute to protein diversity. However, de novo gene origination is challenging to study as it requires high-quality reference genomes for closely related species, evidence for ancestral non-coding sequences, and transcription and translation of the new genes. High-quality genomes of 13 closely related Oryza species provide unprecedented opportunities to understand de novo origination events. Here, we identify a large number of young de novo genes with discernible recent ancestral non-coding sequences and evidence of translation. Using pipelines examining the synteny relationship between genomes and reciprocal-best whole-genome alignments, we detected at least 175 de novo open reading frames in the focal species O. sativa subspecies japonica, which were all detected in RNA sequencing-based transcriptomes. Mass spectrometry-based targeted proteomics and ribosomal profiling show translational evidence for 57% of the de novo genes. In recent divergence of Oryza, an average of 51.5 de novo genes per million years were generated and retained. We observed evolutionary patterns in which excess indels and early transcription were favoured in origination with a stepwise formation of gene structure. These data reveal that de novo genes contribute to the rapid evolution of protein diversity under positive selection.

27
Jiang F, Liu Q, Liu X, Wang XH, Kang L. Genomic data reveal high conservation but divergent evolutionary pattern of Polycomb/Trithorax group genes in arthropods. Insect Sci 2019; 26:20-34. [PMID: 29127737 DOI: 10.1111/1744-7917.12558] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 09/10/2017] [Revised: 11/04/2017] [Accepted: 11/05/2017] [Indexed: 06/07/2023]
Abstract
Epigenetic gene control is maintained by chromatin-associated Polycomb group (PcG) and Trithorax group (TrxG) genes, which act antagonistically to generate silenced or permissive transcriptional states. In this study, we searched for PcG/TrxG genes in 180 arthropod genomes, covering all the arthropod genomes sequenced at the time of the study, to perform a global investigation of PcG/TrxG genes in a phylogenetic framework. Ancestral state reconstruction revealed that the ancestor of arthropod species had an almost complete repertoire of PcG/TrxG genes, and that most of these genes were seldom lost above the order level. The domain diversity analysis indicated that the PcG/TrxG genes show variable extents of domain structure change; some of these changes could be associated with lineage-specific events. The likelihood ratio tests for selection pressure detected a number of PcG/TrxG genes that underwent episodic positive selection on the branch leading to the insects with holometabolous development. These results suggest that, despite their high conservation across arthropod species, different PcG/TrxG genes showed considerable differences in domain structure and sequence divergence during arthropod evolution. Our cross-species comparisons using large-scale genomic data provide insights into the divergent evolutionary patterns of highly conserved genes in arthropods.
Affiliation(s)
- Feng Jiang
  - Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, China
- Qing Liu
  - Sino-Danish College, University of Chinese Academy of Sciences, Beijing, China
- Xiang Liu
  - State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Xian-Hui Wang
  - State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Le Kang
  - Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, China
  - State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, China

28
Exaptation at the molecular genetic level. Sci China Life Sci 2018; 62:437-452. [PMID: 30798493 DOI: 10.1007/s11427-018-9447-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 11/10/2018] [Accepted: 12/01/2018] [Indexed: 12/22/2022]
Abstract
The realization that body parts of animals and plants can be recruited or coopted for novel functions dates back to, or even predates, the observations of Darwin. S.J. Gould and E.S. Vrba recognized a mode of evolution of characters that differs from adaptation. The umbrella term aptation was supplemented with the concept of exaptation. Unlike adaptations, which are restricted to features built by selection for their current role, exaptations are features that currently enhance fitness, even though their present role was not a result of natural selection. Exaptations can also arise from nonaptations; these are characters which had previously been evolving neutrally. All nonaptations are potential exaptations. The concept of exaptation was expanded to the molecular genetic level, which aided greatly in understanding the enormous potential of neutrally evolving repetitive DNA (including transposed elements, formerly considered junk DNA) for the evolution of genes and genomes. The distinction between adaptations and exaptations is outlined in this review and examples are given. Also elaborated on is the fact that such distinctions are sometimes difficult to draw; this is a widespread phenomenon in biology, where continua abound and clear borders between states and definitions are rare.

29
Bitard-Feildel T, Lamiable A, Mornon J, Callebaut I. Order in Disorder as Observed by the "Hydrophobic Cluster Analysis" of Protein Sequences. Proteomics 2018; 18:e1800054. [PMID: 30299594 PMCID: PMC7168002 DOI: 10.1002/pmic.201800054] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 05/31/2018] [Revised: 08/29/2018] [Indexed: 12/17/2022]
Abstract
Hydrophobic cluster analysis (HCA) is an original approach for protein sequence analysis, which provides access to the foldable repertoire of the protein universe, including yet unannotated protein segments ("dark proteome"). Foldable segments correspond to ordered regions, as well as to intrinsically disordered regions (IDRs) undergoing disorder to order transitions. In this review, how HCA can be used to give insight into this last category of foldable segments is illustrated, with examples matching known 3D structures. After reviewing the HCA principles, examples of short foldable segments are given, which often contain short linear motifs, typically matching hydrophobic clusters. These segments become ordered upon contact with partners, with secondary structure preferences generally corresponding to those observed in the 3D structures within the complexes. Such small foldable segments are sometimes larger than the segments of known 3D structures, including flanking hydrophobic clusters that may be critical for interaction specificity or regulation, as well as intervening sequences allowing fuzziness. Cases of larger conditionally disordered domains are also presented, with lower density in hydrophobic clusters than well-folded globular domains or with exposed hydrophobic patches, which are stabilized by interaction with partners.
Affiliation(s)
- Tristan Bitard-Feildel
  - Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie (IMPMC), Institut de recherche pour le développement (IRD), UMR CNRS 7590, Muséum National d'Histoire Naturelle, Sorbonne Université, 75005 Paris, France
  - Laboratoire de Biologie Computationnelle et Quantitative (LCQB), Institute of Biology Paris-Seine (IBPS), Centre national de la recherche scientifique (CNRS), Sorbonne Université, 75005 Paris, France
- Alexis Lamiable
  - Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie (IMPMC), Institut de recherche pour le développement (IRD), UMR CNRS 7590, Muséum National d'Histoire Naturelle, Sorbonne Université, 75005 Paris, France
- Jean-Paul Mornon
  - Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie (IMPMC), Institut de recherche pour le développement (IRD), UMR CNRS 7590, Muséum National d'Histoire Naturelle, Sorbonne Université, 75005 Paris, France
- Isabelle Callebaut
  - Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie (IMPMC), Institut de recherche pour le développement (IRD), UMR CNRS 7590, Muséum National d'Histoire Naturelle, Sorbonne Université, 75005 Paris, France

30
Dangwal M, Das S. Identification and Analysis of OVATE Family Members from Genome of the Early Land Plants Provide Insights into Evolutionary History of OFP Family and Function. J Mol Evol 2018; 86:511-530. [PMID: 30206666 DOI: 10.1007/s00239-018-9863-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 08/23/2018] [Accepted: 09/05/2018] [Indexed: 01/11/2023]
Abstract
Mosses, liverworts, hornworts and lycophytes represent transition stages between aquatic and terrestrial (land) plants. Several morphological and adaptive novelties driven by genomic components, including the emergence and expansion of new or existing gene families, played a critical role during and after the transition, and contributed to the successful colonization of terrestrial ecosystems. It is crucial to decipher the evolutionary transitions and natural selection on gene structure and function to understand the emergence of phenotypic and adaptive diversity. Plants at the "transition zone" between aquatic and terrestrial ecosystems are also the most vulnerable to climate change and may contain clues for successful mitigation of its challenges. Identification and comparative analyses of such genetic elements and gene families are few in mosses, liverworts, hornworts and lycophytes. Ovate family proteins (OFPs) are plant-specific transcriptional repressors acknowledged for their roles in important growth and developmental processes in land plants, but information about the functional aspects of OFPs in early land plants is fragmentary. As a first step towards addressing this gap, a comprehensive in silico analysis was carried out using publicly available genome sequences of Marchantia polymorpha (Mp), Physcomitrella patens (Pp), Selaginella moellendorffii (Sm) and Sphagnum fallax (Sf). Our analysis led to the identification of 4 MpOFPs, 19 PpOFPs, 6 SmOFPs and 3 SfOFPs. Cross-genera analysis revealed a drastic change in the structure and physicochemical properties of OFPs, suggesting functional diversification and genomic plasticity over the course of evolution. Knowledge gained from this comparative analysis will form the framework for deciphering and dissecting their developmental and adaptive roles in early land plants and could provide insights into the evolutionary strategies adopted by land plants.
Affiliation(s)
- Sandip Das
  - Department of Botany, University of Delhi, Delhi, 110007, India.

31
Incipient de novo genes can evolve from frozen accidents that escaped rapid transcript turnover. Nat Ecol Evol 2018; 2:1626-1632. [DOI: 10.1038/s41559-018-0639-7] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Received: 07/24/2017] [Accepted: 07/09/2018] [Indexed: 11/08/2022]

32
Willis S, Masel J. Gene Birth Contributes to Structural Disorder Encoded by Overlapping Genes. Genetics 2018; 210:303-313. [PMID: 30026186 PMCID: PMC6116962 DOI: 10.1534/genetics.118.301249] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 06/11/2018] [Accepted: 07/18/2018] [Indexed: 11/18/2022] Open
Abstract
The same nucleotide sequence can encode two protein products in different reading frames. Overlapping gene regions encode higher levels of intrinsic structural disorder (ISD) than nonoverlapping genes (39% vs. 25% in our viral dataset). This might be because of the intrinsic properties of the genetic code, because one member per pair was recently born de novo in a process that favors high ISD, or because high ISD relieves increased evolutionary constraint imposed by dual-coding. Here, we quantify the relative contributions of these three alternative hypotheses. We estimate that the recency of de novo gene birth explains [Formula: see text] or more of the elevation in ISD in overlapping regions of viral genes. While the two reading frames within a same-strand overlapping gene pair have markedly different ISD tendencies that must be controlled for, their effects cancel out to make no net contribution to ISD. The remaining elevation of ISD in the older members of overlapping gene pairs, presumed due to the need to alleviate evolutionary constraint, was already present prior to the origin of the overlap. Same-strand overlapping gene birth events can occur in two different frames, favoring high ISD either in the ancestral gene or in the novel gene; surprisingly, most de novo gene birth events contained completely within the body of an ancestral gene favor high ISD in the ancestral gene (23 phylogenetically independent events vs. 1). This can be explained by mutation bias favoring the frame with more start codons and fewer stop codons.
Affiliation(s)
- Sara Willis
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, Arizona 85721
- Joanna Masel
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, Arizona 85721

33
Jakubec D, Kratochvíl M, Vymětal J, Vondrášek J. Widespread evolutionary crosstalk among protein domains in the context of multi-domain proteins. PLoS One 2018; 13:e0203085. [PMID: 30169546 PMCID: PMC6118372 DOI: 10.1371/journal.pone.0203085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2018] [Accepted: 08/14/2018] [Indexed: 11/20/2022] Open
Abstract
Domains are distinct units within proteins that typically can fold independently into recognizable three-dimensional structures to facilitate their functions. The structural and functional independence of protein domains is reflected by their apparent modularity in the context of multi-domain proteins. In this work, we examined the coupling of evolution of domain sequences co-occurring within multi-domain proteins to see if it proceeds independently, or in a coordinated manner. We used continuous information theory measures to assess the extent of correlated mutations among domains in multi-domain proteins from organisms across the tree of life. In all multi-domain architectures we examined, domains co-occurring within protein sequences had to some degree undergone concerted evolution. This finding challenges the notion of complete modularity and independence of protein domains, providing new perspective on the evolution of protein sequence and function.
Affiliation(s)
- David Jakubec
  - Department of Bioinformatics, Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, 166 10 Prague 6, Czech Republic
  - Department of Physical and Macromolecular Chemistry, Faculty of Science, Charles University, 128 43 Prague 2, Czech Republic
- Miroslav Kratochvíl
  - Department of Bioinformatics, Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, 166 10 Prague 6, Czech Republic
  - Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, 118 00 Prague 1, Czech Republic
- Jiří Vymětal
  - Department of Bioinformatics, Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, 166 10 Prague 6, Czech Republic
- Jiří Vondrášek
  - Department of Bioinformatics, Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, 166 10 Prague 6, Czech Republic

34
Klasberg S, Bitard-Feildel T, Callebaut I, Bornberg-Bauer E. Origins and structural properties of novel and de novo protein domains during insect evolution. FEBS J 2018; 285:2605-2625. [PMID: 29802682 DOI: 10.1111/febs.14504] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 06/25/2017] [Revised: 04/12/2018] [Accepted: 05/11/2018] [Indexed: 12/11/2022]
Abstract
Over long time scales, protein evolution is characterized by modular rearrangements of protein domains. Such rearrangements are mainly caused by gene duplication, fusion and terminal losses. To better understand domain emergence mechanisms we investigated 32 insect genomes covering a speciation gradient ranging from ~2 to ~390 mya. We use established domain models and foldable domains delineated by hydrophobic cluster analysis (HCA), which does not require homologous sequences, to also identify domains which have likely arisen de novo, that is, from previously noncoding DNA. Our results indicate that most novel domains emerge terminally as they originate from ORF extensions while fewer arise in middle arrangements, resulting from exonization of intronic or intergenic regions. Many novel domains rapidly migrate between terminal or middle positions and single- and multidomain arrangements. Young domains, such as most HCA-defined domains, are under strong selection pressure as they show signals of purifying selection. De novo domains, linked to ancient domains or defined by HCA, have higher degrees of intrinsic disorder and disorder-to-order transition upon binding than ancient domains. However, the corresponding DNA sequences of the novel domains of de novo origins could only rarely be found in sister genomes. We conclude that novel domains are often recruited by other proteins and undergo important structural modifications shortly after their emergence, but evolve too fast to be characterized by cross-species comparisons alone.
Affiliation(s)
- Steffen Klasberg
  - Institute for Evolution and Biodiversity, Westfalian Wilhelms University Muenster, Germany
- Tristan Bitard-Feildel
  - Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), Paris, France
- Isabelle Callebaut
  - Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, IRD, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France
- Erich Bornberg-Bauer
  - Institute for Evolution and Biodiversity, Westfalian Wilhelms University Muenster, Germany

35
Menichelli C, Gascuel O, Bréhélin L. Improving pairwise comparison of protein sequences with domain co-occurrence. PLoS Comput Biol 2018; 14:e1005889. [PMID: 29293498 PMCID: PMC5766236 DOI: 10.1371/journal.pcbi.1005889] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 04/28/2017] [Revised: 01/12/2018] [Accepted: 11/23/2017] [Indexed: 01/17/2023] Open
Abstract
Comparing and aligning protein sequences is an essential task in bioinformatics. More specifically, local alignment tools like BLAST are widely used for identifying conserved protein sub-sequences, which likely correspond to protein domains or functional motifs. However, to limit the number of false positives, these tools are used with stringent sequence-similarity thresholds and hence can miss several hits, especially for species that are phylogenetically distant from reference organisms. A solution to this problem is to integrate additional contextual information into the procedure. Here, we propose to use domain co-occurrence to increase the sensitivity of pairwise sequence comparisons. Domain co-occurrence is a strong feature of proteins, since most protein domains tend to appear with a limited number of other domains on the same protein. We propose a method to take this information into account in a typical BLAST analysis and to construct new domain families on the basis of these results. We used Plasmodium falciparum as a case study to evaluate our method. The experimental findings showed an increase of 14% in the number of significant BLAST hits and an increase of 25% in the proteome area that can be covered with a domain. Our method identified 2240 new domains for which, in most cases, no model of the Pfam database could be linked. Moreover, our study of the quality of the new domains in terms of alignment and physicochemical properties shows that they are close to those of standard Pfam domains. Source code of the proposed approach and supplementary data are available at: https://gite.lirmm.fr/menichelli/pairwise-comparison-with-cooccurrence
Author summary
Deciphering the functions of the different proteins of an organism constitutes a first step toward the understanding of its biology. Because they provide strong clues regarding protein functions, domains occupy a key position among the relevant annotations that can be assigned to a protein. Protein domains are sequential motifs that are conserved along evolution and are found in different proteins and in different combinations. One common approach for identifying the domains of a protein is to run sequence-sequence comparisons with local alignment tools such as BLAST. However, these approaches sometimes miss several hits, especially for species that are phylogenetically distant from reference organisms. We propose here an approach to increase the sensitivity of pairwise sequence comparisons. This approach makes use of the fact that protein domains tend to appear with a limited number of other domains on the same protein (the domain co-occurrence property). On P. falciparum, our approach allows identifying 2240 new domains for which, in most cases, no domain of the Pfam database could be linked.
Affiliation(s)
- Olivier Gascuel
  - IBC, LIRMM, Univ. Montpellier, CNRS, Montpellier, France
  - Unité de Bioinformatique Evolutive, C3BI - USR 3756, Institut Pasteur et CNRS, Paris, France
- Laurent Bréhélin
  - IBC, LIRMM, Univ. Montpellier, CNRS, Montpellier, France

36
Abstract
The phenomenon of de novo gene birth from junk DNA is surprising, because random polypeptides are expected to be toxic. There are two conflicting views about how de novo gene birth is nevertheless possible: the continuum hypothesis invokes a gradual gene birth process, while the preadaptation hypothesis predicts that young genes will show extreme levels of gene-like traits. We show that intrinsic structural disorder conforms to the predictions of the preadaptation hypothesis and falsifies the continuum hypothesis, with all genes having higher levels than translated junk DNA, but young genes having the highest level of all. Results are robust to homology detection bias, to the non-independence of multiple members of the same gene family, and to the false positive annotation of protein-coding genes.
|
37
|
Craig EA, Marszalek J. How Do J-Proteins Get Hsp70 to Do So Many Different Things? Trends Biochem Sci 2017; 42:355-368. [PMID: 28314505 DOI: 10.1016/j.tibs.2017.02.007] [Citation(s) in RCA: 130] [Impact Index Per Article: 18.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2016] [Revised: 02/09/2017] [Accepted: 02/16/2017] [Indexed: 01/07/2023]
Abstract
Hsp70 chaperone machineries have pivotal roles in an array of fundamental biological processes through their facilitation of protein folding, disaggregation, and remodeling. The obligate J-protein co-chaperones of Hsp70s drive much of this remarkable multifunctionality, with most Hsp70s having multiple J-protein partners. Recent data suggest that J-protein-driven versatility is substantially due to precise localization within the cell and the specificity of substrate protein binding. However, this relatively simple view belies the intricacy of J-protein function. Examples are emerging of J-protein interactions with Hsp70s and other chaperones, as well as integration into broader cellular networks. These interactions fine-tune, in critical ways, the ability of Hsp70s to participate in diverse cellular processes.
Affiliation(s)
- Elizabeth A Craig
- Department of Biochemistry, University of Wisconsin-Madison, 433 Babcock Drive, Madison, WI 53706, USA.
- Jaroslaw Marszalek
- Department of Biochemistry, University of Wisconsin-Madison, 433 Babcock Drive, Madison, WI 53706, USA; Intercollegiate Faculty of Biotechnology, University of Gdansk and Medical University of Gdansk, Abrahama 58, 80-307 Gdansk, Poland.
|
38
|
Exploring the dark foldable proteome by considering hydrophobic amino acids topology. Sci Rep 2017; 7:41425. [PMID: 28134276 PMCID: PMC5278394 DOI: 10.1038/srep41425] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/12/2016] [Accepted: 12/19/2016] [Indexed: 12/18/2022] Open
Abstract
The protein universe corresponds to the set of all proteins found in all organisms. A way to explore it is by taking into account the domain content of the proteins. However, some parts of sequences and many entire sequences remain un-annotated despite a converging number of domain families. The un-annotated part of the protein universe is referred to as the dark proteome and remains poorly characterized. In this study, we quantify the amount of foldable domains within the dark proteome by using the hydrophobic cluster analysis methodology. These un-annotated foldable domains were grouped using a combination of remote homology searches and domain annotations, leading us to define different levels of darkness. The dark foldable domains were analyzed to understand what makes them different from domains stored in databases and thus difficult to annotate. The un-annotated domains of the dark proteome universe display specific features relative to database domains: shorter length, non-canonical content and particular topology in hydrophobic residues, higher propensity for disorder, and a higher energy. These features make them hard to relate to known families. Based on these observations, we emphasize that domain annotation methodologies can still be improved to fully apprehend and decipher the molecular evolution of the protein universe.
|
39
|
Schmitz JF, Bornberg-Bauer E. Fact or fiction: updates on how protein-coding genes might emerge de novo from previously non-coding DNA. F1000Res 2017; 6:57. [PMID: 28163910 PMCID: PMC5247788 DOI: 10.12688/f1000research.10079.1] [Citation(s) in RCA: 45] [Impact Index Per Article: 6.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Accepted: 01/17/2017] [Indexed: 12/31/2022] Open
Abstract
Over the last few years, there has been an increasing amount of evidence for the de novo emergence of protein-coding genes, i.e. out of non-coding DNA. Here, we review the current literature and summarize the state of the field. We focus specifically on open questions and challenges in the study of de novo protein-coding genes such as the identification and verification of de novo-emerged genes. The greatest obstacle to date is the lack of high-quality genomic data with very short divergence times which could help precisely pin down the location of origin of a de novo gene. We conclude that, while there is plenty of evidence from a genetics perspective, there is a lack of functional studies of bona fide de novo genes and almost no knowledge about protein structures and how they come about during the emergence of de novo protein-coding genes. We suggest that future studies should concentrate on the functional and structural characterization of de novo protein-coding genes as well as the detailed study of the emergence of functional de novo protein-coding genes.
Affiliation(s)
- Jonathan F Schmitz
- Institute for Evolution and Biodiversity, University of Muenster, Muenster, Germany
- Erich Bornberg-Bauer
- Institute for Evolution and Biodiversity, University of Muenster, Muenster, Germany
|
40
|
Abstract
Repeats are ubiquitous elements of proteins and they play important roles for cellular function and during evolution. Repeats are, however, also notoriously difficult to capture computationally, and large-scale studies have so far had difficulty linking genetic causes, structural properties and evolutionary trajectories of protein repeats. Here we apply recently developed methods for repeat detection and analysis to a large dataset comprising over a hundred metazoan genomes. We find that repeats in larger protein families generally experience very few insertions or deletions (indels) of repeat units, but there is also a significant fraction of noteworthy volatile outliers with very high indel rates. Analysis of structural data indicates that repeats with an open structure and independently folding units are more volatile and more likely to be intrinsically disordered. Such disordered repeats are also significantly enriched in sites with a high functional potential such as linear motifs. Furthermore, the most volatile repeats have a high sequence similarity between their units. Since many volatile repeats also show signs of recombination, we conclude they are often shaped by concerted evolution. Intriguingly, many of these conserved yet volatile repeats are involved in host-pathogen interactions where they might foster fast but subtle adaptation in biological arms races.
KEY WORDS: protein evolution, domain rearrangements, protein repeats, concerted evolution.
Affiliation(s)
- Andreas Schüler
- Institute for Evolution and Biodiversity, Westfalian Wilhelms University, Huefferstrasse 1, Muenster, Germany
- Erich Bornberg-Bauer
- Institute for Evolution and Biodiversity, Westfalian Wilhelms University, Huefferstrasse 1, Muenster, Germany
|
41
|
Abstract
The evolution of natural modular proteins and domain swapping by protein engineers have shown the disruptive potential of non-homologous recombination to create proteins with novel functions or traits. Bacteriophage endolysins, cellulosomes and polyketide synthases are 3 examples of natural modular proteins with each module having a dedicated function. These modular architectures have been created by extensive duplication, shuffling of domains and insertion/deletion of new domains. Protein engineers mimic these natural processes in vitro to create chimeras with altered properties or novel functions by swapping modules between different parental genes. Most domain swapping efforts are realized with traditional restriction and ligation techniques, which become particularly restrictive when either a large number of variants, or variants of proteins with multiple domains have to be constructed. Recent advances in homology-independent shuffling techniques increasingly address this need, but to realize the full potential of the synthetic biology of modular proteins a complete homology-independent method for both rational and random shuffling of modules from an unlimited number of parental genes is still needed.
Affiliation(s)
- Veerle E T Maervoet
- Laboratory of Applied Biotechnology, Department of Applied Biosciences, Ghent University, Ghent, Belgium
- Yves Briers
- Laboratory of Applied Biotechnology, Department of Applied Biosciences, Ghent University, Ghent, Belgium
|
42
|
Abstract
Proteins are the workhorses of the cell and, over billions of years, they have evolved an amazing plethora of extremely diverse and versatile structures with equally diverse functions. Evolutionary emergence of new proteins and transitions between existing ones are believed to be rare or even impossible. However, recent advances in comparative genomics have repeatedly identified some 10%-30% of all genes as lacking any detectable similarity to existing proteins. Even after careful scrutiny, some of those orphan genes contain protein coding reading frames with detectable transcription and translation. Thus some proteins seem to have emerged from previously non-coding 'dark genomic matter'. These 'de novo' proteins tend to be disordered, fast evolving, weakly expressed but also rapidly assuming novel and physiologically important functions. Here we review mechanisms by which 'de novo' proteins might be created, under which circumstances they may become fixed and why they are elusive. We propose a 'grow slow and moult' model in which first a reading frame is extended, coding for an initially disordered and non-globular appendage which, over time, becomes more structured and may also become associated with other proteins.
|
43
|
Creating functional sophistication from simple protein building blocks, exemplified by factor H and the regulators of complement activation. Biochem Soc Trans 2016; 43:812-8. [PMID: 26517887 DOI: 10.1042/bst20150074] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/15/2023]
Abstract
Complement control protein modules (CCPs) occur in numerous functionally diverse extracellular proteins. Also known as short consensus repeats (SCRs) or sushi domains, each CCP contains approximately 60 amino acid residues, including four consensus cysteines participating in two disulfide bonds. Varying in length and sequence, CCPs adopt a β-sandwich type fold and have an overall prolate spheroidal shape with N- and C-termini lying close to opposite poles of the long axis. CCP-containing proteins are important as cytokine receptors and in neurotransmission, cell adhesion, blood clotting, extracellular matrix formation, haemoglobin metabolism and development, but CCPs are particularly well represented in the vertebrate complement system. For example, factor H (FH), a key soluble regulator of the alternative pathway of complement activation, is made up entirely from a chain of 20 CCPs joined by short linkers. Collectively, therefore, the 20 CCPs of FH must mediate all its functional capabilities. This is achieved via collaboration and division of labour among these modules. Structural studies have illuminated the dynamic architectures that allow FH and other CCP-rich proteins to perform their biological functions. These are largely the products of a highly varied set of intramolecular interactions between CCPs. The CCP can act as a building block, spacer, highly versatile recognition site or dimerization mediator. Tandem CCPs may form composite binding sites or contribute to flexible, rigid or conformationally 'switchable' segments of the parent proteins.
|
44
|
Klasberg S, Bitard-Feildel T, Mallet L. Computational Identification of Novel Genes: Current and Future Perspectives. Bioinform Biol Insights 2016; 10:121-31. [PMID: 27493475 PMCID: PMC4970615 DOI: 10.4137/bbi.s39950] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/15/2016] [Revised: 05/31/2016] [Accepted: 06/05/2016] [Indexed: 12/31/2022] Open
Abstract
While it has long been thought that all genomic novelties are derived from the existing material, many genes lacking homology to known genes were found in recent genome projects. Some of these novel genes were proposed to have evolved de novo, i.e., out of noncoding sequences, whereas some have been shown to follow a duplication and divergence process. Their discovery called for an extension of the historical hypotheses about gene origination. Besides the theoretical breakthrough, increasing evidence accumulated that novel genes play important roles in evolutionary processes, including adaptation and speciation events. Different techniques are available to identify genes and classify them as novel. Their classification as novel is usually based on their similarity to known genes, or lack thereof, detected by comparative genomics or against databases. Computational approaches are further prime methods that can be based on existing models or can leverage biological evidence from experiments. Identification of novel genes remains, however, a challenging task. With constant software and technology updates, no gold standard, and no available benchmark, evaluation and characterization of genomic novelty is a vibrant field. In this review, the classical and state-of-the-art tools for gene prediction are introduced. The current methods for novel gene detection are presented; the methodological strategies and their limits are discussed along with perspective approaches for further studies.
Affiliation(s)
- Steffen Klasberg
- Institute for Evolution and Biodiversity, Westfalian Wilhelms University Muenster, Huefferstrasse 1, Muenster, Germany
- Tristan Bitard-Feildel
- Institute for Evolution and Biodiversity, Westfalian Wilhelms University Muenster, Huefferstrasse 1, Muenster, Germany
- Ludovic Mallet
- Institute for Evolution and Biodiversity, Westfalian Wilhelms University Muenster, Huefferstrasse 1, Muenster, Germany
|
45
|
McLysaght A, Hurst LD. Open questions in the study of de novo genes: what, how and why. Nat Rev Genet 2016; 17:567-78. [PMID: 27452112 DOI: 10.1038/nrg.2016.78] [Citation(s) in RCA: 125] [Impact Index Per Article: 15.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
Abstract
The study of de novo protein-coding genes is maturing from the ad hoc reporting of individual cases to the systematic analysis of extensive genomic data from several species. We identify three key challenges for this emerging field: understanding how best to identify de novo genes, how they arise and why they spread. We highlight the intellectual challenges of understanding how a de novo gene becomes integrated into pre-existing functions and becomes essential. We suggest that, as with protein sequence evolution, antagonistic co-evolution may be key to de novo gene evolution, particularly for new essential genes and new cancer-associated genes.
Affiliation(s)
- Aoife McLysaght
- The Smurfit Institute of Genetics, University of Dublin, Trinity College, Dublin 2, Ireland
- Laurence D Hurst
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, Somerset BA2 7AY, UK
|
46
|
Lees JG, Dawson NL, Sillitoe I, Orengo CA. Functional innovation from changes in protein domains and their combinations. Curr Opin Struct Biol 2016; 38:44-52. [DOI: 10.1016/j.sbi.2016.05.016] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/22/2016] [Revised: 05/17/2016] [Accepted: 05/24/2016] [Indexed: 10/21/2022]
|
47
|
Cromar GL, Zhao A, Xiong X, Swapna LS, Loughran N, Song H, Parkinson J. PhyloPro2.0: a database for the dynamic exploration of phylogenetically conserved proteins and their domain architectures across the Eukarya. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2016; 2016:baw013. [PMID: 26980519 PMCID: PMC4792532 DOI: 10.1093/database/baw013] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/27/2015] [Accepted: 01/29/2016] [Indexed: 11/13/2022]
Abstract
PhyloPro is a database and accompanying web-based application for the construction and exploration of phylogenetic profiles across the Eukarya. In this update article, we present six major new developments in PhyloPro: (i) integration of Pfam-A domain predictions for all proteins; (ii) new summary heatmaps and detailed level views of domain conservation; (iii) an interactive, network-based visualization tool for exploration of domain architectures and their conservation; (iv) ability to browse based on protein functional categories (GOSlim); (v) improvements to the web interface to enhance drill down capability from the heatmap view; and (vi) improved coverage including 164 eukaryotes and 12 reference species. In addition, we provide improved support for downloading data and images in a variety of formats. Among the existing tools available for phylogenetic profiles, PhyloPro provides several innovative domain-based features including a novel domain adjacency visualization tool. These are designed to allow the user to identify and compare proteins with similar domain architectures across species and thus develop hypotheses about the evolution of lineage-specific trajectories. Database URL: http://www.compsysbio.org/phylopro/.
Affiliation(s)
- Graham L Cromar
- Program in Molecular Structure and Function, Hospital for Sick Children, 21-9830 PGCRL, 686 Bay Street, Toronto, ON M5G 0A4, Canada
- Anthony Zhao
- Program in Molecular Structure and Function, Hospital for Sick Children, 21-9830 PGCRL, 686 Bay Street, Toronto, ON M5G 0A4, Canada
- Xuejian Xiong
- Program in Molecular Structure and Function, Hospital for Sick Children, 21-9830 PGCRL, 686 Bay Street, Toronto, ON M5G 0A4, Canada
- Lakshmipuram S Swapna
- Program in Molecular Structure and Function, Hospital for Sick Children, 21-9830 PGCRL, 686 Bay Street, Toronto, ON M5G 0A4, Canada
- Noeleen Loughran
- Program in Molecular Structure and Function, Hospital for Sick Children, 21-9830 PGCRL, 686 Bay Street, Toronto, ON M5G 0A4, Canada
- Hongyan Song
- Program in Molecular Structure and Function, Hospital for Sick Children, 21-9830 PGCRL, 686 Bay Street, Toronto, ON M5G 0A4, Canada
- John Parkinson
- Program in Molecular Structure and Function, Hospital for Sick Children, 21-9830 PGCRL, 686 Bay Street, Toronto, ON M5G 0A4, Canada and Departments of Biochemistry, Computer Science and Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada
|
48
|
Abstract
Comparative genomics has brought much insight into the de novo emergence of genes. Two new studies in Drosophila explore the dynamics of gene gain and loss at the population and species levels, extending our view on the life cycle of genes.
Affiliation(s)
- Rafik Neme
- Max-Planck Institute for Evolutionary Biology, 24306 Plön, Germany
- Diethard Tautz
- Max-Planck Institute for Evolutionary Biology, 24306 Plön, Germany.
|
49
|
Scaiewicz A, Levitt M. The language of the protein universe. Curr Opin Genet Dev 2015; 35:50-6. [PMID: 26451980 DOI: 10.1016/j.gde.2015.08.010] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2015] [Revised: 08/20/2015] [Accepted: 08/25/2015] [Indexed: 11/17/2022]
Abstract
Proteins, the main cell machinery which play a major role in nearly every cellular process, have always been a central focus in biology. We live in the post-genomic era, and inferring information from massive data sets is a steadily growing universal challenge. The increasing availability of fully sequenced genomes can be regarded as the 'Rosetta Stone' of the protein universe, allowing the understanding of genomes and their evolution, just as the original Rosetta Stone allowed Champollion to decipher the ancient Egyptian hieroglyphics. In this review, we consider aspects of the protein domain architectures repertoire that are closely related to those of human languages and aim to provide some insights about the language of proteins.
Affiliation(s)
- Andrea Scaiewicz
- Department of Structural Biology, Stanford University, Stanford, CA 94305-5126, United States
- Michael Levitt
- Department of Structural Biology, Stanford University, Stanford, CA 94305-5126, United States.
|
50
|
Kersting AR, Mizrachi E, Bornberg-Bauer E, Myburg AA. Protein domain evolution is associated with reproductive diversification and adaptive radiation in the genus Eucalyptus. THE NEW PHYTOLOGIST 2015; 206:1328-36. [PMID: 25494981 DOI: 10.1111/nph.13211] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/15/2014] [Accepted: 11/04/2014] [Indexed: 05/04/2023]
Abstract
Eucalyptus is a pivotal genus within the rosid order Myrtales with distinct geographic history and adaptations. Comparative analysis of protein domain evolution in the newly sequenced Eucalyptus grandis genome and other rosid lineages sheds light on the adaptive mechanisms integral to the success of this genus of woody perennials. We reconstructed the ancestral domain content to elucidate the gain, loss and expansion of protein domains and domain arrangements in Eucalyptus in the context of rosid phylogeny. We used functional gene ontology (GO) annotation of genes to investigate the possible biological and evolutionary consequences of protein domain expansion. We found that protein modulation within the angiosperms occurred primarily on the level of expansion of certain domains and arrangements. Using RNA-Seq data from E. grandis, we showed that domain expansions have contributed to tissue-specific expression of tandemly duplicated genes. Our results indicate that tandem duplication of genes, a key feature of the Eucalyptus genome, has played an important role in the expansion of domains, particularly in proteins related to the specialization of reproduction and biotic and abiotic interactions affecting root and floral biology, and that tissue-specific expression of proteins with expanded domains has facilitated subfunctionalization in domain families.
Affiliation(s)
- Anna R Kersting
- Evolutionary Bioinformatics Group, Institute for Evolution and Biodiversity, University of Muenster, Muenster, Germany
- Bioinformatics Group, Institute for Computer Science, Heinrich-Heine-University, Duesseldorf, Germany
- Eshchar Mizrachi
- Department of Genetics, Forestry and Agricultural Biotechnology Institute (FABI), Genomics Research Institute, University of Pretoria, Private Bag X20, Pretoria, 0028, South Africa
- Erich Bornberg-Bauer
- Evolutionary Bioinformatics Group, Institute for Evolution and Biodiversity, University of Muenster, Muenster, Germany
- Alexander A Myburg
- Department of Genetics, Forestry and Agricultural Biotechnology Institute (FABI), Genomics Research Institute, University of Pretoria, Private Bag X20, Pretoria, 0028, South Africa
|