1. Konno N, Maeno S, Tanizawa Y, Arita M, Endo A, Iwasaki W. Evolutionary paths toward multi-level convergence of lactic acid bacteria in fructose-rich environments. Commun Biol 2024; 7:902. [PMID: 39048718 PMCID: PMC11269746 DOI: 10.1038/s42003-024-06580-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Accepted: 07/11/2024] [Indexed: 07/27/2024] [Open access]
Abstract
Convergence provides clues to unveil the non-random nature of evolution. Intermediate paths toward convergence inform us of the stochasticity and the constraint of evolutionary processes. Although previous studies have suggested that substantial constraints exist in microevolutionary paths, it remains unclear whether macroevolutionary convergence follows stochastic or constrained paths. Here, we performed comparative genomics for hundreds of lactic acid bacteria (LAB) species, including clades showing a convergent gene repertoire and sharing fructose-rich habitats. By adopting phylogenetic comparative methods we showed that the genomic convergence of distinct fructophilic LAB (FLAB) lineages was caused by parallel losses of more than a hundred orthologs and the gene losses followed significantly similar orders. Our results further suggested that the loss of adhE, a key gene for phenotypic convergence to FLAB, follows a specific evolutionary path of domain architecture decay and amino acid substitutions in multiple LAB lineages sharing fructose-rich habitats. These findings unveiled the constrained evolutionary paths toward the convergence of free-living bacterial clades at the genomic and molecular levels.
Affiliation(s)
- Naoki Konno
  - Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
- Shintaro Maeno
  - Research Center for Advance Science and Innovation Organization for Research Initiatives, Yamaguchi University, Yamaguchi, Yamaguchi, Japan
- Yasuhiro Tanizawa
  - Department of Informatics, National Institute of Genetics, Mishima, Shizuoka, Japan
- Masanori Arita
  - Department of Informatics, National Institute of Genetics, Mishima, Shizuoka, Japan
- Akihito Endo
  - Department of Nutritional Science and Food Safety, Faculty of Applied Bioscience, Tokyo University of Agriculture, Tokyo, Japan
- Wataru Iwasaki
  - Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
  - Department of Integrated Biosciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan
  - Atmosphere and Ocean Research Institute, The University of Tokyo, Kashiwa, Chiba, Japan
  - Institute for Quantitative Biosciences, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
  - Collaborative Research Institute for Innovative Microbiology, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
2. Suppiyar V, Bonthala VS, Shrestha A, Krey S, Stich B. Genome-wide identification and expression analysis of the SET domain-containing gene family in potato (Solanum tuberosum L.). BMC Genomics 2024; 25:442. [PMID: 38702658 PMCID: PMC11069243 DOI: 10.1186/s12864-024-10367-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2023] [Accepted: 04/30/2024] [Indexed: 05/06/2024] [Open access]
Abstract
Genes containing the SET domain can catalyse histone lysine methylation, which in turn has the potential to cause changes to chromatin structure and regulation of the transcription of genes involved in diverse physiological and developmental processes. However, the functions of SET domain-containing (StSET) genes in potato still need to be studied. The objectives of our study can be summarized as in silico analysis to (i) identify StSET genes in the potato genome, (ii) systematically analyse gene structure, chromosomal distribution, gene duplication events, promoter sequences, and protein domains, (iii) perform phylogenetic analyses, (iv) compare the SET domain-containing genes of potato with other plant species with respect to protein domains and orthologous relationships, (v) analyse tissue-specific expression, and (vi) study the expression of StSET genes in response to drought and heat stresses. In this study, we identified 57 StSET genes in the potato genome, and the genes were physically mapped onto eleven chromosomes. The phylogenetic analysis grouped these StSET genes into six clades. We found that tandem duplication through sub-functionalisation has contributed only marginally to the expansion of the StSET gene family. The protein domain TDBD (PFAM ID: PF16135) was detected in StSET genes of potato while it was absent in all other previously studied species. This study described three pollen-specific StSET genes in the potato genome. Expression analysis of four StSET genes under heat and drought in three potato clones revealed that these genes might have non-overlapping roles under different abiotic stress conditions and durations. The present study provides a comprehensive analysis of StSET genes in potatoes, and it serves as a basis for further functional characterisation of StSET genes towards understanding their underpinning biological mechanisms in conferring stress tolerance.
Affiliation(s)
- Vithusan Suppiyar
  - Institute for Quantitative Genetics and Genomics of Plants, Heinrich Heine University, Düsseldorf, 40225, Germany
- Venkata Suresh Bonthala
  - Institute for Quantitative Genetics and Genomics of Plants, Heinrich Heine University, Düsseldorf, 40225, Germany
  - Present Address: Julius Kühn-Institut (JKI), Institute for Breeding Research On Agricultural Crops, Rudolf-Schick-Platz 3a, OT Groß Lüsewitz, Sanitz, 18190, Germany
- Asis Shrestha
  - Institute for Quantitative Genetics and Genomics of Plants, Heinrich Heine University, Düsseldorf, 40225, Germany
  - Present Address: Julius Kühn-Institut (JKI), Institute for Breeding Research On Agricultural Crops, Rudolf-Schick-Platz 3a, OT Groß Lüsewitz, Sanitz, 18190, Germany
- Stephanie Krey
  - Institute for Quantitative Genetics and Genomics of Plants, Heinrich Heine University, Düsseldorf, 40225, Germany
  - Present Address: Julius Kühn-Institut (JKI), Institute for Breeding Research On Agricultural Crops, Rudolf-Schick-Platz 3a, OT Groß Lüsewitz, Sanitz, 18190, Germany
- Benjamin Stich
  - Institute for Quantitative Genetics and Genomics of Plants, Heinrich Heine University, Düsseldorf, 40225, Germany
  - Cluster of Excellence On Plant Sciences, From Complex Traits Towards Synthetic Modules, Heinrich Heine University, Düsseldorf, 40225, Germany
  - Present Address: Julius Kühn-Institut (JKI), Institute for Breeding Research On Agricultural Crops, Rudolf-Schick-Platz 3a, OT Groß Lüsewitz, Sanitz, 18190, Germany
3. Gollapalli P, Rudrappa S, Kumar V, Santosh Kumar HS. Domain Architecture Based Methods for Comparative Functional Genomics Toward Therapeutic Drug Target Discovery. J Mol Evol 2023; 91:598-615. [PMID: 37626222 DOI: 10.1007/s00239-023-10129-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2022] [Accepted: 08/06/2023] [Indexed: 08/27/2023]
Abstract
Novel functions arise during evolution when genes duplicate, mutate, recombine, fuse, or undergo fission to produce new genes, or when genes form de novo. Researchers have tried to quantify the causes of these molecular diversification processes to understand how they increase molecular complexity over time, for instance in protein domain organization. In contrast to global sequence similarity, protein domain architectures can capture key structural and functional characteristics, making them better proxies for describing functional equivalence. In both prokaryotes and eukaryotes, domain architectures have been shown to be retained over significant evolutionary distances. Protein domain architectures are now being utilized to categorize and distinguish evolutionarily related proteins and find homologs among species that are evolutionarily distant from one another. Additionally, structural information stored in domain structures has accelerated homology identification and sequence search methods. Tools for functional protein annotation have been developed to identify protein domain content, domain order, domain recurrence, and domain position, all of which contribute to the accuracy of protein function prediction. In this review, an attempt is made to summarise facts and speculations regarding the use of protein domain architecture and modularity to identify possible therapeutic targets among cellular activities, based on an understanding of their linked biological processes.
Affiliation(s)
- Pavan Gollapalli
  - Center for Bioinformatics and Biostatistics, Nitte (Deemed to be University), Mangalore, Karnataka, 575018, India
- Sushmitha Rudrappa
  - Department of Biotechnology and Bioinformatics, Jnana Sahyadri Campus, Kuvempu University, Shankaraghatta, Shivamogga, Karnataka, 577451, India
- Vadlapudi Kumar
  - Department of Biochemistry, Davangere University, Shivagangothri, Davangere, Karnataka, 577007, India
- Hulikal Shivashankara Santosh Kumar
  - Department of Biotechnology and Bioinformatics, Jnana Sahyadri Campus, Kuvempu University, Shankaraghatta, Shivamogga, Karnataka, 577451, India
4. Vignesh R, Aradhyam GK. Calnuc-derived nesfatin-1-like peptide is an activator of tumor cell proliferation and migration. FEBS Lett 2023; 597:2288-2300. [PMID: 37539786 DOI: 10.1002/1873-3468.14712] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2023] [Revised: 06/28/2023] [Accepted: 07/11/2023] [Indexed: 08/05/2023]
Abstract
Calnuc (nucleobindin-1, nucb1) is a Ca2+-binding protein involved in the etiology of many human diseases. To understand the functions of calnuc, we have identified a nesfatin-1-like peptide (NLP) in its N terminus that is proteolyzed by a convertase enzyme in the secretory granules of cells. Mutational studies confirm the presence of a proteolytic cleavage site for proprotein convertase subtilisin/kexin type 1 (PCSK1). We demonstrate that NLP regulates Gαq-mediated intracellular Ca2+ dynamics, likely via a G-protein-coupled receptor. NLP treatment of carcinoma cell lines (SCC131 cells) promotes the expression of regulators of cell cycle, proliferation, and clonogenicity by the AKT/mTOR pathway. NLP is causative of augmented migration and epithelial-mesenchymal transition (EMT), illustrating its metastatic propensity and establishing its tumor promotion ability.
Affiliation(s)
- Ravichandran Vignesh
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu, India
- Gopala Krishna Aradhyam
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu, India
5. Pichard-Kostuch A, Da Cunha V, Oberto J, Sauguet L, Basta T. The universal Sua5/TsaC family evolved different mechanisms for the synthesis of a key tRNA modification. Front Microbiol 2023; 14:1204045. [PMID: 37415821 PMCID: PMC10321239 DOI: 10.3389/fmicb.2023.1204045] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/11/2023] [Accepted: 06/02/2023] [Indexed: 07/08/2023] [Open access]
Abstract
The TsaC/Sua5 family of enzymes catalyzes the first step in the synthesis of N6-threonyl-carbamoyl adenosine (t6A), one of the few truly ubiquitous tRNA modifications important for translation accuracy. TsaC is a single-domain protein, while Sua5 proteins contain a TsaC-like domain and an additional SUA5 domain of unknown function. The emergence of these two proteins and their respective mechanisms for t6A synthesis remain poorly understood. Here, we performed phylogenetic and comparative sequence and structure analysis of TsaC and Sua5 proteins. We confirm that this family is ubiquitous but the co-occurrence of both variants in the same organism is rare and unstable. We further find that obligate symbionts are the only organisms lacking sua5 or tsaC genes. The data suggest that Sua5 was the ancestral version of the enzyme while TsaC arose via loss of the SUA5 domain that occurred multiple times in the course of evolution. Multiple losses of one of the two variants, in combination with horizontal gene transfers along a large range of phylogenetic distances, explain the present-day patchy distribution of Sua5 and TsaC. The loss of the SUA5 domain triggered adaptive mutations affecting the substrate binding in TsaC proteins. Finally, we identified atypical Sua5 proteins in Archaeoglobi archaea that seem to be in the process of losing the SUA5 domain through progressive gene erosion. Together, our study uncovers the evolutionary path for emergence of these homologous isofunctional enzymes and lays the groundwork for future experimental studies on the function of TsaC/Sua5 proteins in maintaining faithful translation.
Affiliation(s)
- Adeline Pichard-Kostuch
  - CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), Université Paris-Saclay, Gif-sur-Yvette, France
- Violette Da Cunha
  - CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), Université Paris-Saclay, Gif-sur-Yvette, France
- Jacques Oberto
  - CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), Université Paris-Saclay, Gif-sur-Yvette, France
- Ludovic Sauguet
  - Architecture and Dynamics of Biological Macromolecules, Institut Pasteur, Université Paris Cité, CNRS, UMR 3528, Paris, France
- Tamara Basta
  - CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), Université Paris-Saclay, Gif-sur-Yvette, France
6. Niu F, Liu M, Dong S, Dong X, Wang Y, Cheng C, Chu H, Hu Z, Ma F, Yan P, Lan D, Zhang J, Zhou J, Sun B, Zhang A, Hu J, Zhang X, He S, Cui J, Yuan X, Yang J, Cao L, Luo X. RNA-Seq Transcriptome Analysis and Evolution of OsEBS, a Gene Involved in Enhanced Spikelet Number per Panicle in Rice. Int J Mol Sci 2023; 24:10303. [PMID: 37373450 DOI: 10.3390/ijms241210303] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2023] [Revised: 06/08/2023] [Accepted: 06/13/2023] [Indexed: 06/29/2023] [Open access]
Abstract
Spikelet number per panicle (SNP) is one of the most important yield components in rice. Rice ENHANCING BIOMASS AND SPIKELET NUMBER (OsEBS), a gene involved in improved SNP and yield, has been cloned from an accession of Dongxiang wild rice. However, the mechanism by which OsEBS increases rice SNP is poorly understood. In this study, RNA-Seq technology was used to analyze the transcriptome of wildtype Guichao 2 and the OsEBS over-expression line B102 at the heading stage, and analysis of the evolution of OsEBS was also conducted. A total of 5369 differentially expressed genes (DEGs) were identified between Guichao 2 and B102, most of which were down-regulated in B102. Analysis of the expression of endogenous hormone-related genes revealed that 63 auxin-related genes were significantly down-regulated in B102. Gene Ontology (GO) enrichment analysis showed that the 63 DEGs were mainly enriched in eight GO terms, including auxin-activated signaling pathway, auxin polar transport, auxin transport, basipetal auxin transport, and amino acid transmembrane transport, most of which were directly or indirectly related to polar auxin transport. Kyoto Encyclopedia of Genes and Genomes (KEGG) metabolic pathway analysis further verified that the down-regulated genes related to polar auxin transport had important effects on increased SNP. Analysis of the evolution of OsEBS found that OsEBS was involved in the differentiation of indica and japonica, and the differentiation of OsEBS supported the multi-origin model of rice domestication. Indica (XI) subspecies harbored higher nucleotide diversity than japonica (GJ) subspecies in the OsEBS region, and XI experienced strong balancing selection during evolution, while selection in GJ was neutral. The degree of genetic differentiation between GJ and Bas subspecies was the smallest, while it was the highest between GJ and Aus. Phylogenetic analysis of the Hsp70 family in O. sativa, Brachypodium distachyon, and Arabidopsis thaliana indicated that changes in the sequences of OsEBS were accelerated during evolution. Accelerated evolution and domain loss in OsEBS resulted in neofunctionalization. The results obtained from this study provide an important theoretical basis for high-yield rice breeding.
Affiliation(s)
- Fuan Niu
  - State Key Laboratory of Genetic Engineering and MOE Engineering Research Center of Gene Technology, School of Life Sciences, Fudan University, Shanghai 200438, China
  - Key Laboratory of Germplasm Innovation and Genetic Improvement of Grain and Oil Crops (Co-Construction by Ministry and Province), Ministry of Agriculture and Rural Affairs, Crop Breeding and Cultivation Research Institute, Shanghai Academy of Agricultural Sciences, Shanghai 201403, China
- Mingyu Liu
  - State Key Laboratory of Genetic Engineering and MOE Engineering Research Center of Gene Technology, School of Life Sciences, Fudan University, Shanghai 200438, China
- Shiqing Dong
  - State Key Laboratory of Genetic Engineering and MOE Engineering Research Center of Gene Technology, School of Life Sciences, Fudan University, Shanghai 200438, China
- Xianxin Dong
  - State Key Laboratory of Genetic Engineering and MOE Engineering Research Center of Gene Technology, School of Life Sciences, Fudan University, Shanghai 200438, China
- Ying Wang
  - State Key Laboratory of Genetic Engineering and MOE Engineering Research Center of Gene Technology, School of Life Sciences, Fudan University, Shanghai 200438, China
- Can Cheng
  - Key Laboratory of Germplasm Innovation and Genetic Improvement of Grain and Oil Crops (Co-Construction by Ministry and Province), Ministry of Agriculture and Rural Affairs, Crop Breeding and Cultivation Research Institute, Shanghai Academy of Agricultural Sciences, Shanghai 201403, China
- Huangwei Chu
  - Key Laboratory of Germplasm Innovation and Genetic Improvement of Grain and Oil Crops (Co-Construction by Ministry and Province), Ministry of Agriculture and Rural Affairs, Crop Breeding and Cultivation Research Institute, Shanghai Academy of Agricultural Sciences, Shanghai 201403, China
- Zejun Hu
  - Key Laboratory of Germplasm Innovation and Genetic Improvement of Grain and Oil Crops (Co-Construction by Ministry and Province), Ministry of Agriculture and Rural Affairs, Crop Breeding and Cultivation Research Institute, Shanghai Academy of Agricultural Sciences, Shanghai 201403, China
- Fuying Ma
  - State Key Laboratory of Genetic Engineering and MOE Engineering Research Center of Gene Technology, School of Life Sciences, Fudan University, Shanghai 200438, China
- Peiwen Yan
  - State Key Laboratory of Genetic Engineering and MOE Engineering Research Center of Gene Technology, School of Life Sciences, Fudan University, Shanghai 200438, China
- Dengyong Lan
  - State Key Laboratory of Genetic Engineering and MOE Engineering Research Center of Gene Technology, School of Life Sciences, Fudan University, Shanghai 200438, China
- Jianming Zhang
  - Key Laboratory of Germplasm Innovation and Genetic Improvement of Grain and Oil Crops (Co-Construction by Ministry and Province), Ministry of Agriculture and Rural Affairs, Crop Breeding and Cultivation Research Institute, Shanghai Academy of Agricultural Sciences, Shanghai 201403, China
- Jihua Zhou
  - Key Laboratory of Germplasm Innovation and Genetic Improvement of Grain and Oil Crops (Co-Construction by Ministry and Province), Ministry of Agriculture and Rural Affairs, Crop Breeding and Cultivation Research Institute, Shanghai Academy of Agricultural Sciences, Shanghai 201403, China
- Bin Sun
  - Key Laboratory of Germplasm Innovation and Genetic Improvement of Grain and Oil Crops (Co-Construction by Ministry and Province), Ministry of Agriculture and Rural Affairs, Crop Breeding and Cultivation Research Institute, Shanghai Academy of Agricultural Sciences, Shanghai 201403, China
- Anpeng Zhang
  - Key Laboratory of Germplasm Innovation and Genetic Improvement of Grain and Oil Crops (Co-Construction by Ministry and Province), Ministry of Agriculture and Rural Affairs, Crop Breeding and Cultivation Research Institute, Shanghai Academy of Agricultural Sciences, Shanghai 201403, China
- Jian Hu
  - State Key Laboratory of Genetic Engineering and MOE Engineering Research Center of Gene Technology, School of Life Sciences, Fudan University, Shanghai 200438, China
- Xinwei Zhang
  - State Key Laboratory of Genetic Engineering and MOE Engineering Research Center of Gene Technology, School of Life Sciences, Fudan University, Shanghai 200438, China
- Shicong He
  - State Key Laboratory of Genetic Engineering and MOE Engineering Research Center of Gene Technology, School of Life Sciences, Fudan University, Shanghai 200438, China
- Jinhao Cui
  - State Key Laboratory of Genetic Engineering and MOE Engineering Research Center of Gene Technology, School of Life Sciences, Fudan University, Shanghai 200438, China
- Xinyu Yuan
  - State Key Laboratory of Genetic Engineering and MOE Engineering Research Center of Gene Technology, School of Life Sciences, Fudan University, Shanghai 200438, China
- Jinshui Yang
  - State Key Laboratory of Genetic Engineering and MOE Engineering Research Center of Gene Technology, School of Life Sciences, Fudan University, Shanghai 200438, China
- Liming Cao
  - Key Laboratory of Germplasm Innovation and Genetic Improvement of Grain and Oil Crops (Co-Construction by Ministry and Province), Ministry of Agriculture and Rural Affairs, Crop Breeding and Cultivation Research Institute, Shanghai Academy of Agricultural Sciences, Shanghai 201403, China
- Xiaojin Luo
  - State Key Laboratory of Genetic Engineering and MOE Engineering Research Center of Gene Technology, School of Life Sciences, Fudan University, Shanghai 200438, China
7. Murugesan SN, Monteiro A. Evolution of modular and pleiotropic enhancers. Journal of Experimental Zoology Part B: Molecular and Developmental Evolution 2023; 340:105-115. [PMID: 35334158 DOI: 10.1002/jez.b.23131] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2021] [Revised: 01/14/2022] [Accepted: 02/28/2022] [Indexed: 11/05/2022]
Abstract
Cis-regulatory elements (CREs), or enhancers, are segments of noncoding DNA that regulate the spatial and temporal expression of nearby genes. Sometimes, genes are expressed in more than one tissue, and this can be driven by two main types of CREs: tissue-specific "modular" CREs, where different CREs drive expression of the gene in the different tissues, or "pleiotropic" CREs, where the same CRE drives expression in the different tissues. In this perspective, we (i) discuss some of the ways modular and pleiotropic CREs might originate; (ii) propose that modular CREs might derive from pleiotropic CREs via a process of duplication, degeneration, and complementation (the CRE-DDC model); and (iii) propose that hotspot loci of evolution are associated with the origin of modular CREs belonging to any gene in a regulatory network.
Affiliation(s)
- Suriya N Murugesan
  - Department of Biological Sciences, National University of Singapore, Singapore
- Antónia Monteiro
  - Department of Biological Sciences, National University of Singapore, Singapore
  - Division of Science, Yale-NUS College, Singapore
8. Jayaraman V, Toledo‐Patiño S, Noda‐García L, Laurino P. Mechanisms of protein evolution. Protein Sci 2022; 31:e4362. [PMID: 35762715 PMCID: PMC9214755 DOI: 10.1002/pro.4362] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2022] [Revised: 05/11/2022] [Accepted: 05/14/2022] [Indexed: 11/06/2022]
Abstract
How do proteins evolve? How do changes in sequence mediate changes in protein structure, and in turn in function? This question has multiple angles, ranging from biochemistry and biophysics to evolutionary biology. This review provides a brief integrated view of some key mechanistic aspects of protein evolution. First, we explain how protein evolution is primarily driven by randomly acquired genetic mutations and selection for function, and how these mutations can even give rise to completely new folds. Then, we also comment on how phenotypic protein variability, including promiscuity, transcriptional and translational errors, may also accelerate this process, possibly via "plasticity-first" mechanisms. Finally, we highlight open questions in the field of protein evolution, with respect to the emergence of more sophisticated protein systems such as protein complexes, pathways, and the emergence of pre-LUCA enzymes.
Affiliation(s)
- Vijay Jayaraman
  - Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
- Saacnicteh Toledo‐Patiño
  - Protein Engineering and Evolution Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
- Lianet Noda‐García
  - Department of Plant Pathology and Microbiology, Institute of Environmental Sciences, Robert H. Smith Faculty of Agriculture, Food and Environment, Hebrew University of Jerusalem, Rehovot, Israel
- Paola Laurino
  - Protein Engineering and Evolution Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
9. Cui X, Xue Y, McCormack C, Garces A, Rachman TW, Yi Y, Stolzer M, Durand D. Simulating domain architecture evolution. Bioinformatics 2022; 38:i134-i142. [PMID: 35758772 PMCID: PMC9236583 DOI: 10.1093/bioinformatics/btac242] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022] [Open access]
Abstract
Motivation: Simulation is an essential technique for generating biomolecular data with a ‘known’ history for use in validating phylogenetic inference and other evolutionary methods. On longer time scales, simulation supports investigations of equilibrium behavior and provides a formal framework for testing competing evolutionary hypotheses. Twenty years of molecular evolution research have produced a rich repertoire of simulation methods. However, current models do not capture the stringent constraints acting on the domain insertions, duplications, and deletions by which multidomain architectures evolve. Although these processes have the potential to generate any combination of domains, only a tiny fraction of possible domain combinations are observed in nature. Modeling these stringent constraints on domain order and co-occurrence is a fundamental challenge in domain architecture simulation that does not arise with sequence and gene family simulation. Results: Here, we introduce a stochastic model of domain architecture evolution to simulate evolutionary trajectories that reflect the constraints on domain order and co-occurrence observed in nature. This framework is implemented in a novel domain architecture simulator, DomArchov, using the Metropolis–Hastings algorithm with data-driven transition probabilities. The use of a data-driven event module enables quick and easy redeployment of the simulator for use in different taxonomic and protein function contexts. Using empirical evaluation with metazoan datasets, we demonstrate that domain architectures simulated by DomArchov recapitulate properties of genuine domain architectures that reflect the constraints on domain order and adjacency seen in nature. This work expands the realm of evolutionary processes that are amenable to simulation. Availability and implementation: DomArchov is written in Python 3 and is available at http://www.cs.cmu.edu/~durand/DomArchov. The data underlying this article are available via the same link. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Xiaoyue Cui
  - Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Yifan Xue
  - Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA
  - Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Collin McCormack
  - Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA
  - Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Alejandro Garces
  - Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Thomas W Rachman
  - Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Yang Yi
  - Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA
  - Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Maureen Stolzer
  - Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Dannie Durand
  - Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA
10. Martyn JE, Gomez-Valero L, Buchrieser C. The evolution and role of eukaryotic-like domains in environmental intracellular bacteria: the battle with a eukaryotic cell. FEMS Microbiol Rev 2022; 46:6529235. [DOI: 10.1093/femsre/fuac012] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/01/2021] [Revised: 02/09/2022] [Accepted: 02/14/2022] [Indexed: 11/14/2022] [Open access]
Abstract
Intracellular pathogens that are able to thrive in different environments, such as Legionella spp., which preferentially live in protozoa in aquatic environments, or environmental Chlamydiae, which replicate either within protozoa or a range of animals, possess a plethora of cellular biology tools to influence their eukaryotic host. The host manipulation tools that evolved in the interaction with protozoa confer on these bacteria the capacity to also infect phylogenetically distinct eukaryotic cells, such as macrophages, and thus they can also be human pathogens. To manipulate the host cell, bacteria use protein secretion systems and molecular effectors. Although these molecular effectors are encoded in bacteria, they are expressed and function in a eukaryotic context, often mimicking or inhibiting eukaryotic proteins. Indeed, many of these effectors have eukaryotic-like domains. In this review we propose that the main pathways environmental intracellular bacteria need to subvert in order to establish the host eukaryotic cell as a replication niche are chromatin remodelling, ubiquitination signalling, and modulation of protein-protein interactions via tandem repeat domains. We then provide mechanistic insight into how these proteins might have evolved as molecular weapons. Finally, we highlight that in environmental intracellular bacteria the number of eukaryotic-like domains and proteins is considerably higher than in intracellular bacteria specialised to an isolated niche, such as obligate intracellular human pathogens. As mimics of eukaryotic proteins are critical components of host-pathogen interactions, this distribution of eukaryotic-like domains suggests that the environment has selected them.
Affiliation(s)
- Jessica E Martyn
- Institut Pasteur, Biologie des Bactéries Intracellulaires and CNRS UMR 3525, Paris, France
| | - Laura Gomez-Valero
- Institut Pasteur, Biologie des Bactéries Intracellulaires and CNRS UMR 3525, Paris, France
| | - Carmen Buchrieser
- Institut Pasteur, Biologie des Bactéries Intracellulaires and CNRS UMR 3525, Paris, France
| |
|
11
|
Rivera AM, Swanson WJ. The Importance of Gene Duplication and Domain Repeat Expansion for the Function and Evolution of Fertilization Proteins. Front Cell Dev Biol 2022; 10:827454. [PMID: 35155436 PMCID: PMC8830517 DOI: 10.3389/fcell.2022.827454] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/02/2021] [Accepted: 01/12/2022] [Indexed: 11/13/2022] Open
Abstract
The process of gene duplication followed by gene loss or evolution of new functions has been studied extensively, yet the role gene duplication plays in the function and evolution of fertilization proteins is underappreciated. Gene duplication is observed in many fertilization protein families including Izumo, DCST, ZP, and the TFP superfamily. Molecules mediating fertilization are part of larger gene families expressed in a variety of tissues, but gene duplication followed by structural modifications has often facilitated their cooption into a fertilization function. Repeat expansions of functional domains within a gene also provide opportunities for the evolution of novel fertilization proteins. ZP proteins with domain repeat expansions are linked to species-specificity in fertilization, and TFP proteins that experienced domain duplications were coopted into a novel sperm function. This review outlines the importance of gene duplications and repeat domain expansions in the evolution of fertilization proteins.
|
12
|
Coyote-Maestas W, Nedrud D, Suma A, He Y, Matreyek KA, Fowler DM, Carnevale V, Myers CL, Schmidt D. Probing ion channel functional architecture and domain recombination compatibility by massively parallel domain insertion profiling. Nat Commun 2021; 12:7114. [PMID: 34880224 PMCID: PMC8654947 DOI: 10.1038/s41467-021-27342-0] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2021] [Accepted: 11/16/2021] [Indexed: 11/10/2022] Open
Abstract
Protein domains are the basic units of protein structure and function. Comparative analysis of genomes and proteomes showed that domain recombination is a main driver of multidomain protein functional diversification and some of the constraining genomic mechanisms are known. Much less is known about biophysical mechanisms that determine whether protein domains can be combined into viable protein folds. Here, we use massively parallel insertional mutagenesis to determine compatibility of over 300,000 domain recombination variants of the Inward Rectifier K+ channel Kir2.1 with channel surface expression. Our data suggest that genomic and biophysical mechanisms acted in concert to favor gain of large, structured domains at protein termini during ion channel evolution. We use machine learning to build a quantitative biophysical model of domain compatibility in Kir2.1 that allows us to derive rudimentary rules for designing domain insertion variants that fold and traffic to the cell surface. Positional Kir2.1 responses to motif insertion cluster into distinct groups that correspond to contiguous structural regions of the channel with distinct biophysical properties tuned towards providing either folding stability or gating transitions. This suggests that insertional profiling is a high-throughput method to annotate function of ion channel structural regions.
Affiliation(s)
- Willow Coyote-Maestas
- Department of Biochemistry, Molecular Biology & Biophysics, University of Minnesota, Minneapolis, MN 55455 USA
| | - David Nedrud
- Department of Biochemistry, Molecular Biology & Biophysics, University of Minnesota, Minneapolis, MN 55455 USA
| | - Antonio Suma
- Department of Chemistry, Temple University, Philadelphia, PA 19122 USA
| | - Yungui He
- Department of Genetics, Cell Biology & Development, University of Minnesota, Minneapolis, MN 55455 USA
| | - Kenneth A. Matreyek
- Department of Pathology, Case Western Reserve University School of Medicine, Cleveland, OH 44106 USA
| | - Douglas M. Fowler
- Department of Genome Sciences, University of Washington, Seattle, WA 98115 USA; Department of Bioengineering, University of Washington, Seattle, WA 98115 USA
| | - Vincenzo Carnevale
- Department of Chemistry, Temple University, Philadelphia, PA 19122 USA
| | - Chad L. Myers
- Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55455 USA
| | - Daniel Schmidt
- Department of Genetics, Cell Biology & Development, University of Minnesota, Minneapolis, MN, 55455, USA.
| |
|
13
|
Deryusheva EI, Machulin AV, Galzitskaya OV. Structural, Functional, and Evolutionary Characteristics of Proteins with Repeats. Mol Biol 2021. [DOI: 10.1134/s0026893321040038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
|
14
|
Shams A, Higgins SA, Fellmann C, Laughlin TG, Oakes BL, Lew R, Kim S, Lukarska M, Arnold M, Staahl BT, Doudna JA, Savage DF. Comprehensive deletion landscape of CRISPR-Cas9 identifies minimal RNA-guided DNA-binding modules. Nat Commun 2021; 12:5664. [PMID: 34580310 PMCID: PMC8476515 DOI: 10.1038/s41467-021-25992-8] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/06/2020] [Accepted: 09/10/2021] [Indexed: 11/28/2022] Open
Abstract
Proteins evolve through the modular rearrangement of elements known as domains. Extant, multidomain proteins are hypothesized to be the result of domain accretion, but there has been limited experimental validation of this idea. Here, we introduce a technique for genetic minimization by iterative size-exclusion and recombination (MISER) for comprehensively making all possible deletions of a protein. Using MISER, we generate a deletion landscape for the CRISPR protein Cas9. We find that the catalytically-dead Streptococcus pyogenes Cas9 can tolerate large single deletions in the REC2, REC3, HNH, and RuvC domains, while still functioning in vitro and in vivo, and that these deletions can be stacked together to engineer minimal, DNA-binding effector proteins. In total, our results demonstrate that extant proteins retain significant modularity from the accretion process and, as genetic size is a major limitation for viral delivery systems, establish a general technique to improve genome editing and gene therapy-based therapeutics.
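To give a sense of the size of a "comprehensive deletion landscape", the sketch below enumerates every contiguous deletion of a short sequence in silico. It is not the MISER protocol, which is an experimental library-construction technique; the function name `single_deletions` and the six-residue example sequence are hypothetical and serve only to illustrate the combinatorics.

```python
from typing import Iterator, Tuple


def single_deletions(seq: str) -> Iterator[Tuple[int, int, str]]:
    """Yield (start, length, variant) for every contiguous deletion of seq.

    The deletion that would remove the entire sequence is skipped.
    """
    n = len(seq)
    for start in range(n):
        for length in range(1, n - start + 1):
            if length == n:
                continue  # deleting everything leaves no protein
            yield start, length, seq[:start] + seq[start + length:]


# Hypothetical six-residue example sequence.
variants = list(single_deletions("MKTAYI"))
print(len(variants))
```

A sequence of n residues has n(n+1)/2 contiguous segments, so the number of variants grows quadratically with protein length, which is why a pooled, high-throughput approach is needed for a protein the size of Cas9.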
Affiliation(s)
- Arik Shams
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, 94720, USA
| | - Sean A Higgins
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, 94720, USA
- Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA, 94720, USA
- Scribe Therapeutics, Alameda, CA, 94501, USA
| | - Christof Fellmann
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, 94720, USA
- Gladstone Institutes, San Francisco, CA, 94158, USA
- Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, CA, 94158, USA
| | - Thomas G Laughlin
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, 94720, USA
- Division of Biological Sciences, University of California, San Diego, San Diego, CA, 92093, USA
| | - Benjamin L Oakes
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, 94720, USA
- Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA, 94720, USA
- Scribe Therapeutics, Alameda, CA, 94501, USA
| | - Rachel Lew
- Gladstone Institutes, San Francisco, CA, 94158, USA
| | - Shin Kim
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, 94720, USA
- Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA, 94720, USA
| | - Maria Lukarska
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, 94720, USA
- Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA, 94720, USA
| | - Madeline Arnold
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, 94720, USA
| | - Brett T Staahl
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, 94720, USA
- Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA, 94720, USA
- Scribe Therapeutics, Alameda, CA, 94501, USA
| | - Jennifer A Doudna
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, 94720, USA
- Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA, 94720, USA
- Gladstone Institutes, San Francisco, CA, 94158, USA
- Graduate Group in Biophysics, University of California, Berkeley, Berkeley, CA, 94720, USA
- Department of Bioengineering, University of California, Berkeley, Berkeley, CA, 94720, USA
- Howard Hughes Medical Institute, University of California, Berkeley, Berkeley, CA, 94720, USA
- Molecular Biophysics and Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Department of Chemistry, University of California, Berkeley, Berkeley, CA, 94720, USA
| | - David F Savage
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, 94720, USA.
- Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA, 94720, USA.
| |
|
15
|
Yang Z, Liu M, Wang B, Wang B. Classification of protein domains based on their three-dimensional shapes (CPD3DS). Synth Syst Biotechnol 2021; 6:224-230. [PMID: 34541344 PMCID: PMC8429105 DOI: 10.1016/j.synbio.2021.08.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2021] [Revised: 08/23/2021] [Accepted: 08/30/2021] [Indexed: 11/13/2022] Open
Abstract
Protein design has become a powerful method to expand the number of natural proteins and design customized proteins according to demands. Domain-based protein design spares the need to create novel elements from scratch, which makes it a more efficient strategy than scratch-based protein design in designing multi-domain proteins, protein complexes and biomaterials. As the surface shape plays a central role in domain-domain and protein-protein interactions, a global map of the surface shapes of all domains should be very beneficial for domain-based protein design. Therefore, in this study, we characterized the surface shapes of protein domains, collected from CATH and SCOP databases, with their 3D-Zernike descriptors (3DZDs). Then similarities of domain shape features were identified, and all domains were classified accordingly. The preferences of the combinations of domains between different clusters were analyzed in natural proteins from the Protein Data Bank. A user-friendly website, termed CPD3DS, was also developed for storage, retrieval, analyses and visualization of our results. This work not only provides an overall view of protein domain shapes by showing their variety and similarities, but also opens up a new avenue to understand the properties of protein structural domains, and design principles of protein architectures.
Affiliation(s)
- Zhaochang Yang
- School of Life Science and Technology, University of Electronic Science and Technology of China, China
| | - Mingkang Liu
- School of Life Science and Technology, University of Electronic Science and Technology of China, China
| | - Bin Wang
- School of Information and Software Engineering, University of Electronic Science and Technology of China, China
| | - Beibei Wang
- School of Life Science and Technology, University of Electronic Science and Technology of China, China.,Centre for Informational Biology, University of Electronic Science and Technology of China, 2006 Xiyuan Road, Chengdu, Sichuan, 611731, China
| |
|
16
|
Dohmen E, Klasberg S, Bornberg-Bauer E, Perrey S, Kemena C. The modular nature of protein evolution: domain rearrangement rates across eukaryotic life. BMC Evol Biol 2020; 20:30. [PMID: 32059645 PMCID: PMC7023805 DOI: 10.1186/s12862-020-1591-0] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/12/2019] [Accepted: 01/31/2020] [Indexed: 12/25/2022] Open
Abstract
BACKGROUND Modularity is important for evolutionary innovation. The recombination of existing units to form larger complexes with new functionalities spares the need to create novel elements from scratch. In proteins, this principle can be observed at the level of protein domains, functional subunits which are regularly rearranged to acquire new functions. RESULTS In this study we analyse the mechanisms leading to new domain arrangements in five major eukaryotic clades (vertebrates, insects, fungi, monocots and eudicots) at unprecedented depth and breadth. This allows us, for the first time, to directly compare rates of rearrangements between different clades and identify both lineage-specific and general patterns of evolution in the context of domain rearrangements. We analyse arrangement changes along phylogenetic trees by reconstructing ancestral domain content in combination with feasible single step events, such as fusion or fission. Using this approach we explain up to 70% of all rearrangements by tracing them back to their precursors. We find that rates in general, and the ratio between these rates for a given clade in particular, are highly consistent across all clades. In agreement with previous studies, fusions are the most frequent event leading to new domain arrangements. A lineage-specific pattern in fungi reveals exceptionally high loss rates compared to other clades, supporting recent studies highlighting the importance of loss for evolutionary innovation. Furthermore, our methodology allows us to link domain emergences at specific nodes in the phylogenetic tree to important functional developments, such as the origin of hair in mammals. CONCLUSIONS Our results demonstrate that domain rearrangements are based on a canonical set of mutational events with rates which lie within a relatively narrow and consistent range. In addition, the knowledge gained about these rates provides a basis for advanced domain-based methodologies for phylogenetics and homology analysis which complement current sequence-based methods.
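The idea of explaining a new arrangement by a single-step event from ancestral arrangements can be sketched as follows. This is a toy illustration, not the authors' pipeline: arrangements are modelled as tuples of domain names, the function `explain_arrangement` and the event labels are hypothetical, and no tree reconstruction is performed.

```python
from typing import Iterable, Optional, Tuple

Arrangement = Tuple[str, ...]


def explain_arrangement(child: Arrangement,
                        ancestors: Iterable[Arrangement]) -> Optional[str]:
    """Return a single-step event that could produce `child`, or None."""
    ancestor_set = set(ancestors)
    if not child:
        return None
    if child in ancestor_set:
        return "unchanged"
    # Fusion: the child is the concatenation of two ancestral arrangements.
    for i in range(1, len(child)):
        if child[:i] in ancestor_set and child[i:] in ancestor_set:
            return "fusion"
    # Fission or terminal loss: the child is a proper prefix or suffix of
    # an ancestral arrangement. Telling the two apart would require knowing
    # whether the other fragment also survives, which is not modelled here.
    for anc in ancestor_set:
        if len(anc) > len(child) and (
            anc[:len(child)] == child or anc[-len(child):] == child
        ):
            return "fission_or_terminal_loss"
    return None


print(explain_arrangement(("A", "B"), [("A",), ("B",)]))
print(explain_arrangement(("A",), [("A", "B")]))
```

Arrangements for which the function returns `None` correspond to cases that cannot be traced back to a precursor by one step under this simplified model.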
Affiliation(s)
- Elias Dohmen
- Institute for Evolution and Biodiversity, University of Münster, Hüfferstrasse 1, Münster, 48149, Germany.,Institute for Bioinformatics and Chemoinformatics, Westphalian University of Applied Sciences, August-Schmidt-Ring 10, Recklinghausen, 45665, Germany
| | - Steffen Klasberg
- Institute for Evolution and Biodiversity, University of Münster, Hüfferstrasse 1, Münster, 48149, Germany
| | - Erich Bornberg-Bauer
- Institute for Evolution and Biodiversity, University of Münster, Hüfferstrasse 1, Münster, 48149, Germany
| | - Sören Perrey
- Institute for Bioinformatics and Chemoinformatics, Westphalian University of Applied Sciences, August-Schmidt-Ring 10, Recklinghausen, 45665, Germany
| | - Carsten Kemena
- Institute for Evolution and Biodiversity, University of Münster, Hüfferstrasse 1, Münster, 48149, Germany.
| |
|
17
|
Bauer TL, Buchholz PCF, Pleiss J. The modular structure of α/β-hydrolases. FEBS J 2019; 287:1035-1053. [PMID: 31545554 DOI: 10.1111/febs.15071] [Citation(s) in RCA: 60] [Impact Index Per Article: 12.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/25/2019] [Revised: 08/15/2019] [Accepted: 09/19/2019] [Indexed: 12/22/2022]
Abstract
The α/β-hydrolase fold family is highly diverse in sequence, structure and biochemical function. To investigate the sequence-structure-function relationships, the Lipase Engineering Database (https://led.biocatnet.de) was updated. Overall, 280 638 protein sequences and 1557 protein structures were analysed. All α/β-hydrolases consist of the catalytically active core domain, but they might also contain additional structural modules, resulting in 12 different architectures: core domain only, additional lids at three different positions, three different caps, additional N- or C-terminal domains and combinations of N- and C-terminal domains with caps and lids respectively. In addition, the α/β-hydrolases were distinguished by their oxyanion hole signature (GX-, GGGX- and Y-types). The N-terminal domains show two different folds, the Rossmann fold or the β-propeller fold. The C-terminal domains show a β-sandwich fold. The N-terminal β-propeller domain and the C-terminal β-sandwich domain are structurally similar to carbohydrate-binding proteins such as lectins. The classification was applied to the newly discovered polyethylene terephthalate (PET)-degrading PETases and MHETases, which are core domain α/β-hydrolases of the GX- and the GGGX-type respectively. To investigate evolutionary relationships, sequence networks were analysed. The degree distribution followed a power law with a scaling exponent γ = 1.4, indicating a highly inhomogeneous network which consists of a few hubs and a large number of less connected sequences. The hub sequences have many functional neighbours and therefore are expected to be robust toward possible deleterious effects of mutations. The cluster size distribution followed a power law with an extrapolated scaling exponent τ = 2.6, which strongly supports the connectedness of the sequence space of α/β-hydrolases. 
DATABASE: Supporting data about domains from other proteins with structural similarity to the N- or C-terminal domains of α/β-hydrolases are available in Data Repository of the University of Stuttgart (DaRUS) under doi: https://doi.org/10.18419/darus-458.
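For readers unfamiliar with power-law scaling exponents, the sketch below shows one common way to estimate an exponent from data: the continuous maximum-likelihood estimator, gamma = 1 + n / sum(ln(x_i / x_min)). The abstract does not say how the exponents were fitted, so this is an assumed, illustrative method; the function name, the synthetic samples, and the choice of `x_min` are all hypothetical, and applying the continuous formula to integer node degrees is only an approximation.

```python
import math
import random


def powerlaw_exponent_mle(values, x_min):
    """Continuous MLE of gamma for p(x) ~ x**(-gamma), using x >= x_min."""
    tail = [v for v in values if v >= x_min]
    log_sum = sum(math.log(v / x_min) for v in tail)
    if log_sum == 0:
        raise ValueError("need at least one value strictly above x_min")
    return 1.0 + len(tail) / log_sum


# Synthetic data drawn by inverse-transform sampling from a power law
# with exponent 1.4 (the value quoted in the abstract) and x_min = 1.
rng = random.Random(0)
gamma_true = 1.4
samples = [
    (1.0 - rng.random()) ** (-1.0 / (gamma_true - 1.0)) for _ in range(5000)
]
gamma_hat = powerlaw_exponent_mle(samples, x_min=1.0)
print(round(gamma_hat, 2))
```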
Affiliation(s)
- Tabea L Bauer
- Institute of Biochemistry and Technical Biochemistry, University of Stuttgart, Germany
| | - Patrick C F Buchholz
- Institute of Biochemistry and Technical Biochemistry, University of Stuttgart, Germany
| | - Jürgen Pleiss
- Institute of Biochemistry and Technical Biochemistry, University of Stuttgart, Germany
| |
|
18
|
Kleppe AS, Bornberg-Bauer E. Robustness by intrinsically disordered C-termini and translational readthrough. Nucleic Acids Res 2019; 46:10184-10194. [PMID: 30247639 PMCID: PMC6365619 DOI: 10.1093/nar/gky778] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/18/2018] [Accepted: 09/20/2018] [Indexed: 12/20/2022] Open
Abstract
During protein synthesis genetic instructions are passed from DNA via mRNA to the ribosome to assemble a protein chain. Occasionally, stop codons in the mRNA are bypassed and translation continues into the untranslated region (3′-UTR). This process, called translational readthrough (TR), yields a protein chain that becomes longer than would be predicted from the DNA sequence alone. Protein sequences vary in propensity for translational errors, which may yield evolutionary constraints by limiting evolutionary paths. Here we investigated TR in Saccharomyces cerevisiae by analysing ribosome profiling data. We clustered proteins as either prone or non-prone to TR, and conducted comparative analyses. We find that a relatively high frequency (5%) of genes undergo TR, including ribosomal subunit proteins. Our main finding is that proteins undergoing TR are highly expressed and have a higher proportion of intrinsically disordered C-termini. We suggest that highly expressed proteins may compensate for the deleterious effects of TR by having intrinsically disordered C-termini, which may provide conformational flexibility but without distorting native function. Moreover, we discuss whether minimizing deleterious effects of TR is also enabling exploration of the phenotypic landscape of protein isoforms.
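The lengthening of a protein by translational readthrough can be illustrated with a minimal sketch: after the annotated stop codon is bypassed, translation continues in frame until the next stop codon in the 3′-UTR. The function name and the toy mRNA are hypothetical, and only the standard stop codons are assumed; this is not the ribosome-profiling analysis used in the study.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def readthrough_codons(mrna: str, stop_index: int) -> list:
    """Return the codons read after bypassing the stop codon at stop_index.

    Reading continues in frame until the next stop codon or the end of
    the transcript, whichever comes first.
    """
    if mrna[stop_index:stop_index + 3] not in STOP_CODONS:
        raise ValueError("stop_index does not point at a stop codon")
    extension = []
    for i in range(stop_index + 3, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break
        extension.append(codon)
    return extension


# Toy transcript: AUG GCU UAA | GGC AAA UGA UUU
extension = readthrough_codons("AUGGCUUAAGGCAAAUGAUUU", stop_index=6)
print(extension)
```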
Affiliation(s)
- April Snofrid Kleppe
- Institute of Biodiversity and Evolution, University of Münster, Hüfferstr. 1, 48151 Münster, Germany
| | - Erich Bornberg-Bauer
- Institute of Biodiversity and Evolution, University of Münster, Hüfferstr. 1, 48151 Münster, Germany
| |
|
19
|
Baral K, Rotwein P. The insulin-like growth factor 2 gene in mammals: Organizational complexity within a conserved locus. PLoS One 2019; 14:e0219155. [PMID: 31251794 PMCID: PMC6599137 DOI: 10.1371/journal.pone.0219155] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/25/2019] [Accepted: 06/17/2019] [Indexed: 01/10/2023] Open
Abstract
The secreted protein, insulin-like growth factor 2 (IGF2), plays a central role in fetal and prenatal growth and development, and is regulated at the genetic level by parental imprinting, being expressed predominantly from the paternally derived chromosome in mice and humans. Here, IGF2/Igf2 and its locus have been examined in 19 mammals from 13 orders spanning ~166 million years of evolutionary development. By using human or mouse DNA segments as queries in genome analyses, and by assessing gene expression using RNA-sequencing libraries, more complexity was identified within IGF2/Igf2 than was annotated previously. Multiple potential 5’ non-coding exons were mapped in most mammals and are presumably linked to distinct IGF2/Igf2 promoters, as shown for several species by interrogating RNA-sequencing libraries. DNA similarity was highest in IGF2/Igf2 coding exons; yet, even though the mature IGF2 protein was conserved, versions of 67 or 70 residues are produced secondary to species-specific maintenance of alternative RNA splicing at a variable intron-exon junction. Adjacent H19 was more divergent than IGF2/Igf2, as expected in a gene for a noncoding RNA, and was identified in only 10/19 species. These results show that common features, including those defining IGF2/Igf2 coding and several non-coding exons, were likely present at the onset of the mammalian radiation, but that others, such as a putative imprinting control region 5’ to H19 and potential enhancer elements 3’ to H19, diversified with speciation. This study also demonstrates that careful analysis of genomic and gene expression repositories can provide new insights into gene structure and regulation.
Affiliation(s)
- Kabita Baral
- Graduate School, College of Science, University of Texas at El Paso, El Paso, Texas
| | - Peter Rotwein
- Department of Molecular and Translational Medicine, Paul L. Foster School of Medicine, Texas Tech Health University Health Sciences Center, El Paso, Texas
| |
|
20
|
Rodrigues JV, Ogbunugafor CB, Hartl DL, Shakhnovich EI. Chimeric dihydrofolate reductases display properties of modularity and biophysical diversity. Protein Sci 2019; 28:1359-1367. [PMID: 31095809 DOI: 10.1002/pro.3646] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/15/2019] [Accepted: 05/13/2019] [Indexed: 01/12/2023]
Abstract
While reverse genetics and functional genomics have long affirmed the role of individual mutations in determining protein function, there have been fewer studies addressing how large-scale changes in protein sequences, such as in entire modular segments, influence protein function and evolution. Given how recombination can reassort protein sequences, these types of changes may play an underappreciated role in how novel protein functions evolve in nature. Such studies could aid our understanding of whether certain organismal phenotypes related to protein function-such as growth in the presence or absence of an antibiotic-are robust with respect to the identity of certain modular segments. In this study, we combine molecular genetics with biochemical and biophysical methods to gain a better understanding of protein modularity in dihydrofolate reductase (DHFR), an enzyme target of antibiotics also widely used as a model for protein evolution. We replace an integral α-helical segment of Escherichia coli DHFR with segments from a number of different organisms (many nonmicrobial) and examine how these chimeric enzymes affect organismal phenotypes (e.g., resistance to an antibiotic) as well as biophysical properties of the enzyme (e.g., thermostability). We find that organismal phenotypes and enzyme properties are highly sensitive to the identity of DHFR modules, and that this chimeric approach can create enzymes with diverse biophysical characteristics.
Affiliation(s)
- João V Rodrigues
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts
| | - C Brandon Ogbunugafor
- Department of Ecology and Evolutionary Biology, Brown University, Providence, Rhode Island
| | - Daniel L Hartl
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts
| | - Eugene I Shakhnovich
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts
| |
|
21
|
Banerjee A, Levy Y, Mitra P. Analyzing Change in Protein Stability Associated with Single Point Deletions in a Newly Defined Protein Structure Database. J Proteome Res 2019; 18:1402-1410. [PMID: 30735617 DOI: 10.1021/acs.jproteome.9b00048] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Abstract
Protein backbone alteration due to an insertion/deletion or mutation operation often results in a change of fundamental biophysical properties of proteins. The proposed work intends to encode the protein stability changes associated with single point deletions (SPDs) of amino acids in proteins. The encoding will help in the primary screening of detrimental backbone modifications before opting for expensive in vitro experimentation. In the absence of any benchmark database documenting SPDs, we curate a data set containing SPDs that lead to both folded conformations and unfolded states. We differentiate these SPD instances with the help of simple structural and physicochemical features and eventually classify the foldability resulting from SPDs using a Random Forest classifier and an Elliptic Envelope based outlier detector. Adhering to leave-one-out cross-validation, the accuracy of the Random Forest classifier and the Elliptic Envelope is 99.4% and 98.1%, respectively. The newly defined database and the delineation of SPD instances based on their resulting foldability provide a head start toward finding a solution to the given problem.
Affiliation(s)
| | - Yaakov Levy
- Department of Structural Biology , Weizmann Institute of Science , Rehovot 76100 , Israel
| | | |
|
22
|
Abstract
This chapter reviews current research on how protein domain architectures evolve. We begin by summarizing work on the phylogenetic distribution of proteins, as this will directly impact which domain architectures can be formed in different species. Studies relating domain family size to occurrence have shown that they generally follow power law distributions, both within genomes and larger evolutionary groups. These findings were subsequently extended to multi-domain architectures. Genome evolution models that have been suggested to explain the shape of these distributions are reviewed, as well as evidence for selective pressure to expand certain domain families more than others. Each domain has an intrinsic combinatorial propensity, and the effects of this have been studied using measures of domain versatility or promiscuity. Next, we study the principles of protein domain architecture evolution and how these have been inferred from distributions of extant domain arrangements. Following this, we review inferences of ancestral domain architecture and the conclusions concerning domain architecture evolution mechanisms that can be drawn from these. Finally, we examine whether all known cases of a given domain architecture can be assumed to have a single common origin (monophyly) or have evolved convergently (polyphyly). We end by a discussion of some available tools for computational analysis or exploitation of protein domain architectures and their evolution.
|
23
|
Kovacs NA, Penev PI, Venapally A, Petrov AS, Williams LD. Circular Permutation Obscures Universality of a Ribosomal Protein. J Mol Evol 2018; 86:581-592. [PMID: 30306205 DOI: 10.1007/s00239-018-9869-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/23/2018] [Accepted: 09/28/2018] [Indexed: 12/29/2022]
Abstract
Functions, origins, and evolution of the translation system are best understood in the context of unambiguous and phylogenetically based taxonomy and nomenclature. Here, we map ribosomal proteins onto the tree of life and provide a nomenclature for ribosomal proteins that is consistent with phylogenetic relationships. We have increased the accuracy of homology relationships among ribosomal proteins, providing a more informative picture of their lineages. We demonstrate that bL33 (bacteria) and eL42 (archaea/eukarya) are homologs with common ancestry and acute similarities in sequence and structure. Their similarities were previously obscured by circular permutation. The most likely mechanism of permutation between bL33 and eL42 is duplication followed by fusion and deletion of both the first and last β-hairpins. bL33 and eL42 are composed of zinc ribbon protein folds, one of the most common zinc finger fold groups and the one most frequently observed in translation-related domains. Bacterial-specific ribosomal protein bL33 and archaeal/eukaryotic-specific ribosomal protein eL42 are now both assigned the name of uL33, indicating a universal ribosomal protein. We provide a phylogenetic naming scheme for all ribosomal proteins that is based on phylogenetic relationships to be used as a tool for studying the systemics, evolution, and origins of the ribosome.
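The general duplication-and-deletion route to circular permutation can be sketched in a few lines. The example below is a simplified model, not the authors' analysis: structural elements are represented by hypothetical labels, the tandem duplicate is trimmed at both ends back to the original length, and the exact elements lost in bL33/eL42 are not reproduced.

```python
def permute_by_duplication(elements, shift):
    """Duplicate and fuse `elements`, then trim both ends to one copy.

    Removing `shift` elements from the start and the remainder from the
    end of the tandem duplicate yields a circular permutation.
    """
    n = len(elements)
    if not 0 <= shift <= n:
        raise ValueError("shift must be between 0 and len(elements)")
    doubled = list(elements) + list(elements)
    return doubled[shift:shift + n]


def is_circular_permutation(a, b):
    """True if b can be obtained by rotating a."""
    a, b = list(a), list(b)
    if len(a) != len(b):
        return False
    if not a:
        return True
    return any(a[i:] + a[:i] == b for i in range(len(a)))


# Hypothetical element labels for illustration only.
original = ["h1", "h2", "h3", "h4"]
permuted = permute_by_duplication(original, shift=2)
print(permuted, is_circular_permutation(original, permuted))
```

Because the element order differs while the content is the same, a linear sequence alignment of `original` and `permuted` would miss much of the similarity, which is how a permutation can obscure homology.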
Affiliation(s)
- Nicholas A Kovacs
- School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA, 30332-0400, USA
| | - Petar I Penev
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, 30332-0400, USA
| | - Amitej Venapally
- School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA, 30332-0400, USA
| | - Anton S Petrov
- School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA, 30332-0400, USA.
| | - Loren Dean Williams
- School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA, 30332-0400, USA.
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, 30332-0400, USA.
| |
|
24
|
Klasberg S, Bitard-Feildel T, Callebaut I, Bornberg-Bauer E. Origins and structural properties of novel and de novo protein domains during insect evolution. FEBS J 2018; 285:2605-2625. [PMID: 29802682 DOI: 10.1111/febs.14504] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/25/2017] [Revised: 04/12/2018] [Accepted: 05/11/2018] [Indexed: 12/11/2022]
Abstract
Over long time scales, protein evolution is characterized by modular rearrangements of protein domains. Such rearrangements are mainly caused by gene duplication, fusion and terminal losses. To better understand domain emergence mechanisms, we investigated 32 insect genomes covering a speciation gradient ranging from ~2 to ~390 mya. We use established domain models and foldable domains delineated by hydrophobic cluster analysis (HCA), which does not require homologous sequences, to also identify domains which have likely arisen de novo, that is, from previously noncoding DNA. Our results indicate that most novel domains emerge terminally as they originate from ORF extensions while fewer arise in middle arrangements, resulting from exonization of intronic or intergenic regions. Many novel domains rapidly migrate between terminal or middle positions and single- and multidomain arrangements. Young domains, such as most HCA-defined domains, are under strong selection pressure as they show signals of purifying selection. De novo domains, linked to ancient domains or defined by HCA, have higher degrees of intrinsic disorder and disorder-to-order transition upon binding than ancient domains. However, the corresponding DNA sequences of the novel domains of de novo origins could only rarely be found in sister genomes. We conclude that novel domains are often recruited by other proteins and undergo important structural modifications shortly after their emergence, but evolve too fast to be characterized by cross-species comparisons alone.
Affiliation(s)
- Steffen Klasberg
- Institute for Evolution and Biodiversity, Westfalian Wilhelms University Muenster, Germany
| | - Tristan Bitard-Feildel
- Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), Paris, France
| | - Isabelle Callebaut
- Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, IRD, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France
| | - Erich Bornberg-Bauer
- Institute for Evolution and Biodiversity, Westfalian Wilhelms University Muenster, Germany
| |
|
25
|
Van Holle S, De Schutter K, Eggermont L, Tsaneva M, Dang L, Van Damme EJM. Comparative Study of Lectin Domains in Model Species: New Insights into Evolutionary Dynamics. Int J Mol Sci 2017; 18:ijms18061136. [PMID: 28587095 PMCID: PMC5485960 DOI: 10.3390/ijms18061136] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/06/2017] [Revised: 05/20/2017] [Accepted: 05/22/2017] [Indexed: 01/07/2023] Open
Abstract
Lectins are present throughout the plant kingdom and are reported to be involved in diverse biological processes. In this study, we provide a comparative analysis of the lectin families from model species in a phylogenetic framework. The analysis focuses on the different plant lectin domains identified in five representative core angiosperm genomes (Arabidopsis thaliana, Glycine max, Cucumis sativus, Oryza sativa ssp. japonica and Oryza sativa ssp. indica). The genomes were screened for genes encoding lectin domains using a combination of Basic Local Alignment Search Tool (BLAST), hidden Markov models, and InterProScan analysis. Additionally, phylogenetic relationships were investigated by constructing maximum likelihood phylogenetic trees. The results demonstrate that the majority of the lectin families are present in each of the species under study. Domain organization analysis showed that most identified proteins are multi-domain proteins, owing to the modular rearrangement of protein domains during evolution. Most of these multi-domain proteins are widespread, while others display a lineage-specific distribution. Furthermore, the phylogenetic analyses reveal that some lectin families evolved to be similar to the phylogeny of the plant species, while others share a closer evolutionary history based on the corresponding protein domain architecture. Our results yield insights into the evolutionary relationships and functional divergence of plant lectins.
Affiliation(s)
- Sofie Van Holle
- Department of Molecular Biotechnology, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, 9000 Ghent, Belgium.
| | - Kristof De Schutter
- Department of Molecular Biotechnology, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, 9000 Ghent, Belgium.
- Department of Crop Protection, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, 9000 Ghent, Belgium.
| | - Lore Eggermont
- Department of Molecular Biotechnology, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, 9000 Ghent, Belgium.
| | - Mariya Tsaneva
- Department of Molecular Biotechnology, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, 9000 Ghent, Belgium.
| | - Liuyi Dang
- Department of Molecular Biotechnology, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, 9000 Ghent, Belgium.
| | - Els J M Van Damme
- Department of Molecular Biotechnology, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, 9000 Ghent, Belgium.
| |
|
26
|
Van Holle S, Rougé P, Van Damme EJM. Evolution and structural diversification of Nictaba-like lectin genes in food crops with a focus on soybean (Glycine max). ANNALS OF BOTANY 2017; 119:901-914. [PMID: 28087663 PMCID: PMC5379587 DOI: 10.1093/aob/mcw259] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/02/2016] [Revised: 10/24/2016] [Accepted: 11/17/2016] [Indexed: 05/10/2023]
Abstract
Background and Aims: The Nictaba family groups all proteins that show homology to Nictaba, the tobacco lectin. So far, Nictaba and an Arabidopsis thaliana homologue have been shown to be implicated in the plant stress response. The availability of more than 50 sequenced plant genomes provided the opportunity for a genome-wide identification of Nictaba-like genes in 15 species, representing members of the Fabaceae, Poaceae, Solanaceae, Musaceae, Arecaceae, Malvaceae and Rubiaceae. Additionally, phylogenetic relationships between the different species were explored. Furthermore, this study included domain organization analysis, searching for orthologous genes in the legume family and transcript profiling of the Nictaba-like lectin genes in soybean. Methods: Using a combination of BLASTp, InterPro analysis and hidden Markov models, the genomes of Medicago truncatula, Cicer arietinum, Lotus japonicus, Glycine max, Cajanus cajan, Phaseolus vulgaris, Theobroma cacao, Solanum lycopersicum, Solanum tuberosum, Coffea canephora, Oryza sativa, Zea mays, Sorghum bicolor, Musa acuminata and Elaeis guineensis were searched for Nictaba-like genes. Phylogenetic analysis was performed using RAxML and additional protein domains in the Nictaba-like sequences were identified using InterPro. Expression analysis of the soybean Nictaba-like genes was investigated using microarray data. Key Results: Nictaba-like genes were identified in all studied species and analysis of the duplication events demonstrated that both tandem and segmental duplication contributed to the expansion of the Nictaba gene family in angiosperms. The single-domain Nictaba protein and the multi-domain F-box Nictaba architectures are ubiquitous among all analysed species and microarray analysis revealed differential expression patterns for all soybean Nictaba-like genes.
Conclusions: Taken together, the comparative genomics data contribute to our understanding of the Nictaba-like gene family in species for which the occurrence of Nictaba domains had not yet been investigated. Given the ubiquitous nature of these genes, they have probably acquired new functions over time and are expected to take on various roles in plant development and defence.
Affiliation(s)
- Sofie Van Holle
- Laboratory of Biochemistry and Glycobiology, Department of Molecular Biotechnology, Ghent University, Coupure Links 653, 9000 Ghent, Belgium
| | - Pierre Rougé
- UMR 152 PHARMA-DEV, Université de Toulouse, IRD, UPS, Chemin des Maraîchers 35, 31400 Toulouse, France
| | - Els J. M. Van Damme
- Laboratory of Biochemistry and Glycobiology, Department of Molecular Biotechnology, Ghent University, Coupure Links 653, 9000 Ghent, Belgium
| |
|
27
|
Imran M, Tang K, Liu JY. Comparative Genome-Wide Analysis of the Malate Dehydrogenase Gene Families in Cotton. PLoS One 2016; 11:e0166341. [PMID: 27829020 PMCID: PMC5102359 DOI: 10.1371/journal.pone.0166341] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/20/2016] [Accepted: 10/27/2016] [Indexed: 11/19/2022] Open
Abstract
Malate dehydrogenases (MDHs) play crucial roles in the physiological processes of plant growth and development. In this study, 13 and 25 MDH genes were identified from Gossypium raimondii and Gossypium hirsutum, respectively. Using these and 13 previously reported Gossypium arboreum MDH genes, a comparative molecular analysis between identified MDH genes from G. raimondii, G. hirsutum, and G. arboreum was performed. Based on multiple sequence alignments, cotton MDHs were divided into five subgroups: mitochondrial MDH, peroxisomal MDH, plastidial MDH, chloroplastic MDH and cytoplasmic MDH. Almost all of the MDHs within the same subgroup shared similar gene structure, amino acid sequence, and conserved motifs in their functional domains. An analysis of chromosomal localization suggested that segmental duplication played a major role in the expansion of cotton MDH gene families. Additionally, a selective pressure analysis indicated that purifying selection acted as a vital force in the evolution of MDH gene families in cotton. Meanwhile, an expression analysis showed the distinct expression profiles of GhMDHs in different vegetative tissues and at different fiber developmental stages, suggesting the functional diversification of these genes in cotton growth and fiber development. Finally, a promoter analysis indicated redundant but typical cis-regulatory elements for the potential functions and stress activity of many MDH genes. This study provides fundamental information for a better understanding of cotton MDH gene families and aids in functional analyses of the MDH genes in cotton fiber development.
Affiliation(s)
- Muhammad Imran
- Laboratory of Plant Molecular Biology, Center for Plant Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
| | - Kai Tang
- Laboratory of Plant Molecular Biology, Center for Plant Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
| | - Jin-Yuan Liu
- Laboratory of Plant Molecular Biology, Center for Plant Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
| |
|
28
|
Zhong Y, Cheng ZMM. A unique RPW8-encoding class of genes that originated in early land plants and evolved through domain fission, fusion, and duplication. Sci Rep 2016; 6:32923. [PMID: 27678195 PMCID: PMC5039405 DOI: 10.1038/srep32923] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/18/2015] [Accepted: 08/16/2016] [Indexed: 01/17/2023] Open
Abstract
Duplication, lateral gene transfer, domain fusion/fission and de novo domain creation play key roles in the formation of the initial common ancestral protein. Abundant protein diversity is produced by domain rearrangements, including fusions, fissions, duplications, and terminal domain losses. In this report, we explored the origin of the RPW8 domain and examined the domain rearrangements that have driven the evolution of RPW8-encoding genes in land plants. The RPW8 domain first emerged in the early land plant, Physcomitrella patens, and it likely originated de novo from a non-coding sequence or by domain divergence after duplication. It was then incorporated into the NBS-LRR protein to create a main sub-class of RPW8-encoding genes, the RPW8-NBS-encoding genes. They evolved by a series of domain fissions, fusions, and duplications. Many species-specific duplication events and tandemly duplicated clusters clearly demonstrated that species-specific and tandem duplications played important roles in the expansion of RPW8-encoding genes, especially in gymnosperms and species of the Rosaceae. The RPW8 domains had greater Ka/Ks values than the NBS domains, indicating that they evolved faster than the NBS domains in RPW8-NBSs.
Affiliation(s)
- Yan Zhong
- College of Horticulture, Nanjing Agricultural University, Nanjing, 210095, China
| | - Zong-Ming Max Cheng
- College of Horticulture, Nanjing Agricultural University, Nanjing, 210095, China.
- Department of Plant Science, University of Tennessee, Knoxville, 37996, USA
| |
|
29
|
Abstract
Repeats are ubiquitous elements of proteins and they play important roles in cellular function and during evolution. Repeats are, however, also notoriously difficult to capture computationally, and large-scale studies have so far had difficulties in linking genetic causes, structural properties and evolutionary trajectories of protein repeats. Here we apply recently developed methods for repeat detection and analysis to a large dataset comprising over a hundred metazoan genomes. We find that repeats in larger protein families generally experience very few insertions or deletions (indels) of repeat units, but there is also a significant fraction of noteworthy volatile outliers with very high indel rates. Analysis of structural data indicates that repeats with an open structure and independently folding units are more volatile and more likely to be intrinsically disordered. Such disordered repeats are also significantly enriched in sites with a high functional potential such as linear motifs. Furthermore, the most volatile repeats have a high sequence similarity between their units. Since many volatile repeats also show signs of recombination, we conclude they are often shaped by concerted evolution. Intriguingly, many of these conserved yet volatile repeats are involved in host-pathogen interactions where they might foster fast but subtle adaptation in biological arms races. KEY WORDS: protein evolution, domain rearrangements, protein repeats, concerted evolution.
Affiliation(s)
- Andreas Schüler
- Institute for Evolution and Biodiversity, Westfalian Wilhelms University, Huefferstrasse 1, Muenster, Germany
| | - Erich Bornberg-Bauer
- Institute for Evolution and Biodiversity, Westfalian Wilhelms University, Huefferstrasse 1, Muenster, Germany
| |
|
30
|
Abstract
Proteins are the workhorses of the cell and, over billions of years, they have evolved an amazing plethora of extremely diverse and versatile structures with equally diverse functions. Evolutionary emergence of new proteins and transitions between existing ones are believed to be rare or even impossible. However, recent advances in comparative genomics have repeatedly identified some 10%-30% of all genes as lacking any detectable similarity to existing proteins. Even after careful scrutiny, some of those orphan genes contain protein coding reading frames with detectable transcription and translation. Thus some proteins seem to have emerged from previously non-coding 'dark genomic matter'. These 'de novo' proteins tend to be disordered, fast evolving and weakly expressed, but also rapidly assume novel and physiologically important functions. Here we review mechanisms by which 'de novo' proteins might be created, under which circumstances they may become fixed and why they are elusive. We propose a 'grow slow and moult' model in which first a reading frame is extended, coding for an initially disordered and non-globular appendage which, over time, becomes more structured and may also become associated with other proteins.
|
31
|
Abstract
Translational readthrough (TR) has come into renewed focus because systems biology approaches have identified the first human genes undergoing functional translational readthrough (FTR). FTR creates functional extensions to proteins by continuing translation of the mRNA downstream of the stop codon. Here we review recent developments in TR research with a focus on the identification of FTR in humans and the systems biology methods that have spurred these discoveries.
Affiliation(s)
- Fabian Schueren
- University Medical Center, Department of Child and Adolescent Health, University of Göttingen, Göttingen, Germany
| | - Sven Thoms
- University Medical Center, Department of Child and Adolescent Health, University of Göttingen, Göttingen, Germany
| |
|
32
|
Lees JG, Dawson NL, Sillitoe I, Orengo CA. Functional innovation from changes in protein domains and their combinations. Curr Opin Struct Biol 2016; 38:44-52. [DOI: 10.1016/j.sbi.2016.05.016] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/22/2016] [Revised: 05/17/2016] [Accepted: 05/24/2016] [Indexed: 10/21/2022]
|
33
|
Cao J, Chen Y, Jin M, Ren Q. Enhanced antimicrobial peptide-induced activity in the mollusc Toll-2 family through evolution via tandem Toll/interleukin-1 receptor. ROYAL SOCIETY OPEN SCIENCE 2016; 3:160123. [PMID: 27429771 PMCID: PMC4929906 DOI: 10.1098/rsos.160123] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 02/22/2016] [Accepted: 05/12/2016] [Indexed: 06/06/2023]
Abstract
Toll receptors play an important role in the innate immunity of invertebrates. All reported Tolls have only one Toll/interleukin-1 receptor (TIR) domain at the C-terminus. In this study, numerous Tolls with tandem TIRs at the C-terminus were found in molluscs. Such Tolls presented an extra TIR (TIR-1) compared with Toll-I. Thus, Toll-I might be the ancestor of Tolls containing tandem TIRs. To test this hypothesis, 83 Toll-I and Toll-2 (most have two TIRs, but others seem to be evolutionary intermediates) genes from 29 shellfish species were identified. These Tolls were divided into nine groups based on phylogenetic analyses. A strong correlation between phylogeny and motif composition was found. All Toll proteins contained the TIR-2 domain, whereas the TIR-1 domain only existed in some Toll-2 proteins, suggesting that TIR-1 domain insertion may play an important role in Toll protein evolution. Further analyses of functional divergence and adaptive evolution showed that some of the critical sites responsible for functional divergence may have been under positive selection. In addition, intragenic recombination played an important role in the evolution of the Toll-I and Toll-2 genes. To investigate the functional difference between Toll-I and Toll-2, overexpression of Hcu_Toll-I or Hcu_Toll-2-2 in Drosophila S2 cells was performed. Results showed that Hcu_Toll-2-2 had stronger antimicrobial peptide (AMP) activity than Hcu_Toll-I. Therefore, enhanced AMP-induced activity resulted from tandem TIRs in the Toll-2s of molluscs during evolutionary history.
Affiliation(s)
- Jun Cao
- Institute of Life Sciences, Jiangsu University, Zhenjiang, Jiangsu, People's Republic of China
| | - Yihong Chen
- MOE Key Laboratory of Aquatic Product Safety/State Key Laboratory of Biocontrol, School of Marine Sciences, Sun Yat-sen University, Guangzhou, People's Republic of China
| | - Min Jin
- State Key Laboratory Breeding Base of Marine Genetic Resource, Third Institute of Oceanography, SOA, Xiamen 361005, People's Republic of China
| | - Qian Ren
- Jiangsu Key Laboratory for Biodiversity and Biotechnology and Jiangsu Key Laboratory for Aquatic Crustacean Diseases, College of Life Sciences, Nanjing Normal University, Nanjing 210046, People's Republic of China
| |
|
34
|
Tekaia F. Inferring Orthologs: Open Questions and Perspectives. GENOMICS INSIGHTS 2016; 9:17-28. [PMID: 26966373 PMCID: PMC4778853 DOI: 10.4137/gei.s37925] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 11/18/2015] [Revised: 12/30/2015] [Accepted: 01/02/2016] [Indexed: 01/25/2023]
Abstract
With the increasing number of sequenced genomes and their comparisons, the detection of orthologs is crucial for reliable functional annotation and evolutionary analyses of genes and species. Yet, the dynamic remodeling of genome content through gain, loss, transfer of genes, and segmental and whole-genome duplication hinders reliable orthology detection. Moreover, the lack of direct functional evidence and the questionable quality of some available genome sequences and annotations present additional difficulties in assessing orthology. This article reviews the existing computational methods and their potential accuracy in the high-throughput era of genome sequencing and anticipates open questions in terms of methodology, reliability, and computation. Appropriate taxon sampling, together with a combination of methods based on similarity, phylogeny, synteny, and evolutionary knowledge that may help detect speciation events, appears to be the most accurate strategy. This review also raises perspectives on the potential determination of orthology throughout the whole species phylogeny.
Affiliation(s)
- Fredj Tekaia
- Institut Pasteur, Unit of Structural Microbiology, CNRS URA 3528 and University Paris Diderot, Sorbonne Paris Cité, Paris, France
| |
|
35
|
Hsu CH, Chiang AWT, Hwang MJ, Liao BY. Proteins with Highly Evolvable Domain Architectures Are Nonessential but Highly Retained. Mol Biol Evol 2016; 33:1219-30. [DOI: 10.1093/molbev/msw006] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022] Open
|
36
|
Stolzer M, Siewert K, Lai H, Xu M, Durand D. Event inference in multidomain families with phylogenetic reconciliation. BMC Bioinformatics 2015; 16 Suppl 14:S8. [PMID: 26451642 PMCID: PMC4610023 DOI: 10.1186/1471-2105-16-s14-s8] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Reconstructing evolution provides valuable insights into the processes of gene evolution and function. However, while there have been great advances in algorithms and software to reconstruct the history of gene families, these tools do not model the domain shuffling events (domain duplication, insertion, transfer, and deletion) that drive the evolution of multidomain protein families. Protein evolution through domain shuffling events allows for rapid exploration of functions by introducing new combinations of existing folds. This powerful mechanism was key to some significant evolutionary innovations, such as multicellularity and the vertebrate immune system. A method for reconstructing this important evolutionary process is urgently needed. RESULTS Here, we introduce a novel, event-based framework for studying multidomain evolution by reconciling a domain tree with a gene tree, with additional information provided by the species tree. In the context of this framework, we present the first reconciliation algorithms to infer domain shuffling events, while addressing the challenges inherent in the inference of evolution across three levels of organization. CONCLUSIONS We apply these methods to the evolution of domains in the Membrane associated Guanylate Kinase family. These case studies reveal a more vivid and detailed evolutionary history than previously provided. Our algorithms have been implemented in software, freely available at http://www.cs.cmu.edu/~durand/Notung.
|
37
|
Assessing the Metabolic Diversity of Streptococcus from a Protein Domain Point of View. PLoS One 2015; 10:e0137908. [PMID: 26366735 PMCID: PMC4569324 DOI: 10.1371/journal.pone.0137908] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2015] [Accepted: 08/22/2015] [Indexed: 01/17/2023] Open
Abstract
Understanding the diversity and robustness of the metabolism of bacteria is fundamental for understanding how bacteria evolve and adapt to different environments. In this study, we characterised 121 Streptococcus strains and studied metabolic diversity from a protein domain perspective. Metabolic pathways were described in terms of the promiscuity of domains participating in metabolic pathways that were inferred to be functional. Promiscuity was defined by adapting existing measures based on domain abundance and versatility. The approach proved to be successful in capturing bacterial metabolic flexibility and species diversity, indicating that it can be described in terms of the reuse and sharing of functional domains in different proteins involved in metabolic activity. Additionally, we showed striking differences between the metabolic organisation of the pathogenic serotype 2 Streptococcus suis and that of other strains.
|
38
|
Krishnamurthy P, Kim JA, Jeong MJ, Kang CH, Lee SI. Defining the RNA-binding glycine-rich (RBG) gene superfamily: new insights into nomenclature, phylogeny, and evolutionary trends obtained by genome-wide comparative analysis of Arabidopsis, Chinese cabbage, rice and maize genomes. Mol Genet Genomics 2015; 290:2279-95. [PMID: 26123085 DOI: 10.1007/s00438-015-1080-0] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/25/2015] [Accepted: 06/10/2015] [Indexed: 10/23/2022]
Abstract
RNA-binding glycine-rich (RBG) proteins play diverse roles in plant growth, development, protection and genome organization. An overly broad definition for class IV glycine-rich proteins (GRPs), namely RNA-binding activity and a glycine-rich C-terminus, has resulted in many distantly related and/or non-related proteins being grouped into this class of RBGs. This definition has hampered the study of RBG evolution. In this study, we used a comparative genomic approach consisting of ortholog, homolog, synteny and phylogenetic analyses to legitimately exclude all distantly/non-related proteins from class IV GRPs and to identify 15, 22, 12 and 18 RBG proteins in Arabidopsis, Chinese cabbage, rice and maize genomes, respectively. All identified RBGs could be divided into three subclasses, namely RBGA, RBGB and RBGD, which may be derived from a common ancestor. We assigned RBGs excluded from class IV GRPs to a separate RBG superfamily. RBGs have evolved and diversified in different species via different mechanisms; segmental duplication and recombination have had major effects, with tandem duplication, intron addition/deletion and domain recombination/deletion playing minor roles. Loss and retention of duplicated RBGs after polyploidization has been species and subclass specific. For example, following recent whole-genome duplication and triplication in maize and Chinese cabbage, respectively, most duplicated copies of RBGA have been lost in maize while RBGD duplicates have been retained; in Chinese cabbage, in contrast, RBGA duplicates have been retained while RBGD duplicates have been lost. Our findings reveal fundamental information and shed new light on the structural characteristics and evolutionary dynamics of RBGs.
Affiliation(s)
- Panneerselvam Krishnamurthy
- Department of Agricultural Biotechnology, National Academy of Agricultural Science (NAAS), Jeonju, 560-500, Korea
| | - Jin A Kim
- Department of Agricultural Biotechnology, National Academy of Agricultural Science (NAAS), Jeonju, 560-500, Korea
| | - Mi-Jeong Jeong
- Department of Agricultural Biotechnology, National Academy of Agricultural Science (NAAS), Jeonju, 560-500, Korea
| | - Chang Ho Kang
- Division of Applied Life Science and PMBBRC, Gyeongsang National University, Jinju, 660-701, Korea
| | - Soo In Lee
- Department of Agricultural Biotechnology, National Academy of Agricultural Science (NAAS), Jeonju, 560-500, Korea.
| |
|
39
|
Guo P, Yoshimura A, Ishikawa N, Yamaguchi T, Guo Y, Tsukaya H. Comparative analysis of the RTFL peptide family on the control of plant organogenesis. JOURNAL OF PLANT RESEARCH 2015; 128:497-510. [PMID: 25701405 PMCID: PMC4408365 DOI: 10.1007/s10265-015-0703-1] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/05/2014] [Accepted: 12/25/2014] [Indexed: 05/22/2023]
Abstract
Plant peptides play important roles in various aspects of plant growth and development. The RTFL/DVL family includes small peptides that are widely conserved among land plants. Overexpression of six RTFL genes in Arabidopsis was suggestive of their functions as negative regulators of cell proliferation and as positional cues along the longitudinal axis of the plant body. At this time, few reports are available on RTFL paralogs in other species and the evolutionary relationship of RTFL members among land plants remains unclear. In this study, we compared and analyzed whole amino acid sequences of 188 RTFL members from 22 species among land plants and identified 73 motifs. All RTFL members could be grouped into four clades, and each clade exhibited specific motif patterns, indicative of unique evolutionary traits in the RTFL family. In agreement with this hypothesis, we analyzed two RTFL members from Oryza sativa and Arabidopsis by overexpressing them in Arabidopsis, revealing similar phenotypes suggestive of a conserved function of the RTFL family between eudicots and monocots, as well as different phenotypes and unique functions.
Affiliation(s)
- Pin Guo
- College of Life Science, Wuhan University, Wuhan, 430072 Hubei China
- Department of Biological Sciences, Graduate School of Science, University of Tokyo, Hongo, Tokyo, 113-0033 Japan
| | - Asami Yoshimura
- Department of Biological Sciences, Graduate School of Science, University of Tokyo, Hongo, Tokyo, 113-0033 Japan
| | - Naoko Ishikawa
- Department of Biological Sciences, Graduate School of Science, University of Tokyo, Hongo, Tokyo, 113-0033 Japan
- Present Address: Graduate School of Arts and Sciences, The University of Tokyo, 3-8-1 Komaba, Tokyo, 153-8902 Japan
| | - Takahiro Yamaguchi
- Acel, Inc. SIC1 1201, 5-4-21 Nishihashimoto, Midori-ku, Sagamihara, Kanagawa Japan
| | - Youhao Guo
- College of Life Science, Wuhan University, Wuhan, 430072 Hubei China
| | - Hirokazu Tsukaya
- Department of Biological Sciences, Graduate School of Science, University of Tokyo, Hongo, Tokyo, 113-0033 Japan
| |
Collapse
|
40
|
Prakash A, Bateman A. Domain atrophy creates rare cases of functional partial protein domains. Genome Biol 2015; 16:88. [PMID: 25924720 PMCID: PMC4432964 DOI: 10.1186/s13059-015-0655-8] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 11/20/2014] [Accepted: 04/15/2015] [Indexed: 01/12/2023]
Abstract
BACKGROUND Protein domains display a range of structural diversity, with numerous additions and deletions of secondary structural elements between related domains. We have observed a small number of cases of surprising large-scale deletions of core elements of structural domains. We propose a new concept called domain atrophy, where protein domains lose a significant number of core structural elements. RESULTS Here, we implement a new pipeline to systematically identify new cases of domain atrophy across all known protein sequences. The output of this pipeline was carefully checked by hand, which filtered out partial domain instances that were unlikely to represent true domain atrophy due to misannotations or un-annotated sequence fragments. We identify 75 cases of domain atrophy, of which eight cases are found in a three-dimensional protein structure and 67 cases have been inferred based on mapping to a known homologous structure. Domains with structural variations include ancient folds such as the TIM-barrel and Rossmann folds. Most of these domains are observed to show structural loss that does not affect their functional sites. CONCLUSION Our analysis has significantly increased the known cases of domain atrophy. We discuss specific instances of domain atrophy and see that there has often been a compensatory mechanism that helps to maintain the stability of the partial domain. Our study indicates that although domain atrophy is an extremely rare phenomenon, protein domains under certain circumstances can tolerate extreme mutations giving rise to partial, but functional, domains.
Affiliation(s)
- Ananth Prakash
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK.
- Alex Bateman
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK.
41
Cao J, Li X, Lv Y, Ding L. Comparative analysis of the phytocyanin gene family in 10 plant species: a focus on Zea mays. Front Plant Sci 2015; 6:515. [PMID: 26217366 PMCID: PMC4499708 DOI: 10.3389/fpls.2015.00515] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 03/31/2015] [Accepted: 06/26/2015] [Indexed: 05/18/2023]
Abstract
Phytocyanins (PCs) are plant-specific blue copper proteins, which play essential roles in electron transport. However, the origin and expansion of this gene family have not been well investigated in plants. Here, we investigated their evolution by undertaking a genome-wide identification and comparison in 10 plants: Arabidopsis, rice, poplar, tomato, soybean, grape, maize, Selaginella moellendorffii, Physcomitrella patens, and Chlamydomonas reinhardtii. We found that this gene family expanded during evolution. Apart from PCs in Arabidopsis and rice, which have been described in previous studies, a structural analysis of PCs in the other eight plants indicated that 292 PCs contained N-terminal secretion signals and 217 PCs were expected to have glycosylphosphatidylinositol-anchor signals. Moreover, 281 PCs had putative arabinogalactan glycomodules and might be AGPs. Chromosomal distribution and duplication patterns indicated that tandem and segmental duplication played dominant roles in the expansion of PC genes. In addition, gene organization and motif compositions are highly conserved in each clade. Furthermore, expression profiles of maize PC genes revealed diversity in various stages of development. Moreover, all nine detected maize PC genes (ZmUC10, ZmUC16, ZmUC19, ZmSC2, ZmUC21, ZmENODL10, ZmUC22, ZmENODL13, and ZmENODL15) were down-regulated under salt treatment, and five PCs (ZmUC19, ZmSC2, ZmENODL10, ZmUC22, and ZmENODL13) were down-regulated under drought treatment. ZmUC16 was strongly expressed after drought treatment. This study provides a basis for the future characterization of this family.
Affiliation(s)
- Jun Cao
  - *Correspondence: Jun Cao, Institute of Life Sciences, Jiangsu University, Xuefu Road 301, Jiangsu, Zhenjiang 212013, China
42
Krishnamurthy P, Hong JK, Kim JA, Jeong MJ, Lee YH, Lee SI. Genome-wide analysis of the expansin gene superfamily reveals Brassica rapa-specific evolutionary dynamics upon whole genome triplication. Mol Genet Genomics 2014; 290:521-30. [PMID: 25325993 DOI: 10.1007/s00438-014-0935-0] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 06/02/2014] [Accepted: 09/30/2014] [Indexed: 01/27/2023]
Abstract
Chinese cabbage (Brassica rapa subsp. pekinensis) is an economically important vegetable that has encountered four rounds of polyploidization. The fourth event, whole genome triplication (WGT), occurred after its divergence from Arabidopsis. Expansins (EXPs) are cell wall loosening proteins that participate in cell wall modification processes. In this study, the impacts of WGT on the B. rapa expansin (BrEXP) superfamily were evaluated. Whole genome screening of B. rapa identified 32 loci coding 53 expansin genes. Fifteen of the loci maintained a single gene copy, 15 maintained two gene copies and 2 maintained three gene copies. Six loci had no synteny to any Arabidopsis thaliana orthologs. Two loci were involved in tandem duplication. Segmental duplication and fragment recombination were dominant in accelerating BrEXP evolution. Three genes (BrEXPA7, BrEXLA1 and BrEXLA2) lost one of their ancestral introns, two genes (BrEXPA18 and BrEXPB6) gained new introns, and a domain tandem repeat (BrEXPA18) and domain recombination (Bra016981; not considered as expansin) were observed in one gene each. Further, domain deletion was observed in an additional five genes (Bra033068, Bra000142, Bra025800, Bra016473 and Bra004891, not considered as expansins) that lost one of their expansin-specific domains evolutionarily. These findings provide a basis for the evolution and modification of the BrEXP superfamily after a WGT event, which will help in determining the functional characteristics of BrEXPs.
Affiliation(s)
- Panneerselvam Krishnamurthy
  - Department of Agricultural Biotechnology, National Academy of Agricultural Science (NAAS), Jeonju, 560-500, Korea
43
Noll A, Grundmann N, Churakov G, Brosius J, Makałowski W, Schmitz J. GPAC-genome presence/absence compiler: a web application to comparatively visualize multiple genome-level changes. Mol Biol Evol 2014; 32:275-86. [PMID: 25261406 DOI: 10.1093/molbev/msu276] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 01/26/2023]
Abstract
Our understanding of genome-wide and comparative sequence information has been broadened considerably by the databases available from the University of California Santa Cruz (UCSC) Genome Bioinformatics Department. In particular, the identification and visualization of genomic sequences, present in some species but absent in others, led to fundamental insights into gene and genome evolution. However, the UCSC tools currently enable one to visualize orthologous genomic loci for a range of species in only a single locus. For large-scale comparative analyses of such presence/absence patterns a multilocus view would be more desirable. Such a tool would enable us to compare thousands of relevant loci simultaneously and to resolve many different questions about, for example, phylogeny, specific aspects of genome and gene evolution, such as the gain or loss of exons and introns, the emergence of novel transposed elements, nonprotein-coding RNAs, and viral genomic particles. Here, we present the first tool to facilitate the parallel analysis of thousands of genomic loci for cross-species presence/absence patterns based on multiway genome alignments. This genome presence/absence compiler uses annotated or other compilations of coordinates of genomic locations and compiles all presence/absence patterns in a flexible, color-coded table linked to the individual UCSC Genome Browser alignments. We provide examples of the versatile information content of such a screening system especially for 7SL-derived transposed elements, nuclear mitochondrial DNA, DNA transposons, and miRNAs in primates (http://www.bioinformatics.uni-muenster.de/tools/gpac, last accessed October 1, 2014).
Affiliation(s)
- Angela Noll
  - Institute of Experimental Pathology, ZMBE, University of Münster, Münster, Germany
- Norbert Grundmann
  - Institute of Bioinformatics, Faculty of Medicine, University of Münster, Münster, Germany
- Gennady Churakov
  - Institute of Experimental Pathology, ZMBE, University of Münster, Münster, Germany
- Jürgen Brosius
  - Institute of Experimental Pathology, ZMBE, University of Münster, Münster, Germany
- Wojciech Makałowski
  - Institute of Bioinformatics, Faculty of Medicine, University of Münster, Münster, Germany
- Jürgen Schmitz
  - Institute of Experimental Pathology, ZMBE, University of Münster, Münster, Germany
44
Terrapon N, Weiner J, Grath S, Moore AD, Bornberg-Bauer E. Rapid similarity search of proteins using alignments of domain arrangements. Bioinformatics 2013; 30:274-81. [PMID: 23828785 DOI: 10.1093/bioinformatics/btt379] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Indexed: 01/25/2023]
Abstract
MOTIVATION Homology search methods are dominated by the central paradigm that sequence similarity is a proxy for common ancestry and, by extension, functional similarity. For determining sequence similarity in proteins, most widely used methods use models of sequence evolution and compare amino-acid strings in search for conserved linear stretches. Probabilistic models or sequence profiles capture the position-specific variation in an alignment of homologous sequences and can identify conserved motifs or domains. While profile-based search methods are generally more accurate than simple sequence comparison methods, they tend to be computationally more demanding. In recent years, several methods have emerged that perform protein similarity searches based on domain composition. However, few methods have considered the linear arrangements of domains when conducting similarity searches, despite strong evidence that domain order can harbour considerable functional and evolutionary signal. RESULTS Here, we introduce an alignment scheme that uses a classical dynamic programming approach to the global alignment of domains. We illustrate that representing proteins as strings of domains (domain arrangements) and comparing these strings globally allows for a both fast and sensitive homology search. Further, we demonstrate that the presented methods complement existing methods by finding similar proteins missed by popular amino-acid-based comparison methods. AVAILABILITY An implementation of the presented algorithms, a web-based interface as well as a command-line program for batch searching against the UniProt database can be found at http://rads.uni-muenster.de. Furthermore, we provide a JAVA API for programmatic access to domain-string–based search methods.
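As an illustration of the idea summarized in this abstract, the following is a minimal sketch of a classical dynamic-programming global alignment (Needleman-Wunsch style) applied to proteins represented as lists of domain identifiers rather than amino acids. It is not the authors' implementation: the function name `align_domains`, the scoring values (match 2, mismatch -1, gap -1), and the example domain arrangements are all assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple


def align_domains(
    a: Sequence[str],
    b: Sequence[str],
    match: int = 2,      # assumed score for identical domains
    mismatch: int = -1,  # assumed penalty for different domains
    gap: int = -1,       # assumed linear gap penalty
) -> Tuple[int, List[str], List[str]]:
    """Globally align two domain arrangements; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two domains
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], out_a[::-1], out_b[::-1]


# Hypothetical domain arrangements, used only as an example.
query = ["SH3", "SH2", "Pkinase"]
target = ["SH2", "Pkinase"]
total, aligned_query, aligned_target = align_domains(query, target)
```

Because each protein is reduced to a handful of domain symbols, the quadratic table is tiny compared with an amino-acid alignment, which is consistent with the speed advantage the abstract describes. A real search tool would likely use a domain-aware substitution scheme rather than the flat match/mismatch scores assumed here.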
Affiliation(s)
- Nicolas Terrapon
  - Westfalian Wilhelms University, Institute of Evolution and Biodiversity, Huefferstr. 1, 48149 Muenster, Germany and Max Planck Institute for Infection Biology, Charitéplatz 1, 10117 Berlin, Germany
45
Hsu CH, Chen CK, Hwang MJ. The architectural design of networks of protein domain architectures. Biol Lett 2013; 9:20130268. [PMID: 23760167 DOI: 10.1098/rsbl.2013.0268] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/12/2022]
Abstract
Protein domain architectures (PDAs), in which single domains are linked to form multiple-domain proteins, are a major molecular form used by evolution for the diversification of protein functions. However, the design principles of PDAs remain largely uninvestigated. In this study, we constructed networks to connect domain architectures that had grown out from the same single domain for every single domain in the Pfam-A database and found that there are three main distinctive types of these networks, which suggests that evolution can exploit PDAs in three different ways. Further analysis showed that these three different types of PDA networks are each adopted by different types of protein domains, although many networks exhibit the characteristics of more than one of the three types. Our results shed light on nature's blueprint for protein architecture and provide a framework for understanding architectural design from a network perspective.
Affiliation(s)
- Chia-Hsin Hsu
  - Bioinformatics Program, Taiwan International Graduate Program, Academia Sinica, Taipei, Taiwan, Republic of China
46
Bornberg-Bauer E, Albà MM. Dynamics and adaptive benefits of modular protein evolution. Curr Opin Struct Biol 2013; 23:459-66. [PMID: 23562500 DOI: 10.1016/j.sbi.2013.02.012] [Citation(s) in RCA: 80] [Impact Index Per Article: 7.3] [Received: 01/08/2013] [Revised: 02/15/2013] [Accepted: 02/15/2013] [Indexed: 11/29/2022]
Abstract
During protein evolution, novel domain arrangements are continuously formed. Rearrangements are important for the creation of molecular biodiversity and for functional molecular changes which underlie developmental shifts in the bauplan of organisms. Here we review the mechanisms by which new arrangements arise and the potential benefits of rearrangements. We concentrate on how new domains emerge and why they rapidly spread across genomes, gaining higher copy numbers than older, more established domains. This spread is most likely a consequence of their high adaptive potential but is unlikely to make up on its own for the drastic loss of domains, which is observed across different taxa. We show that a significant portion of the recently emerged domains, especially those in multidomain families, are highly disordered and speculate about the significance of these findings for the evolvability of novel genetic material.
Affiliation(s)
- Erich Bornberg-Bauer
  - Institute for Evolution and Biodiversity, School of Biological Sciences, University of Münster, Hüfferstrasse 1, D48149 Münster, Germany.
47
Toll-Riera M, Albà MM. Emergence of novel domains in proteins. BMC Evol Biol 2013; 13:47. [PMID: 23425224 PMCID: PMC3599535 DOI: 10.1186/1471-2148-13-47] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Received: 11/28/2012] [Accepted: 01/31/2013] [Indexed: 12/31/2022]
Abstract
Background Proteins are composed of a combination of discrete, well-defined, sequence domains, associated with specific functions that have arisen at different times during evolutionary history. The emergence of novel domains is related to protein functional diversification and adaptation. But currently little is known about how novel domains arise and how they subsequently evolve. Results To gain insights into the impact of recently emerged domains in protein evolution we have identified all human young protein domains that have emerged in approximately the past 550 million years. We have classified them into vertebrate-specific and mammalian-specific groups, and compared them to older domains. We have found 426 different annotated young domains, totalling 995 domain occurrences, which represent about 12.3% of all human domains. We have observed that 61.3% of them arose in newly formed genes, while the remaining 38.7% are found combined with older domains, and have very likely emerged in the context of a previously existing protein. Young domains are preferentially located at the N-terminus of the protein, indicating that, at least in vertebrates, novel functional sequences often emerge there. Furthermore, young domains show significantly higher non-synonymous to synonymous substitution rates than older domains using human and mouse orthologous sequence comparisons. This is also true when we compare young and old domains located in the same protein, suggesting that recently arisen domains tend to evolve in a less constrained manner than older domains. Conclusions We conclude that proteins tend to gain domains over time, becoming progressively longer. We show that many proteins are made of domains of different age, and that the fastest evolving parts correspond to the domains that have been acquired more recently.
Affiliation(s)
- Macarena Toll-Riera
  - Evolutionary Genomics Group, Research Programme on Biomedical Informatics (GRIB) - Hospital del Mar Research Institute (IMIM), Universitat Pompeu Fabra (UPF), Barcelona, Spain
48
Moore AD, Grath S, Schüler A, Huylmans AK, Bornberg-Bauer E. Quantification and functional analysis of modular protein evolution in a dense phylogenetic tree. Biochim Biophys Acta Proteins Proteom 2013; 1834:898-907. [PMID: 23376183 DOI: 10.1016/j.bbapap.2013.01.007] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 10/19/2012] [Revised: 01/06/2013] [Accepted: 01/09/2013] [Indexed: 12/24/2022]
Abstract
Modularity is a hallmark of molecular evolution. Whether considering gene regulation, the components of metabolic pathways or signaling cascades, the ability to reuse autonomous modules in different molecular contexts can expedite evolutionary innovation. Similarly, protein domains are the modules of proteins, and modular domain rearrangements can create diversity with seemingly few operations in turn allowing for swift changes to an organism's functional repertoire. Here, we assess the patterns and functional effects of modular rearrangements at high resolution. Using a well resolved and diverse group of pancrustaceans, we illustrate arrangement diversity within closely related organisms, estimate arrangement turnover frequency and establish, for the first time, branch-specific rate estimates for fusion, fission, domain addition and terminal loss. Our results show that roughly 16 new arrangements arise per million years and that between 64% and 81% of these can be explained by simple, single-step modular rearrangement events. We find evidence that the frequencies of fission and terminal deletion events increase over time, and that modular rearrangements impact all levels of the cellular signaling apparatus and thus may have strong adaptive potential. Novel arrangements that cannot be explained by simple modular rearrangements contain a significant amount of repeat domains that occur in complex patterns which we term "supra-repeats". Furthermore, these arrangements are significantly longer than those with a single-step rearrangement solution, suggesting that such arrangements may result from multi-step events. In summary, our analysis provides an integrated view and initial quantification of the patterns and functional impact of modular protein evolution in a well resolved phylogenetic tree. This article is part of a Special Issue entitled: The emerging dynamic view of proteins: Protein plasticity in allostery, evolution and self-assembly.
Affiliation(s)
- Andrew D Moore
  - Institute for Evolution and Biodiversity, Münster, Germany
49
Abstract
Prions are agents of analog, protein conformation-based inheritance that can confer beneficial phenotypes to cells, especially under stress. Combined with genetic variation, prion-mediated inheritance can be channeled into prion-independent genomic inheritance. Latest screening shows that prions are common, at least in fungi. Thus, there is non-negligible flow of information from proteins to the genome in modern cells, in a direct violation of the Central Dogma of molecular biology. The prion-mediated heredity that violates the Central Dogma appears to be a specific, most radical manifestation of the widespread assimilation of protein (epigenetic) variation into genetic variation. The epigenetic variation precedes and facilitates genetic adaptation through a general 'look-ahead effect' of phenotypic mutations. This direction of the information flow is likely to be one of the important routes of environment-genome interaction and could substantially contribute to the evolution of complex adaptive traits.
Affiliation(s)
- Eugene V Koonin
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA.
50
Leclère L, Rentzsch F. Repeated evolution of identical domain architecture in metazoan netrin domain-containing proteins. Genome Biol Evol 2012; 4:883-99. [PMID: 22813778 PMCID: PMC3516229 DOI: 10.1093/gbe/evs061] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Accepted: 07/11/2012] [Indexed: 12/13/2022]
Abstract
The majority of proteins in eukaryotes are composed of multiple domains, and the number and order of these domains is an important determinant of protein function. Although multidomain proteins with a particular domain architecture were initially considered to have a common evolutionary origin, recent comparative studies of protein families or whole genomes have reported that a minority of multidomain proteins could have appeared multiple times independently. Here, we test this scenario in detail for the signaling molecules netrin and secreted frizzled-related proteins (sFRPs), two groups of netrin domain-containing proteins with essential roles in animal development. Our primary phylogenetic analyses suggest that the particular domain architectures of each of these proteins were present in the eumetazoan ancestor and evolved a second time independently within the metazoan lineage from laminin and frizzled proteins, respectively. Using an array of phylogenetic methods, statistical tests, and character sorting analyses, we show that the polyphyly of netrin and sFRP is well supported and cannot be explained by classical phylogenetic reconstruction artifacts. Despite their independent origins, the two groups of netrins and of sFRPs have the same protein interaction partners (Deleted in Colorectal Cancer/neogenin and Unc5 for netrins and Wnts for sFRPs) and similar developmental functions. Thus, these cases of convergent evolution emphasize the importance of domain architecture for protein function by uncoupling shared domain architecture from shared evolutionary history. Therefore, we propose the term merology to describe the repeated evolution of proteins with similar domain architecture and discuss the potential of merologous proteins to help understand protein evolution.
Affiliation(s)
- Lucas Leclère
  - Sars International Centre for Marine Molecular Biology, University of Bergen, Norway.