1
Mikhailova AA, Dohmen E, Harrison MC. Major changes in domain arrangements are associated with the evolution of termites. J Evol Biol 2024; 37:758-769. [PMID: 38630634 DOI: 10.1093/jeb/voae047] [Received: 05/31/2023] [Revised: 12/18/2023] [Accepted: 04/12/2024] [Indexed: 04/19/2024]
Abstract
Domains as functional protein units and their rearrangements along the phylogeny can shed light on the functional changes of proteomes associated with the evolution of complex traits like eusociality. This complex trait is associated with sterile soldiers and workers, and long-lived, highly fecund reproductives. Unlike in Hymenoptera (ants, bees, and wasps), the evolution of eusociality within Blattodea, where termites evolved from within cockroaches, was accompanied by a reduction in proteome size, raising the question of whether functional novelty was achieved with existing rather than novel proteins. To address this, we investigated the role of domain rearrangements during the evolution of termite eusociality. Analysing domain rearrangements in the proteomes of three solitary cockroaches and five eusocial termites, we inferred more than 5,000 rearrangements over the phylogeny of Blattodea. The 90 novel domain arrangements that emerged at the origin of termites were enriched for several functions related to longevity, such as protein homeostasis, DNA repair, mitochondrial activity, and nutrient sensing. Many domain rearrangements were related to changes in developmental pathways, important for the emergence of novel castes. Along with the elaboration of social complexity, including permanently sterile workers and larger, foraging colonies, we found 110 further domain arrangements with functions related to protein glycosylation and ion transport. We found an enrichment of caste-biased expression and splicing within rearranged genes, highlighting their importance for the evolution of castes. Furthermore, we found increased levels of DNA methylation among rearranged compared to non-rearranged genes suggesting fundamental differences in their regulation. Our findings indicate the importance of domain rearrangements in the generation of functional novelty necessary for termite eusociality to evolve.
Affiliation(s)
- Alina A Mikhailova
  - Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
- Elias Dohmen
  - Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
- Mark C Harrison
  - Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
2
Ye Z, Mao D, Wang Y, Deng H, Liu X, Zhang T, Han Z, Zhang X. Comparative Genome-Wide Identification of the Fatty Acid Desaturase Gene Family in Tea and Oil Tea. Plants (Basel) 2024; 13:1444. [PMID: 38891253 PMCID: PMC11174766 DOI: 10.3390/plants13111444] [Received: 04/10/2024] [Revised: 05/07/2024] [Accepted: 05/16/2024] [Indexed: 06/21/2024]
Abstract
Camellia oil is valuable as an edible oil and serves as a base material for a range of high-value products. Camellia plants of significant economic importance, such as Camellia sinensis and Camellia oleifera, have been classified into sect. Thea and sect. Oleifera, respectively. Fatty acid desaturases play a crucial role in catalyzing the formation of double bonds at specific positions of fatty acid chains, leading to the production of unsaturated fatty acids and contributing to lipid synthesis. Comparative genomics results have revealed that expanded gene families in oil tea are enriched in functions related to lipid, fatty acid, and seed processes. To explore the function of the FAD gene family, a total of 82 FAD genes were identified in tea and oil tea. Transcriptome data showed the differential expression of the FAD gene family in mature seeds of tea tree and oil tea tree. Furthermore, the structural analysis and clustering of FAD proteins provided insights for the further exploration of the function of the FAD gene family and its role in lipid synthesis. Overall, these findings shed light on the role of the FAD gene family in Camellia plants and their involvement in lipid metabolism, as well as provide a reference for understanding their function in oil synthesis.
Affiliation(s)
- Ziqi Ye
  - The Laboratory of Forestry Genetics, Central South University of Forestry and Technology, Changsha 410004, China
- Dan Mao
  - National Forest and Seedling Workstation of Hunan Province, The Forestry Department of Hunan Province, Changsha 410004, China
- Yujian Wang
  - National Forest and Seedling Workstation of Hunan Province, The Forestry Department of Hunan Province, Changsha 410004, China
- Hongda Deng
  - The Laboratory of Forestry Genetics, Central South University of Forestry and Technology, Changsha 410004, China
- Xing Liu
  - The Laboratory of Forestry Genetics, Central South University of Forestry and Technology, Changsha 410004, China
- Tongyue Zhang
  - The Laboratory of Forestry Genetics, Central South University of Forestry and Technology, Changsha 410004, China
- Zhiqiang Han
  - The Laboratory of Forestry Genetics, Central South University of Forestry and Technology, Changsha 410004, China
- Xingtan Zhang
  - Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518000, China
3
García-Paz FDM, Del Moral S, Morales-Arrieta S, Ayala M, Treviño-Quintanilla LG, Olvera-Carranza C. Multidomain chimeric enzymes as a promising alternative for biocatalysts improvement: a minireview. Mol Biol Rep 2024; 51:410. [PMID: 38466518 DOI: 10.1007/s11033-024-09332-9] [Received: 10/26/2023] [Accepted: 02/07/2024] [Indexed: 03/13/2024]
Abstract
Searching for new and better biocatalysts is an area of study in constant development. In nature, mechanisms generally occurring in evolution, such as genetic duplication, recombination, and natural selection processes, produce various enzymes with different architectures and properties. The recombination of genes that code proteins produces multidomain chimeric enzymes that contain two or more domains that sometimes enhance their catalytic properties. Protein engineering has mimicked this process to enhance catalytic activity and the global stability of enzymes, searching for new and better biocatalysts. Here, we present and discuss examples from both natural and synthetic multidomain chimeric enzymes and how additional domains heighten their stability and catalytic activity. We also describe progress in developing new biocatalysts using synthetic fusion enzymes and review some methodological strategies to improve their biological fitness.
Affiliation(s)
- Flor de María García-Paz
  - Departamento de Ingeniería Celular y Biocatálisis, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Av. Universidad 2001 Col. Chamilpa CP 62210, Cuernavaca, Morelos, México
- Sandra Del Moral
  - Investigador por México-CONAHCyT, Unidad de Investigación y Desarrollo en Alimentos, Tecnológico Nacional de México, Campus Veracruz. MA de Quevedo 2779, Col. Formando Hogar, CP 91960, Veracruz, Veracruz, México
- Sandra Morales-Arrieta
  - Departamento de Biotecnología, Universidad Politécnica del Estado de Morelos, Boulevard Cuauhnáhuac No. 566 Col. Lomas del Texcal CP 62550, Jiutepec, Morelos, México
- Marcela Ayala
  - Departamento de Ingeniería Celular y Biocatálisis, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Av. Universidad 2001 Col. Chamilpa CP 62210, Cuernavaca, Morelos, México
- Luis Gerardo Treviño-Quintanilla
  - Departamento de Biotecnología, Universidad Politécnica del Estado de Morelos, Boulevard Cuauhnáhuac No. 566 Col. Lomas del Texcal CP 62550, Jiutepec, Morelos, México
- Clarita Olvera-Carranza
  - Departamento de Ingeniería Celular y Biocatálisis, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Av. Universidad 2001 Col. Chamilpa CP 62210, Cuernavaca, Morelos, México
4
García-Morales A, Balleza D. Exploring Flexibility and Folding Patterns Throughout Time in Voltage Sensors. J Mol Evol 2023; 91:819-836. [PMID: 37955698 DOI: 10.1007/s00239-023-10140-1] [Received: 08/01/2023] [Accepted: 10/27/2023] [Indexed: 11/14/2023]
Abstract
The voltage-sensing domain (VSD) is a module capable of responding to changes in the membrane potential through conformational changes and facilitating electromechanical coupling to open a pore gate, activate proton permeation pathways, or promote enzymatic activity in some membrane-anchored phosphatases. To carry out these functions, this module acts cooperatively through conformational changes. The VSD is formed by four transmembrane segments (S1-S4), but the S4 segment is critical since it carries positively charged residues, mainly Arg or Lys, which require an aqueous environment for their proper function. The discovery of this module in voltage-gated ion channels (VGICs), proton channels (Hv1), and voltage sensor-containing phosphatases (VSPs) has expanded our understanding of the principle of modularity in the voltage-sensing mechanism of these proteins. Here, by sequence comparison and the evaluation of the relationship between sequence composition, intrinsic flexibility, and structural analysis in 14 selected representatives of these three major protein groups, we report five interesting differences in the folding patterns of the VSD both in prokaryotes and eukaryotes. Our main findings indicate that this module is highly conserved throughout the evolutionary scale; however: (1) segments S1 to S3 in eukaryotes are significantly more hydrophobic than those present in prokaryotes; (2) the S4 segment has retained its hydrophilic character; (3) in eukaryotes the extramembranous linkers are significantly larger and more flexible in comparison with those present in prokaryotes; (4) the sensors present in the kHv1 proton channel and the ciVSP phosphatase, both of eukaryotic origin, exhibit relationships of flexibility and folding patterns very close to the typical ones found in prokaryotic voltage sensors; and (5) archaeal channels KvAP and MVP have flexibility profiles which are clearly contrasting in the S3-S4 region, which could explain their divergent activation mechanisms. Finally, to elucidate the obscure origins of this module, we show further evidence for a possible connection between voltage sensors and TolQ proteins.
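The hydrophobicity comparison between segments described in this abstract can be illustrated with a small sketch. It assumes the Kyte-Doolittle hydropathy scale, which is a common choice but is not stated in the abstract as the authors' method, and the two peptide strings are hypothetical toy segments, not sequences from the paper's dataset.

```python
# Kyte-Doolittle hydropathy index; positive values are more hydrophobic.
# Assumption: the abstract does not say which scale the authors used.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(segment: str) -> float:
    """Return the average Kyte-Doolittle hydropathy of a peptide segment."""
    residues = segment.upper()
    if not residues:
        raise ValueError("segment must contain at least one residue")
    return sum(KD[r] for r in residues) / len(residues)


# Hypothetical toy segments (illustrative only, not taken from the paper):
# an S1-like stretch of apolar residues and an S4-like stretch with
# regularly spaced Arg/Lys residues.
toy_s1_like = "LLIVAVLLSIFAIVL"
toy_s4_like = "RVIRLVRVFRIFKLSR"

if __name__ == "__main__":
    for name, seq in (("S1-like", toy_s1_like), ("S4-like", toy_s4_like)):
        print(f"{name}: mean hydropathy = {mean_hydropathy(seq):.2f}")
```

A per-segment average like this only captures composition; the flexibility and folding-pattern analyses mentioned in the abstract would need additional, structure-aware measures.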
Affiliation(s)
- Abigail García-Morales
  - Tecnológico Nacional de México, Instituto Tecnológico de Veracruz, Unidad de Investigación y Desarrollo en Alimentos, Calz. Miguel Angel de Quevedo 2779, Col. Formando Hogar, CP. 91897, Veracruz, Ver, Mexico
- Daniel Balleza
  - Tecnológico Nacional de México, Instituto Tecnológico de Veracruz, Unidad de Investigación y Desarrollo en Alimentos, Calz. Miguel Angel de Quevedo 2779, Col. Formando Hogar, CP. 91897, Veracruz, Ver, Mexico
5
Deryusheva EI, Machulin AV, Galzitskaya OV. Diversity and features of proteins with structural repeats. Biophys Rev 2023; 15:1159-1169. [PMID: 37974986 PMCID: PMC10643770 DOI: 10.1007/s12551-023-01130-0] [Received: 07/13/2023] [Accepted: 08/28/2023] [Indexed: 11/19/2023]
Abstract
The review provides information on proteins with structural repeats, including their classification, characteristics, functions, and relevance in disease development. It explores methods for identifying structural repeats and specialized databases. The review also highlights the potential use of repeat proteins as drug design scaffolds and discusses their evolutionary mechanisms.
Affiliation(s)
- Evgeniya I. Deryusheva
  - Institute for Biological Instrumentation, Federal Research Center “Pushchino Scientific Center for Biological Research of the Russian Academy of Sciences”, Pushchino, Russia
- Andrey V. Machulin
  - Skryabin Institute of Biochemistry and Physiology of Microorganisms, Federal Research Center “Pushchino Scientific Center for Biological Research of the Russian Academy of Sciences”, Pushchino, Russia
- Oxana V. Galzitskaya
  - Institute of Protein Research of the Russian Academy of Sciences, Pushchino, Russia
  - Institute of Theoretical and Experimental Biophysics of the Russian Academy of Sciences, Pushchino, Russia
6
Arrías PN, Monzon AM, Clementel D, Mozaffari S, Piovesan D, Kajava AV, Tosatto SCE. The repetitive structure of DNA clamps: An overlooked protein tandem repeat. J Struct Biol 2023; 215:108001. [PMID: 37467824 DOI: 10.1016/j.jsb.2023.108001] [Received: 04/28/2023] [Revised: 07/12/2023] [Accepted: 07/16/2023] [Indexed: 07/21/2023]
Abstract
Structured tandem repeat proteins (STRPs) are a specific kind of tandem repeat protein characterized by a modular and repetitive three-dimensional structure arrangement. The majority of STRPs adopt solenoid structures, but with the increasing availability of experimental structures and high-quality predicted structural models, more STRP folds can be characterized. Here, we describe "Box repeats", an overlooked STRP fold present in the DNA sliding clamp processivity factors, which has eluded classification although structural data has been available since the late 1990s. Each Box repeat is a βαβββ module of about 60 residues, which forms a class V "beads-on-a-string" type STRP. The number of repeats present in processivity factors is organism dependent. Monomers of PCNA proteins in both Archaea and Eukarya have 4 repeats, while the monomers of bacterial beta-sliding clamps have 6 repeats. This new repeat fold has been added to the RepeatsDB database, which now provides structural annotation for 66 Box repeat proteins belonging to different organisms, including viruses.
Affiliation(s)
- Paula Nazarena Arrías
  - Department of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy
- Alexander Miguel Monzon
  - Department of Information Engineering, University of Padova, via Giovanni Gradenigo 6/B, 35131 Padova, Italy
- Damiano Clementel
  - Department of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy
- Soroush Mozaffari
  - Department of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy
- Damiano Piovesan
  - Department of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy
- Andrey V Kajava
  - Centre de Recherche en Biologie cellulaire de Montpellier (CRBM), UMR 5237 CNRS, Université Montpellier, 1919 Route de Mende, Cedex 5, 34293 Montpellier, France
- Silvio C E Tosatto
  - Department of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy
7
The Modular Architecture of Metallothioneins Facilitates Domain Rearrangements and Contributes to Their Evolvability in Metal-Accumulating Mollusks. Int J Mol Sci 2022; 23:15824. [PMID: 36555472 PMCID: PMC9781358 DOI: 10.3390/ijms232415824] [Received: 11/10/2022] [Revised: 12/05/2022] [Accepted: 12/10/2022] [Indexed: 12/15/2022]
Abstract
Protein domains are independent structural and functional modules that can rearrange to create new proteins. While the evolution of multidomain proteins through the shuffling of different preexisting domains has been well documented, the evolution of domain repeat proteins and the origin of new domains are less understood. Metallothioneins (MTs) provide a good case study considering that they consist of metal-binding domain repeats, some of them with a likely de novo origin. In mollusks, for instance, most MTs are bidomain proteins that arose by lineage-specific rearrangements between six putative domains: α, β1, β2, β3, γ and δ. Some domains have been characterized in bivalves and gastropods, but nothing is known about the MTs and their domains of other Mollusca classes. To fill this gap, we investigated the metal-binding features of NpoMT1 of Nautilus pompilius (Cephalopoda class) and FcaMT1 of Falcidens caudatus (Caudofoveata class). Interestingly, whereas NpoMT1 consists of α and β1 domains and has a prototypical Cd2+ preference, FcaMT1 has a singular preference for Zn2+ ions and a distinct domain composition, including a new Caudofoveata-specific δ domain. Overall, our results suggest that the modular architecture of MTs has contributed to MT evolution during mollusk diversification, and exemplify how modularity increases MT evolvability.
8
Ercolano MR, D’Esposito D, Andolfo G, Frusciante L. Multilevel evolution shapes the function of NB-LRR encoding genes in plant innate immunity. Front Plant Sci 2022; 13:1007288. [PMID: 36388554 PMCID: PMC9647133 DOI: 10.3389/fpls.2022.1007288] [Received: 07/30/2022] [Accepted: 10/17/2022] [Indexed: 06/16/2023]
Abstract
A sophisticated innate immune system based on diverse pathogen receptor genes (PRGs) evolved in the history of plant life. To reconstruct the direction and magnitude of evolutionary trajectories of a given gene family, it is critical to detect the ancestral signatures. The rearrangement of functional domains made up the diversification found in PRG repertoires. Structural rearrangement of ancient domains mediated the NB-LRR evolutionary path from an initial set of modular proteins. Events such as domain acquisition, sequence modification and temporary or stable associations are prominent among rapidly evolving innate immune receptors. Over time PRGs are continuously shaped by different forces to find their optimal arrangement along the genome. The immune system is controlled by a robust regulatory system that works at different scales. It is important to understand how the PRG interaction network can be adjusted to meet specific needs. The high plasticity of the innate immune system is based on a sophisticated functional architecture and multi-level control. Due to the complexity of interacting with diverse pathogens, multiple defense lines have been organized into interconnected groups. Genomic architecture, gene expression regulation and functional arrangement of PRGs allow the deployment of an appropriate innate immunity response.
9
Borah P, Ni F, Ying W, Zhuang H, Chong SL, Hu XG, Yang J, Lin EP, Huang H. Genome-wide identification and characterization of OVATE family proteins in Betula luminifera reveals involvement of BlOFP3 and BlOFP5 genes in leaf development. Front Plant Sci 2022; 13:950936. [PMID: 36311104 PMCID: PMC9613114 DOI: 10.3389/fpls.2022.950936] [Received: 05/23/2022] [Accepted: 09/22/2022] [Indexed: 06/16/2023]
Abstract
Ovate family proteins (OFPs) are plant-specific transcription factors involved in regulating the morphology of lateral organs, plant growth, and development. However, the functional roles of OFP genes in Betula luminifera, an important timber tree species, are not well studied. In this study, we identified 20 BlOFP genes and analyzed their phylogenetic relationships, gene structure, conserved motifs, and cis-elements. Further, expression analysis indicates that BlOFP genes were up-regulated in leaves on the one-year-old branch compared to leaves on the current-year branch and bract, except BlOFP7, BlOFP11, BlOFP14 and BlOFP12. The overexpression of BlOFP3 and BlOFP5 in Arabidopsis thaliana not only resulted in a slower growth rate but also produced sawtooth-shaped, flatter, and darker green rosette leaves. Further investigation showed that the leaf thickness of the transgenic plants was more than double that of the wild type, which was caused by an increase in the number and size of palisade tissue cells. Furthermore, the expression analysis also indicated that the expression of several genes related to leaf development was significantly changed in the transgenic plants. These results suggest significant roles for BlOFP3 and BlOFP5 in leaf development. Moreover, protein-protein interaction studies showed that BlOFP3 interacts with BlKNAT5, and BlOFP5 interacts with BlKNAT5, BlBLH6 and BlBLH7. In conclusion, our study demonstrates that BlOFP3 and BlOFP5 are involved in leaf shape and thickness regulation by forming a complex with BlKNAT5, BlBLH6 and BlBLH7. In addition, our study serves as a guide for future functional genomic studies of OFP genes in B. luminifera.
Affiliation(s)
- Priyanka Borah
  - State Key Laboratory of Subtropical Silviculture, Zhejiang A&F University, Hangzhou, China
- Fei Ni
  - State Key Laboratory of Subtropical Silviculture, Zhejiang A&F University, Hangzhou, China
- Weiyang Ying
  - State Key Laboratory of Subtropical Silviculture, Zhejiang A&F University, Hangzhou, China
- Hebi Zhuang
  - State Key Laboratory of Subtropical Silviculture, Zhejiang A&F University, Hangzhou, China
- Sun-Li Chong
  - State Key Laboratory of Subtropical Silviculture, Zhejiang A&F University, Hangzhou, China
- Xian-Ge Hu
  - State Key Laboratory of Subtropical Silviculture, Zhejiang A&F University, Hangzhou, China
- Jun Yang
  - Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Chenshan Plant Science Research Center, Chinese Academy of Sciences, Shanghai, China
- Er-pei Lin
  - State Key Laboratory of Subtropical Silviculture, Zhejiang A&F University, Hangzhou, China
- Huahong Huang
  - State Key Laboratory of Subtropical Silviculture, Zhejiang A&F University, Hangzhou, China
10
Huwa N, Weiergräber OH, Fejzagić AV, Kirsch C, Schaffrath U, Classen T. The Crystal Structure of the Defense Conferring Rice Protein OsJAC1 Reveals a Carbohydrate Binding Site on the Dirigent-like Domain. Biomolecules 2022; 12:1126. [PMID: 36009020 PMCID: PMC9405769 DOI: 10.3390/biom12081126] [Received: 07/04/2022] [Revised: 07/31/2022] [Accepted: 08/11/2022] [Indexed: 11/16/2022]
Abstract
Pesticides are routinely used to prevent severe losses in agriculture. This practice is under debate because of its potential negative environmental impact and selection of resistances in pathogens. Therefore, the development of disease resistant plants is mandatory. It was shown that the rice (Oryza sativa) protein OsJAC1 enhances resistance against different bacterial and fungal plant pathogens in rice, barley, and wheat. Recently we reported possible carbohydrate interaction partners for both domains of OsJAC1 (a jacalin-related lectin (JRL) and a dirigent (DIR) domain), however, a mechanistic understanding of its function is still lacking. Here, we report crystal structures for both individual domains and the complex of galactobiose with the DIR domain, which revealed a new carbohydrate binding motif for DIR proteins. Docking studies of the two domains led to a model of the full-length protein. Our findings offer insights into structure and binding properties of OsJAC1 and its possible function in pathogen resistance.
Affiliation(s)
- Nikolai Huwa
  - Institute for Bioorganic Chemistry, Heinrich Heine University Düsseldorf, 52425 Jülich, Germany
- Oliver H. Weiergräber
  - Institute of Biological Information Processing 7: Structural Biochemistry and Jülich Centre for Structural Biology, Forschungszentrum Jülich, 52425 Jülich, Germany
- Alexander V. Fejzagić
  - Institute for Bioorganic Chemistry, Heinrich Heine University Düsseldorf, 52425 Jülich, Germany
- Christian Kirsch
  - Institute for Biology III, Department of Plant Physiology, RWTH Aachen University, 52056 Aachen, Germany
- Ulrich Schaffrath
  - Institute for Biology III, Department of Plant Physiology, RWTH Aachen University, 52056 Aachen, Germany
- Thomas Classen
  - Institute for Bio- and Geosciences 1: Bioorganic Chemistry, Forschungszentrum Jülich, 52425 Jülich, Germany
11
Jayaraman V, Toledo-Patiño S, Noda-García L, Laurino P. Mechanisms of protein evolution. Protein Sci 2022; 31:e4362. [PMID: 35762715 PMCID: PMC9214755 DOI: 10.1002/pro.4362] [Received: 02/28/2022] [Revised: 05/11/2022] [Accepted: 05/14/2022] [Indexed: 11/06/2022]
Abstract
How do proteins evolve? How do changes in sequence mediate changes in protein structure, and in turn in function? This question has multiple angles, ranging from biochemistry and biophysics to evolutionary biology. This review provides a brief integrated view of some key mechanistic aspects of protein evolution. First, we explain how protein evolution is primarily driven by randomly acquired genetic mutations and selection for function, and how these mutations can even give rise to completely new folds. Then, we also comment on how phenotypic protein variability, including promiscuity, transcriptional and translational errors, may also accelerate this process, possibly via "plasticity-first" mechanisms. Finally, we highlight open questions in the field of protein evolution, with respect to the emergence of more sophisticated protein systems such as protein complexes, pathways, and the emergence of pre-LUCA enzymes.
Affiliation(s)
- Vijay Jayaraman
  - Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
- Saacnicteh Toledo-Patiño
  - Protein Engineering and Evolution Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
- Lianet Noda-García
  - Department of Plant Pathology and Microbiology, Institute of Environmental Sciences, Robert H. Smith Faculty of Agriculture, Food and Environment, Hebrew University of Jerusalem, Rehovot, Israel
- Paola Laurino
  - Protein Engineering and Evolution Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
12
Cui X, Xue Y, McCormack C, Garces A, Rachman TW, Yi Y, Stolzer M, Durand D. Simulating domain architecture evolution. Bioinformatics 2022; 38:i134-i142. [PMID: 35758772 PMCID: PMC9236583 DOI: 10.1093/bioinformatics/btac242] [Indexed: 11/16/2022]
Abstract
Motivation: Simulation is an essential technique for generating biomolecular data with a ‘known’ history for use in validating phylogenetic inference and other evolutionary methods. On longer time scales, simulation supports investigations of equilibrium behavior and provides a formal framework for testing competing evolutionary hypotheses. Twenty years of molecular evolution research have produced a rich repertoire of simulation methods. However, current models do not capture the stringent constraints acting on the domain insertions, duplications, and deletions by which multidomain architectures evolve. Although these processes have the potential to generate any combination of domains, only a tiny fraction of possible domain combinations are observed in nature. Modeling these stringent constraints on domain order and co-occurrence is a fundamental challenge in domain architecture simulation that does not arise with sequence and gene family simulation. Results: Here, we introduce a stochastic model of domain architecture evolution to simulate evolutionary trajectories that reflect the constraints on domain order and co-occurrence observed in nature. This framework is implemented in a novel domain architecture simulator, DomArchov, using the Metropolis–Hastings algorithm with data-driven transition probabilities. The use of a data-driven event module enables quick and easy redeployment of the simulator for use in different taxonomic and protein function contexts. Using empirical evaluation with metazoan datasets, we demonstrate that domain architectures simulated by DomArchov recapitulate properties of genuine domain architectures that reflect the constraints on domain order and adjacency seen in nature. This work expands the realm of evolutionary processes that are amenable to simulation. Availability and implementation: DomArchov is written in Python 3 and is available at http://www.cs.cmu.edu/~durand/DomArchov. The data underlying this article are available via the same link. Supplementary information: Supplementary data are available at Bioinformatics online.
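As a rough illustration of the idea in this abstract, the sketch below runs a generic Metropolis–Hastings sampler over toy domain architectures. It is not DomArchov's code or API: the domain alphabet, the adjacency weights in PAIR_WEIGHT, the length cap, and all function names are hypothetical, and the proposal is simplified to single-domain insertions and deletions (a duplication is just an insertion next to an identical domain).

```python
import random

# Hypothetical toy inputs; NOT taken from DomArchov or any real dataset.
DOMAINS = ("SH2", "SH3", "Kinase")
# Assumed adjacency weights standing in for data-driven order constraints.
PAIR_WEIGHT = {("SH3", "SH2"): 5.0, ("SH2", "Kinase"): 5.0, ("SH3", "SH3"): 2.0}
DEFAULT_WEIGHT = 0.1  # weight for adjacent pairs missing from the table
MAX_LEN = 6           # assumed cap on architecture length


def weight(arch):
    """Unnormalised target probability: product of adjacent-pair weights."""
    w = 1.0
    for a, b in zip(arch, arch[1:]):
        w *= PAIR_WEIGHT.get((a, b), DEFAULT_WEIGHT)
    return w


def single_edit_ways(shorter, longer):
    """Count positions in `longer` whose removal yields `shorter`."""
    return sum(1 for i in range(len(longer))
               if longer[:i] + longer[i + 1:] == shorter)


def proposal_prob(x, y):
    """q(x -> y) for the insert/delete proposal used in mh_step."""
    if len(y) == len(x) + 1:   # insertion: uniform slot, uniform domain
        return 0.5 * single_edit_ways(x, y) / ((len(x) + 1) * len(DOMAINS))
    if len(y) == len(x) - 1:   # deletion: uniform position
        return 0.5 * single_edit_ways(y, x) / len(x)
    return 0.0


def mh_step(x, rng):
    """One Metropolis-Hastings update; invalid proposals leave x unchanged."""
    if rng.random() < 0.5:     # propose inserting one domain
        if len(x) >= MAX_LEN:
            return x
        i = rng.randrange(len(x) + 1)
        y = x[:i] + (rng.choice(DOMAINS),) + x[i:]
    else:                      # propose deleting one domain
        if len(x) <= 1:
            return x
        i = rng.randrange(len(x))
        y = x[:i] + x[i + 1:]
    # Hastings ratio: target ratio times reverse/forward proposal ratio.
    ratio = (weight(y) * proposal_prob(y, x)) / (weight(x) * proposal_prob(x, y))
    return y if rng.random() < min(1.0, ratio) else x


def simulate(start, steps, seed=0):
    """Return the list of visited architectures, including the start state."""
    rng = random.Random(seed)
    x = tuple(start)
    trajectory = [x]
    for _ in range(steps):
        x = mh_step(x, rng)
        trajectory.append(x)
    return trajectory


if __name__ == "__main__":
    for arch in simulate(("SH3", "SH2", "Kinase"), steps=10):
        print("-".join(arch))
```

The proposal probabilities are computed exactly, including the case where several insertion slots give the same architecture, so the acceptance ratio satisfies detailed balance for the toy target. In a data-driven setting, the hand-written weight table would be replaced by values estimated from observed architectures.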
Affiliation(s)
- Xiaoyue Cui
  - Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Yifan Xue
  - Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA
  - Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Collin McCormack
  - Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA
  - Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Alejandro Garces
  - Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Thomas W Rachman
  - Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Yang Yi
  - Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA
  - Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Maureen Stolzer
  - Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Dannie Durand
  - Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA
13
Zmasek CM, Lefkowitz EJ, Niewiadomska A, Scheuermann RH. Genomic evolution of the Coronaviridae family. Virology 2022; 570:123-133. [PMID: 35398776 PMCID: PMC8965632 DOI: 10.1016/j.virol.2022.03.005] [Received: 11/12/2021] [Revised: 03/11/2022] [Accepted: 03/18/2022] [Indexed: 01/03/2023]
Abstract
The current outbreak of coronavirus disease-2019 (COVID-19) caused by SARS-CoV-2 poses unparalleled challenges to global public health. SARS-CoV-2 is a Betacoronavirus, one of four genera belonging to the Coronaviridae subfamily Orthocoronavirinae. Coronaviridae, in turn, are members of the order Nidovirales, a group of enveloped, positive-stranded RNA viruses. Here we present a systematic phylogenetic and evolutionary study based on protein domain architecture, encompassing the entire proteomes of all Orthocoronavirinae, as well as other Nidovirales. This analysis has revealed that the genomic evolution of Nidovirales is associated with extensive gains and losses of protein domains. In Orthocoronavirinae, the sections of the genomes that show the largest divergence in protein domains are found in the proteins encoded in the amino-terminal end of the polyprotein (PP1ab), the spike protein (S), and many of the accessory proteins. The diversity among the accessory proteins is particularly striking, as each subgenus possesses a set of accessory proteins that is almost entirely specific to that subgenus. The only notable exception to this is ORF3b, which is present and orthologous over all Alphacoronaviruses. In contrast, the membrane protein (M), envelope small membrane protein (E), nucleoprotein (N), as well as proteins encoded in the central and carboxy-terminal end of PP1ab (such as the 3C-like protease, RNA-dependent RNA polymerase, and Helicase) show stable domain architectures across all Orthocoronavirinae. This comprehensive analysis of the Coronaviridae domain architecture has important implications for efforts to develop broadly cross-protective coronavirus vaccines.
Affiliation(s)
- Christian M Zmasek
  - Department of Informatics, J. Craig Venter Institute, La Jolla, CA, 92037, USA
- Elliot J Lefkowitz
  - Department of Microbiology, UAB School of Medicine, Birmingham, AL, 35294, USA
- Anna Niewiadomska
  - Department of Informatics, J. Craig Venter Institute, La Jolla, CA, 92037, USA
- Richard H Scheuermann
  - Department of Informatics, J. Craig Venter Institute, La Jolla, CA, 92037, USA; Department of Pathology, University of California, San Diego, CA, 92093, USA; Division of Vaccine Discovery, La Jolla Institute for Immunology, La Jolla, CA, 92037, USA; Global Virus Network, Baltimore MD, 21201, USA
14
Rivera AM, Swanson WJ. The Importance of Gene Duplication and Domain Repeat Expansion for the Function and Evolution of Fertilization Proteins. Front Cell Dev Biol 2022; 10:827454. [PMID: 35155436 PMCID: PMC8830517 DOI: 10.3389/fcell.2022.827454] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/02/2021] [Accepted: 01/12/2022] [Indexed: 11/13/2022] Open
Abstract
The process of gene duplication followed by gene loss or evolution of new functions has been studied extensively, yet the role gene duplication plays in the function and evolution of fertilization proteins is underappreciated. Gene duplication is observed in many fertilization protein families including Izumo, DCST, ZP, and the TFP superfamily. Molecules mediating fertilization are part of larger gene families expressed in a variety of tissues, but gene duplication followed by structural modifications has often facilitated their cooption into a fertilization function. Repeat expansions of functional domains within a gene also provide opportunities for the evolution of novel fertilization proteins. ZP proteins with domain repeat expansions are linked to species-specificity in fertilization, and TFP proteins that experienced domain duplications were coopted into a novel sperm function. This review outlines the importance of gene duplications and repeat domain expansions in the evolution of fertilization proteins.
15
Lindenburg LH, Pantelejevs T, Gielen F, Zuazua-Villar P, Butz M, Rees E, Kaminski CF, Downs JA, Hyvönen M, Hollfelder F. Improved RAD51 binders through motif shuffling based on the modularity of BRC repeats. Proc Natl Acad Sci U S A 2021; 118:e2017708118. [PMID: 34772801 PMCID: PMC8727024 DOI: 10.1073/pnas.2017708118] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Accepted: 08/10/2021] [Indexed: 01/20/2023] Open
Abstract
Exchanges of protein sequence modules support leaps in function unavailable through point mutations during evolution. Here we study the role of the two RAD51-interacting modules within the eight binding BRC repeats of BRCA2. We created 64 chimeric repeats by shuffling these modules and measured their binding to RAD51. We found that certain shuffled module combinations were stronger binders than any of the module combinations in the natural repeats. Surprisingly, the contribution from the two modules was poorly correlated with affinities of natural repeats, with a weak BRC8 repeat containing the most effective N-terminal module. The binding of the strongest chimera, BRC8-2, to RAD51 was improved by -2.4 kcal/mol compared to the strongest natural repeat, BRC4. A crystal structure of the RAD51:BRC8-2 complex shows an improved interface fit and an extended β-hairpin in this repeat. BRC8-2 was shown to function in human cells, preventing the formation of nuclear RAD51 foci after ionizing radiation.
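The figure of 64 is what one would expect from pairing each of eight N-terminal modules with each of eight C-terminal modules. A minimal Python sketch of that enumeration follows; it is an illustration rather than the authors' protocol, and it assumes that a chimera is identified solely by the source repeats of its two modules and that same-source pairings correspond to the natural repeats. The names `REPEATS`, `pairings`, `natural`, and `shuffled` are hypothetical.

```python
from itertools import product

# The eight BRC repeats, labelled BRC1 to BRC8.
REPEATS = [f"BRC{i}" for i in range(1, 9)]

# Every ordered pairing (N-terminal module source, C-terminal module source).
pairings = list(product(REPEATS, repeat=2))

# Assumption: a pairing whose two modules come from the same repeat
# reproduces a natural repeat; all others are genuinely shuffled.
natural = [(n, c) for n, c in pairings if n == c]
shuffled = [(n, c) for n, c in pairings if n != c]

print(len(pairings), len(natural), len(shuffled))
```

Under these assumptions the 8 x 8 grid contains 8 same-source pairings and 56 mixed ones.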
Affiliation(s)
- Laurens H Lindenburg
  - Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, United Kingdom
- Teodors Pantelejevs
  - Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, United Kingdom
- Fabrice Gielen
  - Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, United Kingdom
  - Living Systems Institute, University of Exeter, Exeter EX4 4QD, United Kingdom
- Pedro Zuazua-Villar
  - Division of Cancer Biology, The Institute of Cancer Research, London SW3 6JB, United Kingdom
- Maren Butz
  - Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, United Kingdom
- Eric Rees
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge CB3 0AS, United Kingdom
- Clemens F Kaminski
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge CB3 0AS, United Kingdom
- Jessica A Downs
  - Division of Cancer Biology, The Institute of Cancer Research, London SW3 6JB, United Kingdom
- Marko Hyvönen
  - Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, United Kingdom
- Florian Hollfelder
  - Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, United Kingdom
16
Gilchrist CLM, Chooi YH. Synthaser: a CD-Search enabled Python toolkit for analysing domain architecture of fungal secondary metabolite megasynth(et)ases. Fungal Biol Biotechnol 2021; 8:13. [PMID: 34763725 PMCID: PMC8582187 DOI: 10.1186/s40694-021-00120-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 08/30/2021] [Accepted: 10/29/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Fungi are prolific producers of secondary metabolites (SMs), which are bioactive small molecules with important applications in medicine, agriculture and other industries. The backbones of a large proportion of fungal SMs are generated through the action of large, multi-domain megasynth(et)ases such as polyketide synthases (PKSs) and nonribosomal peptide synthetases (NRPSs). The structure of these backbones is determined by the domain architecture of the corresponding megasynth(et)ase, and thus accurate annotation and classification of these architectures is an important step in linking SMs to their biosynthetic origins in the genome. RESULTS: Here we report synthaser, a Python package leveraging the NCBI's conserved domain search tool for remote prediction and classification of fungal megasynth(et)ase domain architectures. Synthaser is capable of batch sequence analysis, and produces rich textual output and interactive visualisations which allow for quick assessment of the megasynth(et)ase diversity of a fungal genome. Synthaser uses a hierarchical rule-based classification system, which can be extensively customised by the user through a web application (http://gamcil.github.io/synthaser). We show that synthaser provides more accurate domain architecture predictions than comparable tools which rely on curated profile hidden Markov model (pHMM)-based approaches; the utilisation of the NCBI conserved domain database also allows for significantly greater flexibility compared to pHMM approaches. In addition, we demonstrate how synthaser can be applied to large scale genome mining pipelines through the construction of an Aspergillus PKS similarity network. CONCLUSIONS: Synthaser is an easy to use tool that represents a significant upgrade to previous domain architecture analysis tools. It is freely available under an MIT license from PyPI (https://pypi.org/project/synthaser) and GitHub (https://github.com/gamcil/synthaser).
Affiliation(s)
- Cameron L M Gilchrist
  - School of Molecular Sciences, The University of Western Australia, 35 Stirling Hwy, Crawley, 6009, Australia
- Yit-Heng Chooi
  - School of Molecular Sciences, The University of Western Australia, 35 Stirling Hwy, Crawley, 6009, Australia
17
Phillips JC. Synchronized attachment and the Darwinian evolution of coronaviruses CoV-1 and CoV-2. Physica A 2021; 581:126202. [PMID: 34177077 PMCID: PMC8216869 DOI: 10.1016/j.physa.2021.126202] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/08/2020] [Revised: 05/20/2021] [Indexed: 05/05/2023]
Abstract
CoV2019 has evolved to be much more dangerous than CoV2003. Experiments suggest that structural rearrangements dramatically enhance CoV2019 activity. We identify a new first stage of infection that precedes structural rearrangements by using biomolecular evolutionary theory to identify sequence differences enhancing viral attachment rates. We find a small cluster of mutations which show that CoV-2 has a new feature that promotes much stronger viral attachment and enhances contagiousness. The extremely dangerous dynamics of human coronavirus infection is a dramatic example of evolutionary approach of self-organized networks to criticality. It may favor a very successful vaccine. The identified mutations can be used to test the present theory experimentally.
Affiliation(s)
- J C Phillips
  - Department of Physics and Astronomy, Rutgers University, Piscataway, NJ 08854, United States of America
18
Deryusheva EI, Machulin AV, Galzitskaya OV. Structural, Functional, and Evolutionary Characteristics of Proteins with Repeats. Mol Biol 2021. [DOI: 10.1134/s0026893321040038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
19
Guo H, Wang L, Deng Y, Ye J. Novel perspectives of environmental proteomics. The Science of the Total Environment 2021; 788:147588. [PMID: 34023612 DOI: 10.1016/j.scitotenv.2021.147588] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/23/2021] [Revised: 04/08/2021] [Accepted: 05/01/2021] [Indexed: 06/12/2023]
Abstract
The connections among genome expression, proteome alteration, metabolic regulation and phenotypic change under environmental stress remain poorly understood. Traditional research approaches struggle to reveal the mechanisms behind these connections at the molecular and system levels. Proteomics is a powerful tool for revealing the biological functions, metabolic networks and functional protein interaction networks of cells and organisms under stress at the system level. This review provides guidelines on how to set up a proteomic investigation to reveal biomolecular mechanisms, protein biomarkers and metabolic networks related to stress response, pollutant recognition, transport and biodegradation, and describes a high-throughput approach for screening functional enzymes and effective microbes based on bioinformatics and functional verification. Furthermore, toxicity evaluation of pollutants and byproducts by proteomic approaches provides a scientific basis for early diagnosis of ecological risk and for determining the effectiveness of pollutant treatment techniques.
Affiliation(s)
- Huiying Guo
  - Key Laboratory of Environmental Exposure and Health of Guangdong Province, School of Environment, Jinan University, Guangzhou 510632, China; Institute of Orthopedic Diseases, Department of Bone and Joint Surgery, The First Affiliated Hospital, Jinan University, Guangzhou 510630, China
- Lili Wang
  - Key Laboratory of Environmental Exposure and Health of Guangdong Province, School of Environment, Jinan University, Guangzhou 510632, China
- Ying Deng
  - Key Laboratory of Environmental Exposure and Health of Guangdong Province, School of Environment, Jinan University, Guangzhou 510632, China
- Jinshao Ye
  - Key Laboratory of Environmental Exposure and Health of Guangdong Province, School of Environment, Jinan University, Guangzhou 510632, China
20
Etzion-Fuchs A, Todd DA, Singh M. dSPRINT: predicting DNA, RNA, ion, peptide and small molecule interaction sites within protein domains. Nucleic Acids Res 2021; 49:e78. [PMID: 33999210 PMCID: PMC8287948 DOI: 10.1093/nar/gkab356] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/12/2020] [Revised: 03/30/2021] [Accepted: 04/22/2021] [Indexed: 01/08/2023] Open
Abstract
Domains are instrumental in facilitating protein interactions with DNA, RNA, small molecules, ions and peptides. Identifying ligand-binding domains within sequences is a critical step in protein function annotation, and the ligand-binding properties of proteins are frequently analyzed based upon whether they contain one of these domains. To date, however, knowledge of whether and how protein domains interact with ligands has been limited to domains that have been observed in co-crystal structures; this leaves approximately two-thirds of human protein domain families uncharacterized with respect to whether and how they bind DNA, RNA, small molecules, ions and peptides. To fill this gap, we introduce dSPRINT, a novel ensemble machine learning method for predicting whether a domain binds DNA, RNA, small molecules, ions or peptides, along with the positions within it that participate in these types of interactions. In stringent cross-validation testing, we demonstrate that dSPRINT has an excellent performance in uncovering ligand-binding positions and domains. We also apply dSPRINT to newly characterize the molecular functions of domains of unknown function. dSPRINT's predictions can be transferred from domains to sequences, enabling predictions about the ligand-binding properties of 95% of human genes. The dSPRINT framework and its predictions for 6503 human protein domains are freely available at http://protdomain.princeton.edu/dsprint.
Affiliation(s)
- Anat Etzion-Fuchs
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Carl Icahn Laboratory, Princeton, NJ 08544, USA
- David A Todd
  - Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ 08544, USA
- Mona Singh
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Carl Icahn Laboratory, Princeton, NJ 08544, USA; Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ 08544, USA
21
Sexually dimorphic expression and regulatory sequence of dnali1 in the olive flounder Paralichthys olivaceus. Mol Biol Rep 2021; 48:3529-3540. [PMID: 33877529 DOI: 10.1007/s11033-021-06342-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/25/2021] [Accepted: 04/07/2021] [Indexed: 10/21/2022]
Abstract
Dynein axonemal light intermediate chain 1 (dnali1) is an important part of axonemal dyneins and plays an important role in the growth and development of animals. However, there is little information about dnali1 in fish. Herein, we cloned the dnali1 gene from the genome of olive flounder (Paralichthys olivaceus), a commercially important maricultured fish in China, Japan, and Korea, and analyzed its expression patterns in fish of different sexes. The flounder dnali1 DNA sequence contained a 771 bp open reading frame (ORF), two different sizes of 5' untranslated region (5'UTR), and a 1499 bp 3' untranslated region (3'UTR). Two duplicated 922 nt fragments were found in dnali1 mRNA. The first fragment contained the downstream coding region and the front portion of the 3'UTR, and the second fragment was entirely located in the 3'UTR. Multiple alignments indicated that the flounder Dnali1 protein contained the putative conserved coiled-coil domain. Its expression was sexually dimorphic, with predominant expression in the flounder testis and lower expression in other tissues. The transcript with the longer 5'UTR was specifically expressed in the testis. The highest expression level in the testis was detected at stages IV and V. Transient expression analysis showed that the 922 bp repeated sequence in the 3'UTR of dnali1 down-regulated the expression of GFP at the early stage in zebrafish. The flounder dnali1 might play an important role in the testis, especially in the period of spermatogenesis, and the 5'UTR and the repetitive sequences in the 3'UTR might contain some regulatory elements for the cilia.
22
Han X, Guo J, Pang E, Song H, Lin K. Ab Initio Construction and Evolutionary Analysis of Protein-Coding Gene Families with Partially Homologous Relationships: Closely Related Drosophila Genomes as a Case Study. Genome Biol Evol 2021; 12:185-202. [PMID: 32108239 PMCID: PMC7144356 DOI: 10.1093/gbe/evaa041] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Accepted: 02/18/2020] [Indexed: 01/05/2023] Open
Abstract
How have genes evolved within a well-known genome phylogeny? Many protein-coding genes should have evolved as a whole at the gene level, and some should have evolved partly through fragments at the subgene level. To comprehensively explore such complex homologous relationships and better understand gene family evolution, here, with de novo-identified modules, the subgene units which could consecutively cover proteins within a set of closely related species, we applied a new phylogeny-based approach that considers evolutionary models with partial homology to classify all protein-coding genes in nine Drosophila genomes. Compared with two other popular methods for gene family construction, our approach improved practical gene family classifications with a more reasonable view of homology and provided a much more complete landscape of gene family evolution at the gene and subgene levels. In the case study, we found that most expanded gene families might have evolved mainly through module rearrangements rather than gene duplications and mainly generated single-module genes through partial gene duplication, suggesting that there might be pervasive subgene rearrangement in the evolution of protein-coding gene families. The use of a phylogeny-based approach with partial homology to classify and analyze protein-coding gene families may provide us with a more comprehensive landscape depicting how genes evolve within a well-known genome phylogeny.
Affiliation(s)
- Xia Han
  - State Key Laboratory of Earth Surface Processes and Resource Ecology, Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, China
- Jindan Guo
  - State Key Laboratory of Earth Surface Processes and Resource Ecology, Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, China
- Erli Pang
  - State Key Laboratory of Earth Surface Processes and Resource Ecology, Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, China
- Hongtao Song
  - State Key Laboratory of Earth Surface Processes and Resource Ecology, Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, China
- Kui Lin
  - State Key Laboratory of Earth Surface Processes and Resource Ecology, Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, China
23
Searching protein space for ancient sub-domain segments. Curr Opin Struct Biol 2021; 68:105-112. [PMID: 33476896 DOI: 10.1016/j.sbi.2020.11.006] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 10/11/2020] [Accepted: 11/29/2020] [Indexed: 01/08/2023]
Abstract
Evolutionary processes that formed the current protein universe left their traces, among them homologous segments that recur, or are 'reused,' in multiple proteins. These reused segments, called 'themes,' can be found at various scales, the best known of which is the domain. Yet, recent studies have begun to focus on the evolutionary insights that can be derived from sub-domain-scale themes, which are candidates for traces of more ancient events. Characterizing these may provide clues to the emergence of domains. Particularly interesting are themes that are reused across dissimilar contexts, that is, where the rest of the protein domain differs. We survey computational studies identifying reused themes within different contexts at the sub-domain level.
24
Yazhini A, Srinivasan N, Sandhya S. Signatures of conserved and unique molecular features in Afrotheria. Sci Rep 2021; 11:1011. [PMID: 33441654 PMCID: PMC7806701 DOI: 10.1038/s41598-020-79559-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/30/2020] [Accepted: 12/07/2020] [Indexed: 11/09/2022] Open
Abstract
Afrotheria is a clade of African-origin species with striking dissimilarities in appearance and habitat. In this study, we compared whole proteome sequences of six Afrotherian species to obtain a broad viewpoint of their underlying molecular make-up, to recognize potentially unique proteomic signatures. We find that 62% of the proteomes studied here, predominantly involved in metabolism, are orthologous, while the number of homologous proteins between individual species is as high as 99.5%. Further, we find that among Afrotheria, L. africana has several orphan proteins with 112 proteins showing < 30% sequence identity with their homologues. Rigorous sequence searches and complementary approaches were employed to annotate 156 uncharacterized protein sequences and 28 species-specific proteins. For 122 proteins we predicted potential functional roles, 43 of which we associated with protein- and nucleic-acid binding roles. Further, we analysed domain content and variations in their combinations within Afrotheria and identified 141 unique functional domain architectures, highlighting proteins with potential for specialized functions. Finally, we discuss the potential relevance of highly represented protein families such as MAGE-B2, olfactory receptor and ribosomal proteins in L. africana and E. edwardii, respectively. Taken together, our study reports the first comparative study of the Afrotherian proteomes and highlights salient molecular features.
Affiliation(s)
- Arangasamy Yazhini
  - Lab 103, Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Karnataka, 560012, India
- Narayanaswamy Srinivasan
  - Lab 103, Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Karnataka, 560012, India
- Sankaran Sandhya
  - Lab 103, Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Karnataka, 560012, India
25
Abstract
Proteins are major functional molecules that physically and functionally interact to carry out cellular processes. The physical interactions are generally mediated by domain-level interactions. Thus, novel protein-protein interactions can be predicted using various computational methods based on domain-domain interactions, using resolved structures of protein complexes. Functional protein interactions can be inferred based on shared domains between proteins, since proteins involved in the same biological processes tend to harbor common domains. We recently developed a method of inferring functional interactions between proteins using associations between their domain compositions, which can be represented as domain profiles. Since the method requires only protein domain annotations, it can be easily applied to any species with a sequenced genome. Here, we describe in detail the method of generating domain profiles for proteins and measuring the association between them to infer functional interactions between proteins. We also demonstrate that domain profile association can be used to successfully construct a large-scale functional network of human proteins.
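As a rough illustration of the idea in the abstract above, the sketch below represents each protein by the set of domains it contains and links proteins whose domain sets overlap. The abstract does not specify the association measure, so the Jaccard index used here is a stand-in chosen for simplicity and not the authors' measure. The annotation table, the 0.3 threshold, and the function names `jaccard` and `infer_links` are likewise hypothetical.

```python
from itertools import combinations

# Hypothetical annotations: protein name -> set of domain identifiers.
# The assignments are invented for illustration only.
ANNOTATIONS = {
    "protA": {"PF00069", "PF00017"},
    "protB": {"PF00069", "PF00018"},
    "protC": {"PF00096"},
}


def jaccard(a, b):
    """Shared domains divided by all domains seen in either protein."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def infer_links(annotations, threshold=0.3):
    """Return (protein, protein, score) for pairs scoring at or above threshold."""
    links = []
    for p, q in combinations(sorted(annotations), 2):
        score = jaccard(annotations[p], annotations[q])
        if score >= threshold:
            links.append((p, q, score))
    return links


print(infer_links(ANNOTATIONS))
```

With these toy annotations only `protA` and `protB` are linked, since they share one of three distinct domains. Because the input is nothing more than a domain annotation table, the same code runs unchanged on any annotated proteome, which is the portability the abstract emphasizes.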
Affiliation(s)
- Jung Eun Shim
  - Yonsei Biomedical Research Institute, Yonsei University College of Medicine, Seoul, South Korea
- Insuk Lee
  - Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, South Korea
26
James JE, Willis SM, Nelson PG, Weibel C, Kosinski LJ, Masel J. Universal and taxon-specific trends in protein sequences as a function of age. eLife 2021; 10:e57347. [PMID: 33416492 PMCID: PMC7819706 DOI: 10.7554/elife.57347] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 03/28/2020] [Accepted: 01/05/2021] [Indexed: 01/12/2023] Open
Abstract
Extant protein-coding sequences span a huge range of ages, from those that emerged only recently to those present in the last universal common ancestor. Because evolution has had less time to act on young sequences, there might be 'phylostratigraphy' trends in any properties that evolve slowly with age. A long-term reduction in hydrophobicity and hydrophobic clustering was found in previous, taxonomically restricted studies. Here we perform integrated phylostratigraphy across 435 fully sequenced species, using sensitive HMM methods to detect protein domain homology. We find that the reduction in hydrophobic clustering is universal across lineages. However, only young animal domains have a tendency to have higher structural disorder. Among ancient domains, trends in amino acid composition reflect the order of recruitment into the genetic code, suggesting that the composition of the contemporary descendants of ancient sequences reflects amino acid availability during the earliest stages of life, when these sequences first emerged.
Affiliation(s)
- Jennifer E James
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, United States
- Sara M Willis
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, United States
- Paul G Nelson
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, United States
- Catherine Weibel
  - Department of Physics, University of Arizona, Tucson, United States
  - Department of Mathematics, University of Arizona, Tucson, United States
- Luke J Kosinski
  - Department of Molecular and Cellular Biology, University of Arizona, Tucson, United States
- Joanna Masel
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, United States
27
Cross I, García E, Rodríguez ME, Arias-Pérez A, Portela-Bens S, Merlo MA, Rebordinos L. The genomic structure of the highly-conserved dmrt1 gene in Solea senegalensis (Kaup, 1868) shows an unexpected intragenic duplication. PLoS One 2020; 15:e0241518. [PMID: 33137109 PMCID: PMC7605655 DOI: 10.1371/journal.pone.0241518] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/25/2020] [Accepted: 10/15/2020] [Indexed: 01/17/2023] Open
Abstract
Knowing the factors responsible for sex determination in a species has significant theoretical and practical implications; the dmrt1 gene (Doublesex and Mab-3 (DM)-related Transcription factor 1) plays this role in diverse animal species. Solea senegalensis is a commercially important flatfish in which females grow 30% faster than males. It has 2n = 42 chromosomes and an XX / XY chromosome system for sex determination, without heteromorphic chromosomes but with a sex proto-chromosome. In the present study, we provide the genomic structure and nucleotide sequence of the dmrt1 gene obtained from cDNA from male and female adult gonads. A cDNA of 2027 containing an open reading frame (ORF) of 1206 bp and encoding a 402 aa protein is described for the dmrt1 gene of S. senegalensis. Multiple mRNA isoforms, indicating a highly variable system of alternative splicing in the expression of dmrt1 in the gonads of the sole, were studied. None of the isoforms could be related to the sex of individuals. The genomic structure of the dmrt1 of S. senegalensis showed a gene of 31400 bp composed of 7 exons and 6 introns. It contains an unexpected duplication of more than 10399 bp, involving part of exon I, exons II and III, and a SINE element found in the sequence that is proposed to be responsible for the duplication. A mature miRNA of 21 bp in length was localized at 336 bp from exon V. Protein-protein interacting networks of the dmrt1 gene showed matches with dmrt1 protein from Cynoglossus semilaevis and a protein interaction network with 11 nodes (dmrt1 plus 10 other proteins). The phylogenetic relationship of the dmrt1 gene in S. senegalensis is consistent with the evolutionary position of its species. The molecular characterization of this gene will enhance its functional analysis and the understanding of sex differentiation in Solea senegalensis and other flatfish.
Affiliation(s)
- Ismael Cross
  - Area de Genética, CASEM, Universidad de Cádiz, Puerto Real, Cádiz, Spain
- Emilio García
  - Area de Genética, CASEM, Universidad de Cádiz, Puerto Real, Cádiz, Spain
- María E. Rodríguez
  - Area de Genética, CASEM, Universidad de Cádiz, Puerto Real, Cádiz, Spain
- Manuel A. Merlo
  - Area de Genética, CASEM, Universidad de Cádiz, Puerto Real, Cádiz, Spain
28
Phylogeny and Structure of Fatty Acid Photodecarboxylases and Glucose-Methanol-Choline Oxidoreductases. Catalysts 2020. [DOI: 10.3390/catal10091072] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 12/18/2022] Open
Abstract
Glucose-methanol-choline (GMC) oxidoreductases are a large and diverse family of flavin-binding enzymes found in all kingdoms of life. Recently, a new related family of proteins has been discovered in algae named fatty acid photodecarboxylases (FAPs). These enzymes use the energy of light to convert fatty acids to the corresponding Cn-1 alkanes or alkenes, and hold great potential for biotechnological application. In this work, we aimed at uncovering the natural diversity of FAPs and their relations with other GMC oxidoreductases. We reviewed the available GMC structures, assembled a large dataset of GMC sequences, and found that one active site amino acid, a histidine, is extremely well conserved among the GMC proteins but not among FAPs, where it is replaced with alanine. Using this criterion, we found several new potential FAP genes, both in genomic and metagenomic databases, and showed that related bacterial, archaeal and fungal genes are unlikely to be FAPs. We also identified several uncharacterized clusters of GMC-like proteins as well as subfamilies of proteins that lack the conserved histidine but are not FAPs. Finally, the analysis of the collected dataset of potential photodecarboxylase sequences revealed the key active site residues that are strictly conserved, whereas other residues in the vicinity of the flavin adenine dinucleotide (FAD) cofactor and in the fatty acid-binding pocket are more variable. The identified variants may have different FAP activity and selectivity and consequently may prove useful for new biotechnological applications, thereby fostering the transition from a fossil carbon-based economy to a bio-economy by enabling the sustainable production of hydrocarbon fuels.
|
29
|
The structures of two archaeal type IV pili illuminate evolutionary relationships. Nat Commun 2020; 11:3424. [PMID: 32647180 PMCID: PMC7347861 DOI: 10.1038/s41467-020-17268-4] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 04/09/2020] [Accepted: 06/22/2020] [Indexed: 12/14/2022]
Abstract
We have determined the cryo-electron microscopic (cryo-EM) structures of two archaeal type IV pili (T4P), from Pyrobaculum arsenaticum and Saccharolobus solfataricus, at 3.8 Å and 3.4 Å resolution, respectively. This triples the number of high resolution archaeal T4P structures, and allows us to pinpoint the evolutionary divergence of bacterial T4P, archaeal T4P and archaeal flagellar filaments. We suggest that extensive glycosylation previously observed in T4P of Sulfolobus islandicus is a response to an acidic environment, as at even higher temperatures in a neutral environment much less glycosylation is present for Pyrobaculum than for Sulfolobus and Saccharolobus pili. Consequently, the Pyrobaculum filaments do not display the remarkable stability of the Sulfolobus filaments in vitro. We identify the Saccharolobus and Pyrobaculum T4P as host receptors recognized by rudivirus SSRV1 and tristromavirus PFV2, respectively. Our results illuminate the evolutionary relationships among bacterial and archaeal T4P filaments and provide insights into archaeal virus-host interactions. Archaeal type IV pili (T4P) mediate adhesion to surfaces and are receptors for hyperthermophilic archaeal viruses. Here, the authors present the cryo-EM structures of two archaeal T4P from Pyrobaculum arsenaticum and Saccharolobus solfataricus and discuss evolutionary relationships between bacterial T4P, archaeal T4P and archaeal flagellar filaments.
|
30
|
Alvarez-Carreño C, Coello G, Arciniega M. FiRES: A computational method for the de novo identification of internal structure similarity in proteins. Proteins 2020; 88:1169-1179. [PMID: 32112578 DOI: 10.1002/prot.25886] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2019] [Revised: 11/12/2019] [Accepted: 02/24/2020] [Indexed: 11/08/2022]
Abstract
Internal structure similarity in proteins can be observed at the domain and subdomain levels. From an evolutionary perspective, structurally similar elements may arise divergently by gene duplication and fusion events but may also be the product of convergent evolution under physicochemical constraints. The characterization of proteins that contain repeated structural elements has implications for many fields of protein science including protein domain evolution, structure classification, structure prediction, and protein engineering. FiRES (Find Repeated Elements in Structure) is an algorithm that relies on a topology-independent structure alignment method to identify repeating elements in protein structure. FiRES was tested against two hand curated databases of protein repeats: MALIDUP, for very divergent duplicated domains; and RepeatsDB for short tandem repeats. The performance of FiRES was compared to that of lalign, RADAR, HHrepID, CE-symm, ReUPred, and Swelfe. FiRES was the method that most accurately detected proteins either with duplicated domains (accuracy = 0.86) or with multiple repeated units (accuracy = 0.92). FiRES is a new methodology for the discovery of proteins containing structurally similar elements. The FiRES web server is publicly available at http://fires.ifc.unam.mx. The scripts, results, and benchmarks from this study can be downloaded from https://github.com/Claualvarez/fires.
Affiliation(s)
- Claudia Alvarez-Carreño
- Department of Bioquímica y Biología Estructural, Instituto de Fisiología Celular, Universidad Nacional Autónoma de México, Mexico City, Mexico; School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia, USA
- Gerardo Coello
- Unidad de Cómputo, Instituto de Fisiología Celular, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Marcelino Arciniega
- Department of Bioquímica y Biología Estructural, Instituto de Fisiología Celular, Universidad Nacional Autónoma de México, Mexico City, Mexico
|
31
|
Filatov G, Bauwens B, Kertész-Farkas A. LZW-Kernel: fast kernel utilizing variable length code blocks from LZW compressors for protein sequence classification. Bioinformatics 2019; 34:3281-3288. [PMID: 29741583 DOI: 10.1093/bioinformatics/bty349] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 01/15/2018] [Accepted: 05/03/2018] [Indexed: 11/13/2022]
Abstract
Motivation: Bioinformatics studies often rely on similarity measures between sequence pairs, which often pose a bottleneck in large-scale sequence analysis. Results: Here, we present a new convolutional kernel function for protein sequences called the Lempel-Ziv-Welch (LZW)-Kernel. It is based on code words identified with the LZW universal text compressor. The LZW-Kernel is an alignment-free method; it is always symmetric and positive, always provides 1.0 for self-similarity, and can be used directly with Support Vector Machines (SVMs) in classification problems. This contrasts with the normalized compression distance, which often violates the distance metric properties in practice and requires further techniques to be used with SVMs. The LZW-Kernel is a one-pass algorithm, which makes it particularly suitable for big data applications. Our experimental studies on remote protein homology detection and protein classification tasks reveal that the LZW-Kernel closely approaches the performance of the Local Alignment Kernel (LAK) and the SVM-pairwise method combined with Smith-Waterman (SW) scoring at a fraction of the time. Moreover, the LZW-Kernel outperforms the SVM-pairwise method when combined with Basic Local Alignment Search Tool (BLAST) scores, which indicates that the LZW code words might be a better basis for similarity measures than local alignment approximations found with BLAST. In addition, the LZW-Kernel outperforms n-gram based mismatch kernels, hidden Markov model based SAM and Fisher kernel, and protein family based PSI-BLAST, among others. Further advantages include the LZW-Kernel's reliance on a simple idea, its ease of implementation, and its high speed: three times faster than BLAST and several orders of magnitude faster than SW or LAK in our tests. Availability and implementation: LZW-Kernel is implemented as standalone C code and is a free open-source program distributed under the GPLv3 license; it can be downloaded from https://github.com/kfattila/LZW-Kernel.
Supplementary information: Supplementary data are available at Bioinformatics Online.
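The idea of comparing sequences through LZW code words can be illustrated with a minimal Python sketch. It is not the authors' C implementation, and the similarity used here (shared code words divided by the geometric mean of the two dictionary sizes) is an assumption chosen only because it is symmetric and gives 1.0 for self-similarity; the paper's exact kernel formula may differ. The function names `lzw_code_words` and `lzw_similarity` are hypothetical.

```python
import math


def lzw_code_words(seq: str) -> set[str]:
    """Return the LZW dictionary phrases built in a single pass over seq."""
    # Start from the single symbols that occur in the sequence.
    words = set(seq)
    current = ""
    for ch in seq:
        candidate = current + ch
        if candidate in words:
            # Keep extending the phrase while it is already in the dictionary.
            current = candidate
        else:
            # New phrase: store it and restart from the current symbol.
            words.add(candidate)
            current = ch
    return words


def lzw_similarity(a: str, b: str) -> float:
    """Assumed similarity: shared code words, cosine-normalized (not the paper's formula)."""
    words_a, words_b = lzw_code_words(a), lzw_code_words(b)
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / math.sqrt(len(words_a) * len(words_b))


if __name__ == "__main__":
    print(sorted(lzw_code_words("ABABABA")))
    print(lzw_similarity("MKVLAAGIVG", "MKVLSAGIVA"))
```

Because each sequence is parsed once and the comparison is a set intersection, no alignment is computed at any point, which is the property the abstract emphasizes.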
Affiliation(s)
- Gleb Filatov
- Faculty of Computer Science, Department of Data Analysis and Artificial Intelligence, Moscow, Russia
- Bruno Bauwens
- Faculty of Computer Science, Department of Big Data and Information Retrieval, Moscow, Russia
- Attila Kertész-Farkas
- Faculty of Computer Science, Department of Data Analysis and Artificial Intelligence, Moscow, Russia
|
32
|
Verma R, Pandit SB. Unraveling the structural landscape of intra-chain domain interfaces: Implication in the evolution of domain-domain interactions. PLoS One 2019; 14:e0220336. [PMID: 31374091 PMCID: PMC6677297 DOI: 10.1371/journal.pone.0220336] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 04/12/2019] [Accepted: 07/12/2019] [Indexed: 12/22/2022]
Abstract
Intra-chain domain interactions are known to play a significant role in the function and stability of multidomain proteins. These interactions are mediated through physical contact at domain-domain interfaces (DDIs). To understand the evolution of interfaces, we have investigated similarities among DDIs. Even though interfaces of protein-protein interactions (PPIs) have previously been studied by structurally aligning interfaces, similar analyses have not yet been performed on DDIs of either multidomain proteins or PPIs. To study the structural landscape of DDIs, we used iAlign to structurally align intra-chain domain interfaces. The interface alignment of spatially constrained domains (due to inter-domain linkers) showed that ~88% of these could identify a structurally matching interface with similar C-alpha geometry and contact pattern, even though the aligned domain pairs are not structurally related. Moreover, the mean interface similarity score (IS-score) is 0.307, which is higher than the average random IS-score (0.207), suggesting that domain interfaces are not random. The structural space of DDIs is highly connected, as ~84% of all possible directed edges among interfaces have a path length of at most 8 at an IS-score threshold of 0.26. At this threshold, ~83% of interfaces form the largest strongly connected component. This suggests that the structural space of intra-chain domain interfaces is degenerate and highly connected, as has been found for PPI interfaces. Interestingly, searching for structural neighbors of inter-chain interfaces among intra-chain interfaces showed that ~86% could find a statistically significant match to an intra-chain interface, with a mean IS-score of 0.311. This implies that domain interfaces are degenerate whether formed within a protein or between proteins. The interface degeneracy is most likely due to the limited number of possible ways of packing secondary structures. In principle, interface similarities can be exploited to accurately model domain interfaces in structure prediction of multidomain proteins.
Affiliation(s)
- Rivi Verma
- Department of Biological Sciences, Indian Institute of Science Education and Research, Mohali, India
- Shashi Bhushan Pandit
- Department of Biological Sciences, Indian Institute of Science Education and Research, Mohali, India
|
33
|
Basile W, Salvatore M, Bassot C, Elofsson A. Why do eukaryotic proteins contain more intrinsically disordered regions? PLoS Comput Biol 2019; 15:e1007186. [PMID: 31329574 PMCID: PMC6675126 DOI: 10.1371/journal.pcbi.1007186] [Citation(s) in RCA: 55] [Impact Index Per Article: 11.0] [Received: 12/28/2018] [Revised: 08/01/2019] [Accepted: 06/14/2019] [Indexed: 12/12/2022]
Abstract
Intrinsic disorder is more abundant in eukaryotic than prokaryotic proteins. Methods predicting intrinsic disorder are based on the amino acid sequence of a protein. Therefore, there must exist an underlying difference in the sequences between eukaryotic and prokaryotic proteins causing the (predicted) difference in intrinsic disorder. By comparing proteins from complete eukaryotic and prokaryotic proteomes, we show that the difference in intrinsic disorder emerges from the linker regions connecting Pfam domains. Eukaryotic proteins have more extended linker regions, and in addition, the eukaryotic linkers are significantly more disordered: 38% vs. 12-16% disordered residues. Next, we examined the underlying reason for the increase in disorder in eukaryotic linkers, and we found that changes in the abundance of only three amino acids cause the increase. Eukaryotic proteins contain 8.6% serine, while prokaryotic proteins have 6.5%; eukaryotic proteins also contain 5.4% proline and 5.3% isoleucine, compared with 4.0% proline and ≈ 7.5% isoleucine in the prokaryotes. All three differences contribute to the increased disorder in eukaryotic proteins. It is tempting to speculate that the increase in serine frequencies in eukaryotes is related to regulation by kinases, but direct evidence for this is lacking. The differences are observed in all phyla, protein families, structural regions and types of protein but are most pronounced in disordered and linker regions. The observation that differences in the abundance of three amino acids cause the difference in disorder between eukaryotic and prokaryotic proteins raises the question: Are amino acid frequencies different in eukaryotic linkers because the linkers are more disordered, or do the differences cause the increased disorder? Intrinsic disorder is essential for various functions in eukaryotic cells and is a signature of eukaryotic proteins.
Here, we try to understand the origin of the difference in disorder between eukaryotic and prokaryotic proteins. We show that eukaryotic proteins contain more extended linker regions and that these linker regions are significantly more disordered. Further, we show, for the first time, that the difference in disorder originates from a systematic difference in amino acid frequencies between eukaryotic and prokaryotic proteins. Three amino acids contribute to the difference in disorder: serine and proline are more abundant in eukaryotic linkers, while isoleucine is less frequent. These shifts in frequencies are observed in all phyla, protein families, structural regions and types of protein but are most pronounced in disordered and linker regions. It is tempting to speculate that the increase in serine frequencies in eukaryotes is related to regulation by kinases, but direct evidence for this is lacking. In any case, the widespread nature of the shifts in abundance indicates that the differences are ancient and caused by some not yet fully understood selective difference acting on eukaryotic and prokaryotic proteins.
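The comparison of serine, proline and isoleucine abundance described above comes down to counting residue fractions in a set of sequences. The following Python sketch shows that calculation on a single sequence; the function name `residue_fractions` and the example linker string are hypothetical illustrations, not data or code from the study.

```python
from collections import Counter


def residue_fractions(sequence: str, residues: str = "SPI") -> dict[str, float]:
    """Return the fraction of the sequence made up by each requested residue."""
    seq = sequence.upper()
    if not seq:
        # Avoid division by zero for empty input.
        return {res: 0.0 for res in residues}
    counts = Counter(seq)
    return {res: counts[res] / len(seq) for res in residues}


if __name__ == "__main__":
    # Made-up 10-residue linker, used only to show the call.
    linker = "GSSPSGPTIS"
    for residue, fraction in residue_fractions(linker).items():
        print(f"{residue}: {fraction:.1%}")
```

Applied separately to eukaryotic and prokaryotic linker regions, per-residue fractions of this kind are what the percentages quoted in the abstract summarize.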
Affiliation(s)
- Walter Basile
- Science for Life Laboratory, Stockholm University, Solna, Sweden
- Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
- Marco Salvatore
- Science for Life Laboratory, Stockholm University, Solna, Sweden
- Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
- Claudio Bassot
- Science for Life Laboratory, Stockholm University, Solna, Sweden
- Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
- Arne Elofsson
- Science for Life Laboratory, Stockholm University, Solna, Sweden
- Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
- Swedish e-Science Research Center (SeRC), Stockholm, Sweden
|
34
|
Shim JE, Kim JH, Shin J, Lee JE, Lee I. Pathway-specific protein domains are predictive for human diseases. PLoS Comput Biol 2019; 15:e1007052. [PMID: 31075101 PMCID: PMC6530867 DOI: 10.1371/journal.pcbi.1007052] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 09/14/2018] [Revised: 05/22/2019] [Accepted: 04/19/2019] [Indexed: 01/04/2023]
Abstract
Protein domains are basic functional units of proteins. Many protein domains are pervasive among diverse biological processes, yet some are associated with specific pathways. Human complex diseases are generally viewed as pathway-level disorders. Therefore, we hypothesized that pathway-specific domains could be highly informative for human diseases. To test the hypothesis, we developed a network-based scoring scheme to quantify specificity of domain-pathway associations. We first generated domain profiles for human proteins, then constructed a co-pathway protein network based on the associations between domain profiles. Based on the score, we classified human protein domains into pathway-specific domains (PSDs) and non-specific domains (NSDs). We found that PSDs contained more pathogenic variants than NSDs. PSDs were also enriched for disease-associated mutations that disrupt protein-protein interactions (PPIs) and tend to have a moderate number of domain interactions. These results suggest that mutations in PSDs are likely to disrupt within-pathway PPIs, resulting in functional failure of pathways. Finally, we demonstrated the prediction capacity of PSDs for disease-associated genes with experimental validations in zebrafish. Taken together, the network-based quantitative method of modeling domain-pathway associations presented herein suggested underlying mechanisms of how protein domains associated with specific pathways influence mutational impacts on diseases via perturbations in within-pathway PPIs, and provided a novel genomic feature for interpreting genetic variants to facilitate the discovery of human disease genes. Protein domains are basic functional units of proteins, yet domain-based pathway annotations for proteins are challenging tasks because many domains are pervasive among diverse pathways. Therefore, we developed a network-based scoring scheme to measure pathway specificity of domains, and then used it to identify pathway-specific domains. 
Surprisingly, we observed substantially more disease mutations in pathway-specific domains than in non-specific domains. We found evidence that mutations in pathway-specific domains tend to perturb pathway integrity by disrupting within-pathway protein-protein interactions. We also demonstrated the prediction capacity of pathway-specific domains for complex diseases with experimental validations. Our study demonstrated the usefulness of pathway information for protein domains in interpreting the non-random distribution of disease mutations among domains and in identifying disease genes and variants.
Affiliation(s)
- Jung Eun Shim
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea
- Yonsei Biomedical Research Institute, Yonsei University College of Medicine, Seoul, Korea
- Ji Hyun Kim
- Department of Health Sciences and Technology, SAIHST, Sungkyunkwan University, Seoul, Korea
- Junha Shin
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea
- Ji Eun Lee
- Department of Health Sciences and Technology, SAIHST, Sungkyunkwan University, Seoul, Korea
- Samsung Biomedical Research Institute, Samsung Medical Center, Seoul, Korea
- Insuk Lee
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea
- Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul, Korea
|
35
|
Sanchez de Groot N, Torrent Burgas M, Ravarani CN, Trusina A, Ventura S, Babu MM. The fitness cost and benefit of phase-separated protein deposits. Mol Syst Biol 2019; 15:e8075. [PMID: 30962358 PMCID: PMC6452874 DOI: 10.15252/msb.20178075] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 01/10/2023]
Abstract
Phase separation of soluble proteins into insoluble deposits is associated with numerous diseases. However, protein deposits can also function as membrane-less compartments for many cellular processes. What are the fitness costs and benefits of forming such deposits in different conditions? Using a model protein that phase-separates into deposits, we distinguish and quantify the fitness contribution due to the loss or gain of protein function and deposit formation in yeast. The environmental condition and the cellular demand for the protein function emerge as key determinants of fitness. Protein deposit formation can influence cell-to-cell variation in free protein abundance between individuals of a cell population (i.e., gene expression noise). This results in variable manifestation of protein function and a continuous range of phenotypes in a cell population, favoring survival of some individuals in certain environments. Thus, protein deposit formation by phase separation might be a mechanism to sense protein concentration in cells and to generate phenotypic variability. The selectable phenotypic variability, previously described for prions, could be a general property of proteins that can form phase-separated assemblies and may influence cell fitness.
Affiliation(s)
- Natalia Sanchez de Groot
- Medical Research Council Laboratory of Molecular Biology, Cambridge, UK; Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Marc Torrent Burgas
- Medical Research Council Laboratory of Molecular Biology, Cambridge, UK; Systems Biology of Infection Lab, Department of Biochemistry and Molecular Biology, Universitat Autònoma de Barcelona, Barcelona, Spain
- Ala Trusina
- Niels Bohr Institute, University of Copenhagen, Copenhagen, Denmark
- Salvador Ventura
- Institut de Biotecnologia i Biomedicina and Departament de Bioquímica i Biologia Molecular, Universitat Autònoma de Barcelona, Barcelona, Spain
- M Madan Babu
- Medical Research Council Laboratory of Molecular Biology, Cambridge, UK
|
36
|
Pathmanathan JS, Lopez P, Lapointe FJ, Bapteste E. CompositeSearch: A Generalized Network Approach for Composite Gene Families Detection. Mol Biol Evol 2019; 35:252-255. [PMID: 29092069 PMCID: PMC5850286 DOI: 10.1093/molbev/msx283] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Indexed: 12/14/2022]
Abstract
Genes evolve by point mutations, but also by shuffling, fusion, and fission of genetic fragments. Therefore, similarity between two sequences can be due to common ancestry producing homology, and/or partial sharing of component fragments. Disentangling these processes is especially challenging in large molecular data sets, because of computational time. In this article, we present CompositeSearch, a memory-efficient, fast, and scalable method to detect composite gene families in large data sets (typically in the range of several million sequences). CompositeSearch generalizes the use of similarity networks to detect composite and component gene families with a greater recall, accuracy, and precision than recent programs (FusedTriplets and MosaicFinder). Moreover, CompositeSearch provides user-friendly quality descriptions regarding the distribution and primary sequence conservation of these gene families allowing critical biological analyses of these data.
Affiliation(s)
- Philippe Lopez
- Institut de Biologie Paris-Seine (IBPS), UPMC Université Paris 06, Sorbonne Universités, Paris, France
- Eric Bapteste
- Institut de Biologie Paris-Seine (IBPS), UPMC Université Paris 06, Sorbonne Universités, Paris, France
|
37
|
Zmasek CM, Knipe DM, Pellett PE, Scheuermann RH. Classification of human Herpesviridae proteins using Domain-architecture Aware Inference of Orthologs (DAIO). Virology 2019; 529:29-42. [PMID: 30660046 PMCID: PMC6502252 DOI: 10.1016/j.virol.2019.01.005] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 05/16/2018] [Revised: 01/04/2019] [Accepted: 01/04/2019] [Indexed: 12/13/2022]
Abstract
We developed a computational approach called Domain-architecture Aware Inference of Orthologs (DAIO) for the analysis of protein orthology by combining phylogenetic and protein domain-architecture information. Using DAIO, we performed a systematic study of the proteomes of all human Herpesviridae species to define Strict Ortholog Groups (SOGs). In addition to assessing the taxonomic distribution for each protein based on sequence similarity, we performed a protein domain-architecture analysis for every protein family and computationally inferred gene duplication events. While many herpesvirus proteins have evolved without any detectable gene duplications or domain rearrangements, numerous herpesvirus protein families do exhibit complex evolutionary histories. Some proteins acquired additional domains (e.g., DNA polymerase), whereas others show a combination of domain acquisition and gene duplication (e.g., betaherpesvirus US22 family), with possible functional implications. This novel classification system of SOGs for human Herpesviridae proteins is available through the Virus Pathogen Resource (ViPR, www.viprbrc.org).
Affiliation(s)
- David M Knipe
- Department of Microbiology and Immunobiology, Harvard Medical School, Boston, MA 02115, USA
- Philip E Pellett
- Department of Biochemistry, Microbiology & Immunology, Wayne State University School of Medicine, Detroit, MI 48201, USA
- Richard H Scheuermann
- J. Craig Venter Institute, La Jolla, CA 92037, USA; Department of Pathology, University of California, San Diego, CA 92093, USA; Division of Vaccine Discovery, La Jolla Institute for Allergy and Immunology, La Jolla, CA 92037, USA
|
38
|
Carroll HD, Spouge JL, Gonzalez M. MultiDomainBenchmark: a multi-domain query and subject database suite. BMC Bioinformatics 2019; 20:77. [PMID: 30764761 PMCID: PMC6376684 DOI: 10.1186/s12859-019-2660-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/14/2018] [Accepted: 01/28/2019] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Genetic sequence database retrieval benchmarks play an essential role in evaluating the performance of sequence searching tools. To date, all phylogenetically diverse benchmarks known to the authors include only query sequences with single protein domains. Domains are the primary building blocks of protein structure and function. Independently, each domain can fulfill a single function, but most proteins (>80% in Metazoa) exist as multi-domain proteins. Multiple domain units combine in various arrangements or architectures to create different functions and are often under evolutionary pressures to yield new ones. Thus, it is crucial to create gold standards reflecting the multi-domain complexity of real proteins to more accurately evaluate sequence searching tools. DESCRIPTION: This work introduces MultiDomainBenchmark (MDB), a database suite of 412 curated multi-domain queries and 227,512 target sequences, representing at least 5108 species and 1123 phylogenetically divergent protein families, their relevancy annotation, and domain location. Here, we use the benchmark to evaluate the performance of two commonly used sequence searching tools, BLAST/PSI-BLAST and HMMER. Additionally, we introduce a novel classification technique for multi-domain proteins to evaluate how well an algorithm recovers a domain architecture. CONCLUSION: MDB is publicly available at http://csc.columbusstate.edu/carroll/MDB/.
Affiliation(s)
- Hyrum D. Carroll
- TSYS School of Computer Science, Columbus State University, 4225 University Avenue, Columbus, 31907 GA USA
- John L. Spouge
- National Center for Biotechnology Information, Bethesda, National Institutes of Health, 8600 Rockville Pike, Bethesda, 20894 MD USA
- Mileidy Gonzalez
- National Center for Biotechnology Information, Bethesda, National Institutes of Health, 8600 Rockville Pike, Bethesda, 20894 MD USA
|
39
|
Kurafeiski JD, Pinto P, Bornberg-Bauer E. Evolutionary Potential of Cis-Regulatory Mutations to Cause Rapid Changes in Transcription Factor Binding. Genome Biol Evol 2019; 11:406-414. [PMID: 30597011 PMCID: PMC6370388 DOI: 10.1093/gbe/evy269] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Accepted: 12/11/2018] [Indexed: 01/25/2023]
Abstract
Transcriptional regulation is crucial for all biological processes and well investigated at the molecular level for a wide range of organisms. However, it is quite unclear how innovations, such as the activity of a novel regulatory element, evolve. In the case of transcription factor (TF) binding, both a novel TF and a novel-binding site would need to evolve concertedly. Since promiscuous functions have recently been identified as important intermediate steps in creating novel specific functions in many areas such as enzyme evolution and protein-protein interactions, we ask here how promiscuous binding of TFs to TF-binding sites (TFBSs) affects the robustness and evolvability of this tightly regulated system. Specifically, we investigate the binding behavior of several hundred TFs from different species at unprecedented breadth. Our results illustrate multiple aspects of TF-binding interactions, ranging from correlations between the strength of the interaction bond and specificity, to preferences regarding TFBS nucleotide composition in relation to both domains and binding specificity. We identified a subset of high A/T binding motifs. Motifs in this subset had many functionally neutral one-error mutants, and were bound by multiple different binding domains. Our results indicate that, especially for some TF-TFBS associations, low binding specificity confers high degrees of evolvability, that is that few mutations facilitate rapid changes in transcriptional regulation, in particular for large and old TF families. In this study we identify binding motifs exhibiting behavior indicating high evolutionary potential for innovations in transcriptional regulation.
Affiliation(s)
- Paulo Pinto
- Molecular Evolution and Bioinformatics, University of Muenster, Germany
|
40
Abstract
Protein domains are reusable segments of proteins and play an important role in protein evolution. By combining the elements from a relatively small set of domains into unique arrangements, a large number of distinct proteins can be generated. Since domains often have specific functions, changes in their arrangement usually affect the overall protein function. Furthermore, domains are well amenable to computational representations, e.g., by Hidden Markov Models (HMMs), and these HMMs are widely represented in various databases. Therefore, domains can be efficiently used for proteomic analyses. Here, we describe how domains are annotated using different domain databases and then how to assess the annotation quality of proteomes. We next show how functional annotations of domains in large-scale data such as whole genomes or transcriptomes can be used to analyze molecular differences between species. Furthermore, we describe methods to analyze the changes in domain content of proteins which significantly helps to characterize and reconstruct the modular evolution of proteins. Altogether, domain-based methods offer a computationally highly effective approach to analyze large amounts of proteomic data in an evolutionary setting.
Affiliation(s)
- Carsten Kemena
- Institute for Evolution and Biodiversity, University of Münster, Münster, Germany.
- Erich Bornberg-Bauer
- Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
41
Li L, Bansal MS. An Integrated Reconciliation Framework for Domain, Gene, and Species Level Evolution. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2019; 16:63-76. [PMID: 29994126 DOI: 10.1109/tcbb.2018.2846253] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 06/08/2023]
Abstract
The majority of genes in eukaryotes consists of one or more protein domains that can be independently lost or gained during evolution. This gain and loss of protein domains, through domain duplications, transfers, or losses, has important evolutionary and functional consequences. Yet, even though it is well understood that domains evolve inside genes and genes inside species, there do not exist any computational frameworks to simultaneously model the evolution of domains, genes, and species and account for their inter-dependency. Here, we develop an integrated model of domain evolution that explicitly captures the interdependence of domain-, gene-, and species-level evolution. Our model extends the classical phylogenetic reconciliation framework, which infers gene family evolution by comparing gene trees and species trees, by explicitly considering domain-level evolution and decoupling domain-level events from gene-level events. In this paper, we (i) introduce the new integrated reconciliation framework, (ii) prove that the associated optimization problem is NP-hard, (iii) devise an efficient heuristic solution for the problem, (iv) apply our algorithm to a large biological dataset, and (v) demonstrate the impact of using our new computational framework compared to existing approaches. The implemented software is freely available from http://compbio.engr.uconn.edu/software/seadog/.
42
Abstract
This chapter reviews current research on how protein domain architectures evolve. We begin by summarizing work on the phylogenetic distribution of proteins, as this will directly impact which domain architectures can be formed in different species. Studies relating domain family size to occurrence have shown that they generally follow power law distributions, both within genomes and larger evolutionary groups. These findings were subsequently extended to multi-domain architectures. Genome evolution models that have been suggested to explain the shape of these distributions are reviewed, as well as evidence for selective pressure to expand certain domain families more than others. Each domain has an intrinsic combinatorial propensity, and the effects of this have been studied using measures of domain versatility or promiscuity. Next, we study the principles of protein domain architecture evolution and how these have been inferred from distributions of extant domain arrangements. Following this, we review inferences of ancestral domain architecture and the conclusions concerning domain architecture evolution mechanisms that can be drawn from these. Finally, we examine whether all known cases of a given domain architecture can be assumed to have a single common origin (monophyly) or have evolved convergently (polyphyly). We end with a discussion of some available tools for computational analysis or exploitation of protein domain architectures and their evolution.
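The notion of domain versatility mentioned in this abstract can be illustrated with a minimal sketch. It assumes one simple definition (the number of distinct domains found directly adjacent to a given domain); the chapter may use other measures. The function name `domain_versatility` and the toy architectures are hypothetical and serve only as an illustration.

```python
from collections import defaultdict


def domain_versatility(architectures):
    """Return, for each domain, the number of distinct directly adjacent domains.

    `architectures` is an iterable of domain-name lists, each in N- to C-terminal order.
    This neighbour count is one assumed, simple definition of versatility.
    """
    neighbours = defaultdict(set)
    for arch in architectures:
        for domain in arch:
            neighbours[domain]  # register domains that only occur on their own
        for left, right in zip(arch, arch[1:]):
            neighbours[left].add(right)
            neighbours[right].add(left)
    return {domain: len(partners) for domain, partners in neighbours.items()}


# Hypothetical toy proteome: each inner list is one protein's domain arrangement.
toy = [
    ["SH3", "SH2", "Kinase"],
    ["SH2", "Kinase"],
    ["PH", "SH2"],
    ["Kinase"],
]
versatility = domain_versatility(toy)
```

In this toy input, SH2 has three distinct neighbours while the other domains have one each, so SH2 would count as the most versatile domain under the assumed definition.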
43
Exaptation at the molecular genetic level. SCIENCE CHINA-LIFE SCIENCES 2018; 62:437-452. [PMID: 30798493 DOI: 10.1007/s11427-018-9447-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 11/10/2018] [Accepted: 12/01/2018] [Indexed: 12/22/2022]
Abstract
The realization that body parts of animals and plants can be recruited or coopted for novel functions dates back to, or even predates, the observations of Darwin. S.J. Gould and E.S. Vrba recognized a mode of evolution of characters that differs from adaptation. The umbrella term aptation was supplemented with the concept of exaptation. Unlike adaptations, which are restricted to features built by selection for their current role, exaptations are features that currently enhance fitness, even though their present role was not a result of natural selection. Exaptations can also arise from nonaptations; these are characters which had previously been evolving neutrally. All nonaptations are potential exaptations. The concept of exaptation was expanded to the molecular genetic level, which aided greatly in understanding the enormous potential of neutrally evolving repetitive DNA (including transposed elements, formerly considered junk DNA) for the evolution of genes and genomes. The distinction between adaptations and exaptations is outlined in this review and examples are given. Also elaborated on is the fact that such distinctions are sometimes more difficult to determine; this is a widespread phenomenon in biology, where continua abound and clear borders between states and definitions are rare.
44
Structural modules of the stress-induced protein HflX: an outlook on its evolution and biological role. Curr Genet 2018; 65:363-370. [PMID: 30448945 DOI: 10.1007/s00294-018-0905-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 09/14/2018] [Revised: 11/05/2018] [Accepted: 11/13/2018] [Indexed: 12/23/2022]
Abstract
Multifunctional proteins often show modular structures. A functional domain and the structural modules within the domain show evolutionary conservation of their spatial arrangement since that gives the protein its functionality. However, the question remains as to how members of different domains of life (Archaea, Bacteria, Eukarya) polish and perfect these modules within conserved multidomain proteins, to tailor functional proteins according to their specific requirements. In the quest for plausible answers to this question, we studied the bacterial protein HflX. HflX is a universally conserved member of the Obg-GTPase superfamily but its functional role in Archaea and Eukarya is barely known. It is a multidomain protein and possesses, in addition to its conserved GTPase domain, an ATP-binding N-terminal domain. It is involved in heat stress response in Escherichia coli and our laboratory recently identified an ATP-dependent RNA helicase activity of E. coli HflX, which is likely instrumental in rescuing ribosomes during heat stress. Because perception and response to stress is expected to be different in different life forms, the question is whether this activity is preserved in higher organisms or not. Thus, we explored the evolution pattern of different structural modules of HflX, with particular emphasis on the ATP-binding domain, to understand the plausible biological role of HflX in other forms of life. Our analyses indicate that, while the evolutionary pattern of the GTPase domain follows a conserved phylogeny, conservation of the ATP-binding domain shows a complicated pattern. The limited analysis described here hints towards possible evolutionary adaptations and modifications of the domain, something which needs to be investigated in more depth in homologs from other life forms. Deciphering how nature 'tweaks' such modules, both structurally and functionally, may help in understanding the evolution of such proteins and, on a larger scale, of stress-related proteins in general.
45
Dangwal M, Das S. Identification and Analysis of OVATE Family Members from Genome of the Early Land Plants Provide Insights into Evolutionary History of OFP Family and Function. J Mol Evol 2018; 86:511-530. [PMID: 30206666 DOI: 10.1007/s00239-018-9863-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 08/23/2018] [Accepted: 09/05/2018] [Indexed: 01/11/2023]
Abstract
Mosses, liverworts, hornworts and lycophytes represent transition stages between aquatic and terrestrial (land) plants. Several morphological and adaptive novelties driven by genomic components including emergence and expansion of new or existing gene families have played a critical role during and after the transition, and contributed towards successful colonization of terrestrial ecosystems. It is crucial to decipher the evolutionary transitions and natural selection on the gene structure and function to understand the emergence of phenotypic and adaptive diversity. Plants at the "transition zone", between aquatic and terrestrial ecosystems, are also the most vulnerable because of climate change and may contain clues for successful mitigation of the challenges of climate change. Identification and comparative analyses of such genetic elements and gene families are few in mosses, liverworts, hornworts and lycophytes. Ovate family proteins (OFPs) are plant-specific transcriptional repressors and are acknowledged for their roles in important growth and developmental processes in land plants, and information about the functional aspects of OFPs in early land plants is fragmentary. As a first step towards addressing this gap, a comprehensive in silico analysis was carried out utilizing publicly available genome sequences of Marchantia polymorpha (Mp), Physcomitrella patens (Pp), Selaginella moellendorffii (Sm) and Sphagnum fallax (Sf). Our analysis led to the identification of 4 MpOFPs, 19 PpOFPs, 6 SmOFPs and 3 SfOFPs. Cross-genera analysis revealed a drastic change in the structure and physicochemical properties in OFPs suggesting functional diversification and genomic plasticity during the evolutionary course. Knowledge gained from this comparative analysis will form the framework towards deciphering and dissection of their developmental and adaptive roles in early land plants and could provide insights into evolutionary strategies adopted by land plants.
Affiliation(s)
- Sandip Das
- Department of Botany, University of Delhi, Delhi, 110007, India.
46
Jakubec D, Kratochvíl M, Vymĕtal J, Vondrášek J. Widespread evolutionary crosstalk among protein domains in the context of multi-domain proteins. PLoS One 2018; 13:e0203085. [PMID: 30169546 PMCID: PMC6118372 DOI: 10.1371/journal.pone.0203085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2018] [Accepted: 08/14/2018] [Indexed: 11/20/2022]
Abstract
Domains are distinct units within proteins that typically can fold independently into recognizable three-dimensional structures to facilitate their functions. The structural and functional independence of protein domains is reflected by their apparent modularity in the context of multi-domain proteins. In this work, we examined the coupling of evolution of domain sequences co-occurring within multi-domain proteins to see if it proceeds independently, or in a coordinated manner. We used continuous information theory measures to assess the extent of correlated mutations among domains in multi-domain proteins from organisms across the tree of life. In all multi-domain architectures we examined, domains co-occurring within protein sequences had to some degree undergone concerted evolution. This finding challenges the notion of complete modularity and independence of protein domains, providing new perspective on the evolution of protein sequence and function.
Affiliation(s)
- David Jakubec
- Department of Bioinformatics, Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, 166 10 Prague 6, Czech Republic
- Department of Physical and Macromolecular Chemistry, Faculty of Science, Charles University, 128 43 Prague 2, Czech Republic
- Miroslav Kratochvíl
- Department of Bioinformatics, Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, 166 10 Prague 6, Czech Republic
- Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, 118 00 Prague 1, Czech Republic
- Jiří Vymĕtal
- Department of Bioinformatics, Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, 166 10 Prague 6, Czech Republic
- Jiří Vondrášek
- Department of Bioinformatics, Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, 166 10 Prague 6, Czech Republic
47
Iyer MS, Joshi AG, Sowdhamini R. Genome-wide survey of remote homologues for protein domain superfamilies of known structure reveals unequal distribution across structural classes. Mol Omics 2018; 14:266-280. [PMID: 29971307 DOI: 10.1039/c8mo00008e] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/08/2023]
Abstract
Domains are the basic building blocks of proteins which can combine to give rise to different domain architectures. Annotation of domains in a sequence is the first step towards understanding the biological function. Since there are a limited number of folds and evolutionarily related proteins have a similar structure, function can be inferred through remote homology. Computational sequence searches were performed for remote homologues on genomes of ∼160 000 different organisms, starting from nearly 11 000 superfamily queries of known structure. Case studies revealed that most of the associated domains are involved in the same biological process. Using all the proteins predicted to have at least one structural domain, a coverage of 61% of Pfam families was achieved, which is higher than that of existing methods (43.36% by SIFTS). Taxonomic analysis of the proteins revealed 493 superfamilies in all the major kingdoms of life and a few lateral gene transfers between viruses and cellular organisms. The distribution of remote homologues across different classes, folds and superfamilies was studied and reveals that sequences are unequally distributed across structural classes. Finally, domain architectures were computed for the homologues and these data were compiled for each superfamily and organism.
Affiliation(s)
- Meenakshi S Iyer
- National Centre for Biological Sciences (TIFR), GKVK Campus, Bellary Road, Bangalore, Karnataka 560 065, India.
48
Bailey PC, Schudoma C, Jackson W, Baggs E, Dagdas G, Haerty W, Moscou M, Krasileva KV. Dominant integration locus drives continuous diversification of plant immune receptors with exogenous domain fusions. Genome Biol 2018; 19:23. [PMID: 29458393 PMCID: PMC5819176 DOI: 10.1186/s13059-018-1392-6] [Citation(s) in RCA: 77] [Impact Index Per Article: 12.8] [Received: 08/19/2017] [Accepted: 01/16/2018] [Indexed: 11/25/2022]
Abstract
BACKGROUND: The plant immune system is innate and encoded in the germline. Using it efficiently, plants are capable of recognizing a diverse range of rapidly evolving pathogens. A recently described phenomenon shows that plant immune receptors are able to recognize pathogen effectors through the acquisition of exogenous protein domains from other plant genes. RESULTS: We show that plant immune receptors with integrated domains are distributed unevenly across their phylogeny in grasses. Using phylogenetic analysis, we uncover a major integration clade, whose members underwent repeated independent integration events producing diverse fusions. This clade is ancestral in grasses with members often found on syntenic chromosomes. Analyses of these fusion events reveal that homologous receptors can be fused to diverse domains. Furthermore, we discover a 43 amino acid long motif associated with this dominant integration clade which is located immediately upstream of the fusion site. Sequence analysis reveals that DNA transposition and/or ectopic recombination are the most likely mechanisms of formation for nucleotide binding leucine rich repeat proteins with integrated domains. CONCLUSIONS: The identification of this subclass of plant immune receptors that is naturally adapted to new domain integration will inform biotechnological approaches for generating synthetic receptors with novel pathogen "baits."
Affiliation(s)
- Paul C Bailey
- Earlham Institute, Norwich Research Park, Norwich, NR4 7UZ, UK
- William Jackson
- The Sainsbury Laboratory, Norwich Research Park, Norwich, NR4 7UH, UK
- Erin Baggs
- Earlham Institute, Norwich Research Park, Norwich, NR4 7UZ, UK
- Gulay Dagdas
- The Sainsbury Laboratory, Norwich Research Park, Norwich, NR4 7UH, UK
- Wilfried Haerty
- Earlham Institute, Norwich Research Park, Norwich, NR4 7UZ, UK
- Matthew Moscou
- The Sainsbury Laboratory, Norwich Research Park, Norwich, NR4 7UH, UK
- Ksenia V Krasileva
- Earlham Institute, Norwich Research Park, Norwich, NR4 7UZ, UK.
- The Sainsbury Laboratory, Norwich Research Park, Norwich, NR4 7UH, UK.
49
Cossins BP, Lawson ADG, Shi J. Computational Exploration of Conformational Transitions in Protein Drug Targets. Methods Mol Biol 2018; 1762:339-365. [PMID: 29594780 DOI: 10.1007/978-1-4939-7756-7_17] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/07/2022]
Abstract
Protein drug targets vary from highly structured to completely disordered; either way dynamics governs function. Hence, understanding the dynamical aspects of how protein targets function can enable improved interventions with drug molecules. Computational approaches offer highly detailed structural models of protein dynamics which are becoming more predictive as model quality and sampling power improve. However, the most advanced and popular models still have errors owing to imperfect parameter sets and often cannot access longer timescales of many crucial biological processes. Experimental approaches offer more certainty but can struggle to detect and measure lightly populated conformations of target proteins and subtle allostery. An emerging solution is to integrate available experimental data into advanced molecular simulations. In the future, molecular simulation in combination with experimental data may be able to offer detailed models of important drug targets such that improved functional mechanisms or selectivity can be accessed.
Affiliation(s)
- Benjamin P Cossins
- Computer-Aided Drug Design and Structural Biology, UCB Pharma, Slough, UK.
- Jiye Shi
- Computer-Aided Drug Design and Structural Biology, UCB Pharma, Slough, UK
50
Shapiro JA. Living Organisms Author Their Read-Write Genomes in Evolution. BIOLOGY 2017; 6:E42. [PMID: 29211049 PMCID: PMC5745447 DOI: 10.3390/biology6040042] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Received: 08/23/2017] [Revised: 11/17/2017] [Accepted: 11/28/2017] [Indexed: 12/18/2022]
Abstract
Evolutionary variations generating phenotypic adaptations and novel taxa resulted from complex cellular activities altering genome content and expression: (i) Symbiogenetic cell mergers producing the mitochondrion-bearing ancestor of eukaryotes and chloroplast-bearing ancestors of photosynthetic eukaryotes; (ii) interspecific hybridizations and genome doublings generating new species and adaptive radiations of higher plants and animals; and, (iii) interspecific horizontal DNA transfer encoding virtually all of the cellular functions between organisms and their viruses in all domains of life. Consequently, assuming that evolutionary processes occur in isolated genomes of individual species has become an unrealistic abstraction. Adaptive variations also involved natural genetic engineering of mobile DNA elements to rewire regulatory networks. In the most highly evolved organisms, biological complexity scales with "non-coding" DNA content more closely than with protein-coding capacity. Coincidentally, we have learned how so-called "non-coding" RNAs that are rich in repetitive mobile DNA sequences are key regulators of complex phenotypes. Both biotic and abiotic ecological challenges serve as triggers for episodes of elevated genome change. The intersections of cell activities, biosphere interactions, horizontal DNA transfers, and non-random Read-Write genome modifications by natural genetic engineering provide a rich molecular and biological foundation for understanding how ecological disruptions can stimulate productive, often abrupt, evolutionary transformations.
Affiliation(s)
- James A Shapiro
- Department of Biochemistry and Molecular Biology, University of Chicago GCIS W123B, 979 E. 57th Street, Chicago, IL 60637, USA.