1
Tenorio-Salgado S, Villalpando-Aguilar JL, Hernandez-Guerrero R, Poot-Hernández AC, Perez-Rueda E. Exploring the enzymatic repertoires of Bacteria and Archaea and their associations with metabolic maps. Braz J Microbiol 2024:10.1007/s42770-024-01462-3. [PMID: 39052173 DOI: 10.1007/s42770-024-01462-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2024] [Accepted: 07/11/2024] [Indexed: 07/27/2024]
Abstract
The evolution, survival, and adaptation of microbes are consequences of gene duplication, acquisition, and divergence in response to environmental challenges. In this context, enzymes play a central role in the evolution of organisms, because they are fundamental to cell metabolism. Here, we analyzed the enzymatic repertoire of 6,467 microbial genomes, including enzyme abundances and their associations with metabolic maps. We found that enzyme counts follow a power-law distribution in relation to genome size. We then evaluated the total proportion of enzymatic classes in relation to the genomes, identifying the following descending order: transferases (EC:2.-), hydrolases (EC:3.-), oxidoreductases (EC:1.-), ligases (EC:6.-), lyases (EC:4.-), isomerases (EC:5.-), and translocases (EC:7.-). In addition, we identified a preferential use of enzymatic classes in metabolic pathways for xenobiotics, cofactors and vitamins, carbohydrates, amino acids, glycans, and energy. Overall, this analysis provides clues about the functional constraints associated with the enzymatic repertoire of Bacteria and Archaea.
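The abstract reports a power-law relation between enzyme counts and genome size but does not say how it was fitted. A common first-pass check, assumed here rather than taken from the paper, is an ordinary least-squares line in log-log space, whose slope estimates the exponent. The helper `fit_power_law` and the synthetic data are hypothetical.

```python
import math


def fit_power_law(sizes, counts):
    """Fit counts ~ a * sizes**b by least squares on log10-transformed values.

    Hypothetical helper for illustration; returns the prefactor a and exponent b.
    All inputs must be positive, and sizes must not all be equal.
    """
    if len(sizes) != len(counts) or len(sizes) < 2:
        raise ValueError("need at least two paired observations")
    xs = [math.log10(s) for s in sizes]
    ys = [math.log10(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                 # slope in log-log space = power-law exponent
    log_a = mean_y - b * mean_x   # intercept = log10 of the prefactor
    return 10 ** log_a, b


# Synthetic (not published) data: enzyme counts generated as 0.05 * genes**1.1.
genes = [500, 1000, 2000, 4000, 8000]
enzymes = [0.05 * g ** 1.1 for g in genes]
a, b = fit_power_law(genes, enzymes)
print(f"prefactor a = {a:.3f}, exponent b = {b:.3f}")
```

Log-log regression is only a rough diagnostic; formal power-law testing usually relies on maximum-likelihood estimation and goodness-of-fit tests.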
Affiliation(s)
- Silvia Tenorio-Salgado
- Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Unidad Académica del Estado de Yucatán, Mérida, Yucatán, México
- Tecnológico Nacional de México, Instituto Tecnológico de Mérida, Av. Tecnológico km. 4.5, 97118, Merida, Yucatan, Mexico
- José Luis Villalpando-Aguilar
- Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Unidad Académica del Estado de Yucatán, Mérida, Yucatán, México
- Facultad Ciencias de la Salud, Universidad Vizcaya de las Américas, Prolongación Allende, Campeche, 24035, Campeche, Mexico
- Rafael Hernandez-Guerrero
- Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Unidad Académica del Estado de Yucatán, Mérida, Yucatán, México
- Augusto César Poot-Hernández
- Unidad de Bioinformática y Manejo de la Información. Instituto de Fisiología Celular, Universidad Nacional Autónoma de México, Coyoacán, Ciudad de México, México
- Ernesto Perez-Rueda
- Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Unidad Académica del Estado de Yucatán, Mérida, Yucatán, México.
2
Watterson JG. The cluster model of energy transduction in biological systems. Biosystems 2024; 240:105213. [PMID: 38616011 DOI: 10.1016/j.biosystems.2024.105213] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Revised: 04/10/2024] [Accepted: 04/10/2024] [Indexed: 04/16/2024]
Abstract
The central problem in transduction is to explain how the energy caught from sunlight by chloroplasts becomes biological work. Or to express it in different terms: how does the energy remain trapped in the biological network and not get lost through thermalization into the environment? The pathway consists of an immensely large number of steps crossing hierarchical levels - some upwards, to larger assemblies, others downwards into energy rich molecules - before fuelling an action potential or a contracting cell. Accepting the assumption that steps are executed by protein domains, we expect that transduction mechanisms are the result of conformational changes, which in turn involve rearrangements of the bonds responsible for the protein fold. But why are these essential changes so difficult to detect? In this presentation, the metabolic pathway is viewed as equivalent to an energy conduit composed of equally sized units - the protein domains - rather than a row of catalysts. The flow of energy through them occurs by the same mechanism as through the cytoplasmic medium (water). This mechanism is based on the cluster-wave model of water structure, which successfully explains the transfer of energy through the liquid medium responsible for the build up of osmotic pressure. The analogy to the line of balls called "Newton's cradle" provides a useful comparison, since there the transfer is also invisible to us because the intermediate balls are motionless. It is further proposed that the spatial arrangements of the H-bonds of the α and β secondary structures support wave motion, with the linear and lateral forms of the groups of bonds belonging to the helices and sheets executing the longitudinal and transverse modes, respectively.
3
Mughal F, Caetano-Anollés G. Evolution of Intrinsic Disorder in Protein Loops. Life (Basel) 2023; 13:2055. [PMID: 37895436 PMCID: PMC10608553 DOI: 10.3390/life13102055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Revised: 10/08/2023] [Accepted: 10/10/2023] [Indexed: 10/29/2023]
Abstract
Intrinsic disorder accounts for the flexibility of protein loops, molecular building blocks that are largely responsible for the processes and molecular functions of the living world. While loops likely represent early structural forms that served as intermediates in the emergence of protein structural domains, their origin and evolution remain poorly understood. Here, we conduct a phylogenomic survey of disorder in loop prototypes sourced from the ArchDB classification. Tracing prototypes associated with protein fold families along an evolutionary chronology revealed that ancient prototypes tended to be more disordered than their derived counterparts, with ordered prototypes developing later in evolution. This highlights the central evolutionary role of disorder and flexibility. While mean disorder increased with time, a minority of ordered prototypes exist that emerged early in evolutionary history, possibly driven by the need to preserve specific molecular functions. We also revealed the percolation of evolutionary constraints from higher to lower levels of organization. Percolation resulted in trade-offs between flexibility and rigidity that impacted prototype structure and geometry. Our findings provide a deep evolutionary view of the link between structure, disorder, flexibility, and function, as well as insights into the evolutionary role of intrinsic disorder in loops and their contribution to protein structure and function.
Affiliation(s)
- Fizza Mughal
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, IL 61801, USA
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, IL 61801, USA
- C.R. Woese Institute for Genomic Biology, University of Illinois, Urbana, IL 61801, USA
4
Aziz MF, Mughal F, Caetano-Anollés G. Tracing the birth of structural domains from loops during protein evolution. Sci Rep 2023; 13:14688. [PMID: 37673948 PMCID: PMC10482863 DOI: 10.1038/s41598-023-41556-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/25/2022] [Accepted: 08/28/2023] [Indexed: 09/08/2023]
Abstract
The structures and functions of proteins are embedded into the loop scaffolds of structural domains. Their origin and evolution remain mysterious. Here, we use a novel graph-theoretical approach to describe how modular and non-modular loop prototypes combine to form folded structures in protein domain evolution. Phylogenomic data-driven chronologies reoriented a bipartite network of loops and domains (and its projections) into 'waterfalls' depicting an evolving 'elementary functionome' (EF). Two primordial waves of functional innovation involving founder 'p-loop' and 'winged-helix' domains were accompanied by an ongoing emergence and reuse of structural and functional novelty. Metabolic pathways expanded before translation functionalities. A dual hourglass recruitment pattern transferred scale-free properties from loop to domain components of the EF network in generative cycles of hierarchical modularity. Modeling the evolutionary emergence of the oldest P-loop and winged-helix domains with AlphaFold2 uncovered rapid convergence towards folded structure, suggesting that a folding vocabulary exists in loops for protein fold repurposing and design.
Affiliation(s)
- M Fayez Aziz
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, IL, 61801, USA
- Fizza Mughal
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, IL, 61801, USA
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, IL, 61801, USA.
- C.R. Woese Institute for Genomic Biology, University of Illinois, Urbana, IL, 61801, USA.
5
Tang QY, Ren W, Wang J, Kaneko K. The Statistical Trends of Protein Evolution: A Lesson from AlphaFold Database. Mol Biol Evol 2022; 39:msac197. [PMID: 36108094 PMCID: PMC9550990 DOI: 10.1093/molbev/msac197] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 12/15/2022]
Abstract
The recent development of artificial intelligence provides us with new and powerful tools for studying the mysterious relationship between organism evolution and protein evolution. In this work, based on the AlphaFold Protein Structure Database (AlphaFold DB), we perform comparative analyses of the proteins of different organisms. The statistics of AlphaFold-predicted structures show that, statistically, the constituent proteins of organisms with higher complexity have larger radii of gyration, higher coil fractions, and slower vibrations. By conducting normal mode analysis and scaling analyses, we demonstrate that higher organismal complexity correlates with lower fractal dimensions in both the structure and dynamics of the constituent proteins, suggesting that higher functional specialization is associated with higher organismal complexity. We also uncover the topology and sequence bases of these correlations. As organismal complexity increases, the residue contact networks of the constituent proteins become more assortative, and these proteins show a higher degree of hydrophilic-hydrophobic segregation in their sequences. Furthermore, by comparing the statistical structural proximity across the proteomes with the phylogenetic tree of homologous proteins, we show that statistical structural proximity across the proteomes may indirectly reflect phylogenetic proximity, indicating a statistical trend of protein evolution in parallel with organism evolution. This study provides new insights into how the diversity in the functionality of proteins increases and how the dimensionality of the manifold of protein dynamics reduces during evolution, contributing to the understanding of the origin and evolution of life.
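The abstract compares radii of gyration of AlphaFold-predicted structures across organisms. To make that quantity concrete, the sketch below computes the unweighted radius of gyration, the root-mean-square distance of points from their centroid, for a list of 3D coordinates such as Cα atoms. The abstract does not state whether atoms were mass-weighted or restricted to Cα, so the unweighted Cα form is an assumption, and `radius_of_gyration` is a hypothetical helper.

```python
import math


def radius_of_gyration(coords):
    """Unweighted radius of gyration of 3D points, e.g. C-alpha coordinates.

    Rg = sqrt(mean(|r_i - centroid|^2)); hypothetical helper for illustration.
    """
    n = len(coords)
    if n == 0:
        raise ValueError("no coordinates given")
    cx = sum(x for x, _, _ in coords) / n
    cy = sum(y for _, y, _ in coords) / n
    cz = sum(z for _, _, z in coords) / n
    mean_sq = sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2
                  for x, y, z in coords) / n
    return math.sqrt(mean_sq)


# Toy input: corners of a 2 x 2 square in the xy-plane; each corner lies
# sqrt(2) from the centroid, so Rg equals sqrt(2).
square = [(1.0, 1.0, 0.0), (-1.0, 1.0, 0.0), (-1.0, -1.0, 0.0), (1.0, -1.0, 0.0)]
rg = radius_of_gyration(square)
print(f"Rg = {rg:.4f}")
```

For real structures the coordinates would come from a parsed PDB or mmCIF file; a larger Rg for a given chain length indicates a less compact fold.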
Affiliation(s)
- Qian-Yuan Tang
- Laboratory for Neural Computation and Adaptation, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0106, Japan
- Weitong Ren
- Theoretical Molecular Science Laboratory, RIKEN Cluster for Pioneering Research, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan
- Jun Wang
- School of Physics, National Laboratory of Solid State Microstructure, and Collaborative Innovation Center of Advanced Microstructures, Nanjing University, Nanjing 210093, People’s Republic of China
- Kunihiko Kaneko
- Center for Complex Systems Biology, Universal Biology Institute, University of Tokyo, Komaba, Meguro, Tokyo 153-8902, Japan
- The Niels Bohr Institute, University of Copenhagen, Blegdamsvej 17, Copenhagen 2100-DK, Denmark
6
Romei M, Sapriel G, Imbert P, Jamay T, Chomilier J, Lecointre G, Carpentier M. Protein folds as synapomorphies of the tree of life. Evolution 2022; 76:1706-1719. [PMID: 35765784 PMCID: PMC9541633 DOI: 10.1111/evo.14550] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/06/2021] [Revised: 05/17/2022] [Accepted: 05/31/2022] [Indexed: 01/22/2023]
Abstract
Several studies have shown that the distribution of folds (the topology of protein secondary structures) in proteomes may serve as a global proxy for building phylogenies. Consequently, some folds should be synapomorphies (derived characters exclusively shared among taxa). However, previous studies used methods that did not allow synapomorphy identification, which requires congruence analysis of folds as individual characters. Here, we map SCOP folds onto a sample of 210 species across the tree of life (TOL). Congruence is assessed using the retention index of each fold for the TOL, and principal component analysis for deeper branches. Using a bicluster mapping approach, we define synapomorphic blocks of folds (SBFs) sharing similar presence/absence patterns. Among the 1232 folds, 20% are universally present in our TOL, whereas 54% are reliable synapomorphies. Similar results are obtained with the CATH and ECOD databases. Eukaryotes are characterized by a large number of synapomorphies, and several SBFs clearly support nested eukaryotic clades (divergence times from 1100 to 380 mya). Although clearly separated, the three superkingdoms reveal a strong mosaic pattern. This pattern is consistent with the dual origin of eukaryotes and reflects secondary endosymbiosis in their photosynthetic clades. Our study shows that the direct analysis of fold synapomorphies provides key characters for unravelling the evolutionary history of species.
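Congruence of each fold with the tree is assessed with the retention index (RI). The sketch below uses the standard Farris definition, RI = (g - s) / (g - m), for a binary presence/absence character on a rooted binary tree: s is the number of steps counted by Fitch parsimony, m is the minimum possible (1 for a variable binary character), and g is the maximum possible, which for a binary character is the count of the rarer state. The abstract does not name the software or how ambiguous cases were handled, so this is an assumed textbook formulation; the tree, taxa, and function names are hypothetical.

```python
def fitch_steps(tree, states):
    """Fitch parsimony on a rooted binary tree given as nested 2-tuples of leaf names.

    Returns (candidate state set at this node, number of state changes below it).
    """
    if isinstance(tree, str):          # leaf: its observed state
        return {states[tree]}, 0
    left, right = tree
    left_set, left_steps = fitch_steps(left, states)
    right_set, right_steps = fitch_steps(right, states)
    common = left_set & right_set
    if common:                         # children can agree: no extra change
        return common, left_steps + right_steps
    return left_set | right_set, left_steps + right_steps + 1


def retention_index(tree, states):
    """Farris retention index for a binary (0/1) character; None if undefined."""
    present = sum(states.values())
    absent = len(states) - present
    if present == 0 or absent == 0:
        return None                    # invariant character
    m = 1                              # minimum steps on any tree
    g = min(present, absent)           # maximum steps (star tree)
    if g == m:
        return None                    # singleton state: RI would be 0/0
    _, s = fitch_steps(tree, states)
    return (g - s) / (g - m)


# Hypothetical six-taxon tree and two presence/absence patterns for one fold.
tree = (("A", "B"), (("C", "D"), ("E", "F")))
clade_pattern = {"A": 0, "B": 0, "C": 1, "D": 1, "E": 1, "F": 1}
scattered = {"A": 1, "B": 0, "C": 1, "D": 0, "E": 1, "F": 0}
print(retention_index(tree, clade_pattern), retention_index(tree, scattered))
```

An RI of 1 means the fold's presence maps onto a single clade with no homoplasy (a clean synapomorphy); an RI near 0 means the pattern is scattered across the tree.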
Affiliation(s)
- Martin Romei
- Institut Systématique Evolution Biodiversité (ISYEB UMR 7205), Sorbonne Université, MNHN, CNRS, EPHE, UA, Paris, France
- IMPMC (UMR 7590), BiBiP, Sorbonne Université, CNRS, MNHN, Paris, France
- Guillaume Sapriel
- Institut Systématique Evolution Biodiversité (ISYEB UMR 7205), Sorbonne Université, MNHN, CNRS, EPHE, UA, Paris, France
- UFR des sciences de la santé, Université Versailles‐St‐Quentin, Versailles, France
- Pierre Imbert
- Institut Systématique Evolution Biodiversité (ISYEB UMR 7205), Sorbonne Université, MNHN, CNRS, EPHE, UA, Paris, France
- Théo Jamay
- Institut Systématique Evolution Biodiversité (ISYEB UMR 7205), Sorbonne Université, MNHN, CNRS, EPHE, UA, Paris, France
- Guillaume Lecointre
- Institut Systématique Evolution Biodiversité (ISYEB UMR 7205), Sorbonne Université, MNHN, CNRS, EPHE, UA, Paris, France
- Mathilde Carpentier
- Institut Systématique Evolution Biodiversité (ISYEB UMR 7205), Sorbonne Université, MNHN, CNRS, EPHE, UA, Paris, France
7
León-González JA, Flatet P, Juárez-Ramírez MS, Farías-Rico JA. Folding and Evolution of a Repeat Protein on the Ribosome. Front Mol Biosci 2022; 9:851038. [PMID: 35707224 PMCID: PMC9189291 DOI: 10.3389/fmolb.2022.851038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2022] [Accepted: 04/27/2022] [Indexed: 12/04/2022]
Abstract
Life on Earth is the result of the work of proteins, the cellular nanomachines that fold into elaborate 3D structures to perform their functions. The ribosome synthesizes all the proteins of the biosphere, and many of them begin to fold during translation in a process known as cotranslational folding. In this work we discuss current advances in this field and provide computational and experimental data that highlight the role of the ribosome in the evolution of protein structures. First, we used the sequence of the Ankyrin domain from the Drosophila Notch receptor to launch a deep sequence-based search. With this strategy, we found a conserved 33-residue motif shared by different protein folds. Then, to see how the vectorial addition of the motif would generate a full structure, we measured the on-ribosome folding of the Ankyrin repeat protein. Not only are the on-ribosome folding data in full agreement with classical in vitro biophysical measurements, but they also provide experimental evidence of how folded proteins could have evolved by duplication and fusion of smaller fragments in the RNA world. Overall, we discuss how the ribosomal exit tunnel could be conceptualized as an active site that is under evolutionary pressure to influence protein folding.
Affiliation(s)
- José Alberto León-González
- Synthetic Biology Program, Center for Genome Sciences, National Autonomous University of Mexico, Cuernavaca, Mexico
- Perline Flatet
- Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
- María Soledad Juárez-Ramírez
- Synthetic Biology Program, Center for Genome Sciences, National Autonomous University of Mexico, Cuernavaca, Mexico
- José Arcadio Farías-Rico
- Synthetic Biology Program, Center for Genome Sciences, National Autonomous University of Mexico, Cuernavaca, Mexico
- Correspondence: José Arcadio Farías-Rico
8
The Legend of ATP: From Origin of Life to Precision Medicine. Metabolites 2022; 12:metabo12050461. [PMID: 35629965 PMCID: PMC9148104 DOI: 10.3390/metabo12050461] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/10/2022] [Revised: 05/19/2022] [Accepted: 05/19/2022] [Indexed: 02/05/2023]
Abstract
Adenosine triphosphate (ATP) may be the most important biological small molecule. Since it was discovered in 1929, ATP has been regarded as life’s energy reservoir. However, this compound means more to life. Its legend starts at the dawn of life and lasts to this day. ATP must be the basic component of ancient ribozymes and may facilitate the origin of structured proteins. In the existing organisms, ATP continues to construct ribonucleic acid (RNA) and work as a protein cofactor. ATP also functions as a biological hydrotrope, which may keep macromolecules soluble in the primitive environment and can regulate phase separation in modern cells. These functions are involved in the pathogenesis of aging-related diseases and breast cancer, providing clues to discovering anti-aging agents and precision medicine tactics for breast cancer.
9
Bzówka M, Mitusińska K, Raczyńska A, Skalski T, Samol A, Bagrowska W, Magdziarz T, Góra A. Evolution of tunnels in α/β-hydrolase fold proteins—What can we learn from studying epoxide hydrolases? PLoS Comput Biol 2022; 18:e1010119. [PMID: 35580137 PMCID: PMC9140254 DOI: 10.1371/journal.pcbi.1010119] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/25/2021] [Revised: 05/27/2022] [Accepted: 04/19/2022] [Indexed: 12/27/2022]
Abstract
The evolutionary variability of a protein's residues is highly dependent on protein region and function. Solvent-exposed residues, excluding those at interaction interfaces, are more variable than buried residues, whereas active site residues are considered to be conserved. These rules also apply to α/β-hydrolase fold proteins, one of the oldest and largest superfamilies of enzymes, whose buried active sites are equipped with tunnels linking the reaction site with the exterior. We selected soluble epoxide hydrolases as representatives of this family to conduct the first systematic study of tunnel evolution. We hypothesised that tunnels are lined by mostly conserved residues and are equipped with a number of specific variable residues that are able to respond to evolutionary pressure. The hypothesis was confirmed, and we suggested general and detailed approaches to analysing tunnel evolution based on entropy values calculated for tunnel-lining residues. We also found three different cases of entropy distribution among tunnel-lining residues. These observations can be applied to protein reengineering that mimics the natural evolution process. We propose a 'perforation' mechanism for designing new tunnels via the merging of internal cavities or perforation of the protein surface. Based on the literature data, such a strategy of new tunnel design could significantly improve enzyme performance and can be applied widely to enzymes with buried active sites. So far, very little is known about the evolution of protein tunnels. The goal of this study is to evaluate the evolution of tunnels in the family of soluble epoxide hydrolases, representatives of the numerous α/β-hydrolase fold enzymes. As a result, two types of tunnel evolution analysis were proposed (a general and a detailed approach), as well as a 'perforation' mechanism that can mimic native evolution in proteins and can be used as an additional strategy for enzyme redesign.
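The tunnel analysis rests on entropy values computed for tunnel-lining residues. A minimal sketch, assuming per-column Shannon entropy in bits with gap characters ignored (the abstract does not give the exact definition, log base, or normalization), maps each tunnel-lining column of a multiple sequence alignment to its entropy. The alignment, the chosen positions, and both function names are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column, gap="-"):
    """Shannon entropy (bits) of one alignment column, ignoring gap characters."""
    residues = [aa for aa in column if aa != gap]
    if not residues:
        return 0.0
    n = len(residues)
    counts = Counter(residues)
    # Sum of p * log2(1/p) over the residue types observed in this column.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def tunnel_entropies(alignment, tunnel_positions):
    """Map each tunnel-lining column index (0-based) to its entropy."""
    lengths = {len(seq) for seq in alignment}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to equal length")
    return {pos: column_entropy([seq[pos] for seq in alignment])
            for pos in tunnel_positions}


# Toy alignment of four homologous segments; columns 0-2 are treated as
# tunnel-lining purely for illustration.
alignment = ["ACDH", "ACEH", "SCDY", "ACKW"]
entropies = tunnel_entropies(alignment, [0, 1, 2])
for pos, h in entropies.items():
    print(pos, round(h, 3))
```

Low values flag conserved positions and high values flag variable ones; with 20 amino acid types the maximum is log2(20), about 4.32 bits.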
Affiliation(s)
- Maria Bzówka
- Tunneling Group, Biotechnology Centre, Silesian University of Technology, Gliwice, Poland
- Karolina Mitusińska
- Tunneling Group, Biotechnology Centre, Silesian University of Technology, Gliwice, Poland
- Agata Raczyńska
- Tunneling Group, Biotechnology Centre, Silesian University of Technology, Gliwice, Poland
- Tomasz Skalski
- Biotechnology Centre, Silesian University of Technology, Gliwice, Poland
- Aleksandra Samol
- Tunneling Group, Biotechnology Centre, Silesian University of Technology, Gliwice, Poland
- Weronika Bagrowska
- Tunneling Group, Biotechnology Centre, Silesian University of Technology, Gliwice, Poland
- Tomasz Magdziarz
- Tunneling Group, Biotechnology Centre, Silesian University of Technology, Gliwice, Poland
- Artur Góra
- Tunneling Group, Biotechnology Centre, Silesian University of Technology, Gliwice, Poland
10
Fried SD, Fujishima K, Makarov M, Cherepashuk I, Hlouchova K. Peptides before and during the nucleotide world: an origins story emphasizing cooperation between proteins and nucleic acids. J R Soc Interface 2022; 19:20210641. [PMID: 35135297 PMCID: PMC8833103 DOI: 10.1098/rsif.2021.0641] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 08/08/2021] [Accepted: 01/05/2022] [Indexed: 12/14/2022]
Abstract
Recent developments in Origins of Life research have focused on substantiating the narrative of an abiotic emergence of nucleic acids from organic molecules of low molecular weight, a paradigm that typically sidelines the roles of peptides. Nevertheless, the simple synthesis of amino acids, the facile nature of their activation and condensation, their ability to recognize metals and cofactors and their remarkable capacity to self-assemble make peptides (and their analogues) favourable candidates for one of the earliest functional polymers. In this mini-review, we explore the ramifications of this hypothesis. Diverse lines of research in molecular biology, bioinformatics, geochemistry, biophysics and astrobiology provide clues about the progression and early evolution of proteins, and lend credence to the idea that early peptides served many central prebiotic roles before they were encodable by a polynucleotide template, in a putative 'peptide-polynucleotide stage'. For example, early peptides and mini-proteins could have served as catalysts, compartments and structural hubs. In sum, we shed light on the role of early peptides and small proteins before and during the nucleotide world, in which nascent life fully grasped the potential of primordial proteins, and which has left an imprint on the idiosyncratic properties of extant proteins.
Affiliation(s)
- Stephen D. Fried
- Department of Chemistry, Johns Hopkins University, Baltimore, MD 21212, USA
- Department of Biophysics, Johns Hopkins University, Baltimore, MD 21212, USA
- Kosuke Fujishima
- Earth-Life Science Institute, Tokyo Institute of Technology, Tokyo 1528550, Japan
- Graduate School of Media and Governance, Keio University, Fujisawa 2520882, Japan
- Mikhail Makarov
- Department of Cell Biology, Faculty of Science, Charles University, BIOCEV, Prague 12800, Czech Republic
- Ivan Cherepashuk
- Department of Cell Biology, Faculty of Science, Charles University, BIOCEV, Prague 12800, Czech Republic
- Klara Hlouchova
- Department of Cell Biology, Faculty of Science, Charles University, BIOCEV, Prague 12800, Czech Republic
- Institute of Organic Chemistry and Biochemistry, Czech Academy of Sciences, Prague 16610, Czech Republic
11
Caetano-Anollés G, Aziz MF, Mughal F, Caetano-Anollés D. Tracing protein and proteome history with chronologies and networks: folding recapitulates evolution. Expert Rev Proteomics 2021; 18:863-880. [PMID: 34628994 DOI: 10.1080/14789450.2021.1992277] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/24/2023]
Abstract
INTRODUCTION: While the origin and evolution of proteins remain mysterious, advances in evolutionary genomics and systems biology are facilitating the historical exploration of the structure, function and organization of proteins and proteomes. Molecular chronologies are series of time events describing the history of biological systems and subsystems and the rise of biological innovations. Together with time-varying networks, these chronologies provide a window into the past.
AREAS COVERED: Here, we review molecular chronologies and networks built with modern methods of phylogeny reconstruction. We discuss how chronologies of structural domain families uncover the explosive emergence of metabolism, the late rise of translation, the co-evolution of ribosomal proteins and rRNA, and the late development of the ribosomal exit tunnel, events that coincided with a tendency to shorten folding time. Evolving networks described the early emergence of domains and a late 'big bang' of domain combinations.
EXPERT OPINION: Two processes, folding and recruitment, appear central to the evolutionary progression. The former increases protein persistence; the latter fosters diversity. Chronologically, protein evolution mirrors folding by combining supersecondary structures into domains, developing translation machinery to facilitate folding speed and stability, and enhancing structural complexity by establishing long-distance interactions in novel structural and architectural designs.
Affiliation(s)
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, Illinois, USA
- C. R. Woese Institute for Genomic Biology, University of Illinois, Urbana, Illinois, USA
- M Fayez Aziz
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, Illinois, USA
- Fizza Mughal
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, Illinois, USA
- Derek Caetano-Anollés
- Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
12
Caetano-Anollés G. The Compressed Vocabulary of Microbial Life. Front Microbiol 2021; 12:655990. [PMID: 34305827 PMCID: PMC8292947 DOI: 10.3389/fmicb.2021.655990] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/19/2021] [Accepted: 04/27/2021] [Indexed: 12/22/2022]
Abstract
Communication is an undisputed central activity of life that requires an evolving molecular language. It conveys meaning through messages and vocabularies. Here, I explore the existence of a growing vocabulary in the molecules and molecular functions of the microbial world. There are clear correspondences between the lexicon, syntax, semantics, and pragmatics of language organization and the module, structure, function, and fitness paradigms of molecular biology. These correspondences are constrained by universal laws and engineering principles. Macromolecular structure, for example, follows quantitative linguistic patterns arising from statistical laws that are likely universal, including Zipf's law, a special case of the scale-free distribution; Heaps' law, describing the sublinear growth typical of economies of scale; and the Menzerath-Altmann law, which imposes size-dependent patterns of decreasing returns. Trade-off solutions between principles of economy, flexibility, and robustness define a "triangle of persistence" describing the impact of the environment on a biological system. The pragmatic landscape of the triangle interfaces with the syntax and semantics of molecular languages, which together with comparative and evolutionary genomic data can explain global patterns of diversification of cellular life. The vocabularies of proteins (proteomes) and functions (functionomes) revealed a significant universal lexical core supporting a universal common ancestor, an ancestral evolutionary link between Bacteria and Eukarya, and distinct reductive evolutionary strategies of language compression in Archaea and Bacteria. A "causal" word cloud strategy inspired by the dependency grammar paradigm used in catenae unfolded the evolution of lexical units associated with Gene Ontology terms at different levels of ontological abstraction.
While Archaea holds the smallest, oldest, and most homogeneous vocabulary of all superkingdoms, Bacteria heterogeneously apportions a more complex vocabulary, and Eukarya pushes functional innovation through mechanisms of flexibility and robustness.
Affiliation(s)
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, and C. R. Woese Institute for Genomic Biology, University of Illinois, Urbana, IL, United States
13
Abstract
Domains are the structural, functional and evolutionary units of proteins. They combine to form multidomain proteins. The evolutionary history of this molecular combinatorics has been studied with phylogenomic methods. Here, we construct networks of domain organization and explore their evolution. A time series of networks revealed two ancient waves of structural novelty arising from ancient 'p-loop' and 'winged helix' domains and a massive 'big bang' of domain organization. The evolutionary recruitment of domains was highly modular, hierarchical and ongoing. Domain rearrangements elicited non-random and scale-free network structure. Comparative analyses of preferential attachment, randomness and modularity showed yin-and-yang complementary transition and biphasic patterns along the structural chronology. Remarkably, the evolving networks highlighted a central evolutionary role of cofactor-supporting structures of non-ribosomal peptide synthesis pathways, likely crucial to the early development of the genetic code. Some highly modular domains featured dual response regulation in two-component signal transduction systems with DNA-binding activity linked to transcriptional regulation of responses to environmental change. Interestingly, hub domains across the evolving networks shared the historical role of DNA binding and editing, an ancient protein function in molecular evolution. Our investigation unfolds historical source-sink patterns of evolutionary recruitment that further our understanding of protein architectures and functions.
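The abstract compares preferential attachment across the evolving domain networks. To make the term concrete, the sketch below grows a network by linear preferential attachment (a Barabási-Albert-style model): each new node links to `m` existing nodes chosen with probability proportional to their current degree, which tends to produce a few highly connected hubs and a scale-free degree distribution. This is a generic textbook model, not the authors' analysis, and the function name and parameters are hypothetical.

```python
import random
from collections import Counter


def preferential_attachment_graph(n_nodes, m, seed=None):
    """Grow an undirected edge list where each new node links to m distinct
    existing nodes, chosen with probability proportional to their degree."""
    if m < 1 or n_nodes <= m:
        raise ValueError("need 1 <= m < n_nodes")
    rng = random.Random(seed)
    # Seed network: a complete graph on m + 1 nodes, so every node has degree m.
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    # Each node appears here once per incident edge, so a uniform draw from
    # this list is a draw proportional to degree.
    endpoints = [v for edge in edges for v in edge]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges


edges = preferential_attachment_graph(2000, 2, seed=1)
degree = Counter(v for edge in edges for v in edge)
print("edges:", len(edges), "max degree:", max(degree.values()))
```

Comparing the degree distribution of an observed network against such a model (and against a uniform-attachment control) is one common way to ask whether growth favoured already well-connected nodes.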
14
Bogdan P, Caetano-Anollés G, Jolles A, Kim H, Morris J, Murphy CA, Royer C, Snell EH, Steinbrenner A, Strausfeld N. Biological networks across scales. Integr Comp Biol 2021; 61:1991-2010. [PMID: 34021749 DOI: 10.1093/icb/icab069] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/13/2022]
Abstract
Many biological systems across scales of size and complexity exhibit a time-varying complex network structure that emerges and self-organizes as a result of interactions with the environment. Network interactions optimize some intrinsic cost functions that are unknown and involve, for example, energy efficiency, robustness, resilience, and frailty. A wide range of networks exist in biology, from gene regulatory networks important for organismal development, protein interaction networks that govern physiology and metabolism, and neural networks that store and convey information to networks of microbes that form microbiomes within hosts, animal contact networks that underlie social systems, and networks of populations on the landscape connected by migration. Increasing availability of extensive (big) data is amplifying our ability to quantify biological networks. Similarly, theoretical methods that describe network structure and dynamics are being developed. Beyond static networks representing snapshots of biological systems, collections of longitudinal data series can help either in defining and characterizing network dynamics over time or in analyzing the dynamics constrained to networked architectures. Moreover, due to interactions with the environment and other biological systems, a biological network may not be fully observable. Also, subnetworks may emerge and disappear as a result of the need for the biological system to cope with, for example, invaders or new information flows. The confluence of these developments renders tractable the question of how the structure of biological networks predicts and controls network dynamics. In particular, there may be structural features that result in homeostatic networks with specific higher-order statistics (e.g., multifractal spectrum), which maintain stability over time through robustness and/or resilience to perturbation. Alternatively, plastic networks may respond to perturbation by (adaptive to catastrophic) shifts in structure. Here, we explore the opportunity for discovering universal laws connecting the structure of biological networks with their function, positioning them on the spectrum of time-evolving network structure, i.e. dynamics of networks, from highly stable to exquisitely sensitive to perturbation. If such general laws exist, they could transform our ability to predict the response of biological systems to perturbations, an increasingly urgent priority in the face of anthropogenic changes to the environment that affect life across the gamut of organizational scales.
Affiliation(s)
- Paul Bogdan
- Ming-Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles
- Anna Jolles
- Department of Integrative Biology, Oregon State University, Corvallis
- Hyunju Kim
- The Beyond Center, Arizona State University, Tempe
- James Morris
- Baruch Institute for Marine and Coastal Sciences, University of South Carolina, Columbia
- Cheryl A Murphy
- Department of Fisheries and Wildlife, Michigan State University, East Lansing
- Edward H Snell
- Hauptman-Woodward Medical Research Institute and SUNY, Buffalo
15
Heizinger L, Merkl R. Evidence for the preferential reuse of sub-domain motifs in primordial protein folds. Proteins 2021; 89:1167-1179. [PMID: 33957009 DOI: 10.1002/prot.26089] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/04/2021] [Revised: 04/15/2021] [Accepted: 04/28/2021] [Indexed: 11/06/2022]
Abstract
A comparison of protein backbones makes clear that not more than approximately 1400 different folds exist, each specifying the three-dimensional topology of a protein domain. Large proteins are composed of specific domain combinations and many domains can accommodate different functions. These findings confirm that the reuse of domains is key for the evolution of multi-domain proteins. If reuse was also the driving force for domain evolution, ancestral fragments of sub-domain size exist that are shared between domains possessing significantly different topologies. For the fully automated detection of putatively ancestral motifs, we developed the algorithm Fragstatt that compares proteins pairwise to identify fragments, that is, instantiations of the same motif. To reach maximal sensitivity, Fragstatt compares sequences by means of cascaded alignments of profile Hidden Markov Models. If the fragment sequences are sufficiently similar, the program determines and scores the structural concordance of the fragments. By analyzing a comprehensive set of proteins from the CATH database, Fragstatt identified 12 532 partially overlapping and structurally similar motifs that clustered to 134 unique motifs. The dissemination of these motifs is limited: We found only two domain topologies that contain two different motifs and generally, these motifs occur in not more than 18% of the CATH topologies. Interestingly, motifs are enriched in topologies that are considered ancestral. Thus, our findings suggest that the reuse of sub-domain sized fragments was relevant in early phases of protein evolution and became less important later on.
Affiliation(s)
- Leonhard Heizinger
- Institute of Biophysics and Physical Biochemistry, University of Regensburg, Regensburg, Germany
- Rainer Merkl
- Institute of Biophysics and Physical Biochemistry, University of Regensburg, Regensburg, Germany
16
Goldman AD, Kacar B. Cofactors are Remnants of Life's Origin and Early Evolution. J Mol Evol 2021; 89:127-133. [PMID: 33547911 PMCID: PMC7982383 DOI: 10.1007/s00239-020-09988-4] [Citation(s) in RCA: 40] [Impact Index Per Article: 13.3] [Received: 10/17/2020] [Accepted: 12/21/2020] [Indexed: 12/22/2022]
Abstract
The RNA World is one of the most widely accepted hypotheses explaining the origin of the genetic system used by all organisms today. It proposes that the tripartite system of DNA, RNA, and proteins was preceded by one consisting solely of RNA, which both stored genetic information and performed the molecular functions encoded by that genetic information. Current research into a potential RNA World revolves around the catalytic properties of RNA-based enzymes, or ribozymes. Well before the discovery of ribozymes, Harold White proposed that evidence for a precursor RNA world could be found within modern proteins in the form of coenzymes, the majority of which contain nucleobases or nucleoside moieties, such as Coenzyme A and S-adenosyl methionine, or are themselves nucleotides, such as ATP and NADH (a dinucleotide). These coenzymes, White suggested, had been the catalytic active sites of ancient ribozymes, which transitioned to their current forms after the surrounding ribozyme scaffolds had been replaced by protein apoenzymes during the evolution of translation. Since its proposal four decades ago, this groundbreaking hypothesis has garnered support from several different research disciplines and motivated similar hypotheses about other classes of cofactors, most notably iron-sulfur cluster cofactors as remnants of the geochemical setting of the origin of life. Evidence from prebiotic geochemistry, ribozyme biochemistry, and evolutionary biology, increasingly supports these hypotheses. Certain coenzymes and cofactors may bridge modern biology with the past and can thus provide insights into the elusive and poorly-recorded period of the origin and early evolution of life.
Affiliation(s)
- Aaron D Goldman
- Department of Biology, Oberlin College and Conservatory, Oberlin, OH, 44074, USA
- Blue Marble Space Institute of Science, Seattle, WA, 98154, USA
- Betul Kacar
- Blue Marble Space Institute of Science, Seattle, WA, 98154, USA
- Department of Molecular and Cellular Biology, University of Arizona, Tucson, AZ, 85721, USA
- Lunar and Planetary Laboratory and Department of Astronomy, University of Arizona, Tucson, AZ, 85721, USA
- Earth-Life Science Institute, Tokyo Institute of Technology, Meguro, Tokyo, 152-8550, Japan
17
A highly efficient protein degradation system in Bacillus sp. CN2: a functional-degradomics study. Appl Microbiol Biotechnol 2021; 105:707-723. [PMID: 33386896 DOI: 10.1007/s00253-020-11083-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2020] [Revised: 12/07/2020] [Accepted: 12/27/2020] [Indexed: 10/22/2022]
Abstract
A novel protease-producing Bacillus sp. CN2 isolated from chicken manure composts exhibited a relatively high proteolytic specific activity. The strain CN2 degradome consisted of at least 149 proteases and homolog candidates, which were distributed into 4 aspartic, 30 cysteine, 55 metallo, 56 serine, and 4 threonine proteases. Extracellular proteolytic activity was almost completely inhibited by PMSF (phenylmethylsulfonyl fluoride) but not by o-P, E-64, or pepstatin A, suggesting that strain CN2 primarily secreted serine proteases. More importantly, analysis of the extracellular proteome of strain CN2 revealed the presence of a highly efficient protein degradation system. Three serine proteases of the S8 family with different active site architectures first fragmented protein substrates, which were then degraded to smaller peptides by an M4 metalloendopeptidase that preferentially degrades hydrophobic peptides and by an S13 carboxypeptidase. Those enzymes acted synergistically to degrade intact substrate proteins outside the cell. Furthermore, highly expressed sequence-specific intracellular aminopeptidases from multiple families (M20, M29, and M42) accurately degraded peptides into oligopeptides or amino acids, thus enabling the rapid acquisition and utilization of nitrogen sources. In this paper, a systematic study of the functional degradome provided a new perspective for understanding the complexity of the protease hydrolysis system of Bacillus, and laid a solid foundation for further studying the precise degradation of proteins through the cooperative action of proteases from different families.
KEY POINTS:
- Bacillus sp. CN2 has relatively high proteolytic specific activity.
- Bacillus sp. CN2 harbors a highly efficient protein degradation system.
- The site-specific endopeptidases were secreted extracellularly, while the sequence-specific aminopeptidases acted inside the cell.
18
Nasir A, Romero-Severson E, Claverie JM. Investigating the Concept and Origin of Viruses. Trends Microbiol 2020; 28:959-967. [PMID: 33158732 PMCID: PMC7609044 DOI: 10.1016/j.tim.2020.08.003] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Received: 06/29/2020] [Revised: 08/25/2020] [Accepted: 08/27/2020] [Indexed: 12/21/2022]
Abstract
The ongoing COVID-19 pandemic has piqued public interest in the properties, evolution, and emergence of viruses. Here, we discuss how these basic questions have surprisingly remained disputed despite being increasingly within the reach of scientific analysis. We review recent data-driven efforts that shed light on the origin and evolution of viruses and explain factors that hinder the widespread acceptance of new views and insights. We propose a new definition of viruses that is not restricted to the presence or absence of any genetic or physical feature, detail a scenario for how viruses likely originated from ancient cells, and explain technical and conceptual biases that limit our understanding of virus evolution. We note that the philosophical aspects of virus evolution also impact the way we might prepare for future outbreaks.
Affiliation(s)
- Arshan Nasir
- Theoretical Biology and Biophysics (T-6), Los Alamos National Laboratory, Los Alamos, NM, USA.
- Ethan Romero-Severson
- Theoretical Biology and Biophysics (T-6), Los Alamos National Laboratory, Los Alamos, NM, USA
- Jean-Michel Claverie
- Aix Marseille University, CNRS, IGS, Structural and Genomic Information Laboratory (UMR7256), Mediterranean Institute of Microbiology (FR3479), Marseille, France
19
Ros E, Torres AG, Ribas de Pouplana L. Learning from Nature to Expand the Genetic Code. Trends Biotechnol 2020; 39:460-473. [PMID: 32896440 DOI: 10.1016/j.tibtech.2020.08.003] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 06/06/2020] [Revised: 07/29/2020] [Accepted: 08/04/2020] [Indexed: 01/14/2023]
Abstract
The genetic code is the manual that cells use to incorporate amino acids into proteins. It is possible to artificially expand this manual through cellular, molecular, and chemical manipulations to improve protein functionality. Strategies for in vivo genetic code expansion are under the same functional constraints as natural protein synthesis. Here, we review the approaches used to incorporate noncanonical amino acids (ncAAs) into designer proteins through the manipulation of the translation machinery and draw parallels between these methods and natural adaptations that improve translation in extant organisms. Following this logic, we propose new nature-inspired tactics to improve genetic code expansion (GCE) in synthetic organisms.
Affiliation(s)
- Enric Ros
- Institute for Research in Biomedicine, The Barcelona Institute of Science and Technology, Barcelona, Catalonia, 08028, Spain
- Adrian Gabriel Torres
- Institute for Research in Biomedicine, The Barcelona Institute of Science and Technology, Barcelona, Catalonia, 08028, Spain
- Lluís Ribas de Pouplana
- Institute for Research in Biomedicine, The Barcelona Institute of Science and Technology, Barcelona, Catalonia, 08028, Spain
- Catalan Institution for Research and Advanced Studies, Barcelona, Catalonia, 08010, Spain
20
Chu XY, Zhang HY. Cofactors as Molecular Fossils To Trace the Origin and Evolution of Proteins. Chembiochem 2020; 21:3161-3168. [PMID: 32515532 DOI: 10.1002/cbic.202000027] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 01/19/2020] [Revised: 06/03/2020] [Indexed: 12/16/2022]
Abstract
Due to their early origin and extreme conservation, cofactors are valuable molecular fossils for tracing the origin and evolution of proteins. First, as the order of protein folds binding with cofactors roughly coincides with protein-fold chronology, cofactors are considered to have facilitated the origin of primitive proteins by selecting them from pools of random amino acid sequences. Second, in the subsequent evolution of proteins, cofactors still played an important role. More interestingly, as metallic cofactors evolved with geochemical variations, some geochemical events left imprints in the chronology of protein architecture; this provides further evidence supporting the coevolution of biochemistry and geochemistry. In this paper, we attempt to review the molecular fossils used in tracing the origin and evolution of proteins, with a special focus on cofactors.
Affiliation(s)
- Xin-Yi Chu
- Hubei Key Laboratory of Agricultural Bioinformatics College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Hong-Yu Zhang
- Hubei Key Laboratory of Agricultural Bioinformatics College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
21
Prechl J. Network Organization of Antibody Interactions in Sequence and Structure Space: the RADARS Model. Antibodies (Basel) 2020; 9:antib9020013. [PMID: 32384800 PMCID: PMC7345901 DOI: 10.3390/antib9020013] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/16/2020] [Revised: 04/09/2020] [Accepted: 04/15/2020] [Indexed: 02/06/2023]
Abstract
Adaptive immunity in vertebrates is a complex self-organizing network of molecular interactions. While deep sequencing of the immune-receptor repertoire may reveal clonal relationships, functional interpretation of such data is hampered by the inherent limitations of converting sequence to structure to function. In this paper, a novel model of antibody interaction space and network, termed RAdial ADjustment of System Resolution (RADARS), is proposed. The model is based on the radial growth of interaction affinity of antibodies towards an infinity of directions in structure space, each direction corresponding to particular shapes of antigen epitopes. Levels of interaction affinity appear as free energy shells of the system, where hierarchical B-cell development and differentiation take place. Equilibrium in this immunological thermodynamic system can be described by a power-law distribution of antibody free energies with an ideal network degree exponent of phi squared (φ²), representing a scale-free fractal network of antibody interactions. Plasma cells are network hubs, memory B cells are nodes with intermediate degrees, and B1 cells function as nodes with minimal degree. Overall, the RADARS model implies that a finite number of antibody structures can interact with an infinite number of antigens by immunologically controlled adjustment of interaction energy distribution. Understanding quantitative network properties of the system should help the organization of sequence-derived predicted structural data.
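The exponent quoted in the RADARS abstract can be made concrete. Reading "network degree exponent" in the usual scale-free sense (an assumption about the model's notation, not a formula taken from the paper), the degree distribution follows a power law whose exponent is the square of the golden ratio φ. Because φ satisfies φ² = φ + 1, the exponent has a closed form:

$$
P(k) \propto k^{-\gamma}, \qquad \gamma = \varphi^{2} = \varphi + 1 = \frac{3 + \sqrt{5}}{2} \approx 2.618, \qquad \varphi = \frac{1 + \sqrt{5}}{2}
$$

Here P(k) is the fraction of nodes with degree k. An exponent between 2 and 3 places the network in the regime where, for an unbounded network, the mean degree is finite but the variance of the degree diverges, the range usually associated with hub-dominated scale-free networks.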
Affiliation(s)
- József Prechl
- Diagnosticum Zrt., 126. Attila u., 1047 Budapest, Hungary
22
Bateman A. Division of labour in a matrix, rather than phagocytosis or endosymbiosis, as a route for the origin of eukaryotic cells. Biol Direct 2020; 15:8. [PMID: 32345370 PMCID: PMC7187495 DOI: 10.1186/s13062-020-00260-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/07/2019] [Accepted: 02/25/2020] [Indexed: 12/13/2022]
Abstract
Two apparently irreconcilable models dominate research into the origin of eukaryotes. In one model, amitochondrial proto-eukaryotes emerged autogenously from the last universal common ancestor of all cells. Proto-eukaryotes subsequently acquired mitochondrial progenitors by the phagocytic capture of bacteria. In the second model, two prokaryotes, probably an archaeon and a bacterial cell, engaged in prokaryotic endosymbiosis, with the species resident within the host becoming the mitochondrial progenitor. Both models have limitations. A search was therefore undertaken for alternative routes towards the origin of eukaryotic cells. The question was addressed by considering classes of potential pathways from prokaryotic to eukaryotic cells based on considerations of cellular topology. Among the solutions identified, one, called here the “third-space model”, has not been widely explored. A version is presented in which an extracellular space (the third-space) serves as a proxy cytoplasm for mixed populations of archaea and bacteria to “merge” as a transitionary complex without obligatory endosymbiosis or phagocytosis and to form a precursor cell. Incipient nuclei and mitochondria diverge by division of labour. The third-space model can accommodate the reorganization of prokaryote-like genomes to a more eukaryote-like genome structure. Nuclei with multiple chromosomes and mitosis emerge as a natural feature of the model. The model is compatible with the loss of archaeal lipid biochemistry while retaining archaeal genes and provides a route for the development of membranous organelles such as the Golgi apparatus and endoplasmic reticulum. Advantages, limitations and variations of the “third-space” models are discussed. Reviewers: This article was reviewed by Damien Devos, Buzz Baum and Michael Gray.
Affiliation(s)
- Andrew Bateman
- Division of Experimental Medicine, Department of Medicine, McGill University, Glen Site Pavilion E, 1001 Boulevard Decarie, Montreal, Quebec, H4A 3J1, Canada
- Centre for Translational Biology, Research Institute of McGill University Health Centre, Glen Site Pavilion E, 1001 Boulevard Decarie, Montreal, Quebec, H4A 3J1, Canada
23
Narunsky A, Kessel A, Solan R, Alva V, Kolodny R, Ben-Tal N. On the evolution of protein-adenine binding. Proc Natl Acad Sci U S A 2020; 117:4701-4709. [PMID: 32079721 PMCID: PMC7060716 DOI: 10.1073/pnas.1911349117] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Indexed: 12/16/2022]
Abstract
Proteins' interactions with ancient ligands may reveal how molecular recognition emerged and evolved. We explore how proteins recognize adenine: a planar rigid fragment found in the most common and ancient ligands. We have developed a computational pipeline that extracts protein-adenine complexes from the Protein Data Bank, structurally superimposes their adenine fragments, and detects the hydrogen bonds mediating the interaction. Our analysis extends the known motifs of protein-adenine interactions in the Watson-Crick edge of adenine and shows that all of adenine's edges may contribute to molecular recognition. We further show that, on the proteins' side, binding is often mediated by specific amino acid segments ("themes") that recur across different proteins, such that different proteins use the same themes when binding the same adenine-containing ligands. We identify numerous proteins that feature these themes and are thus likely to bind adenine-containing ligands. Our analysis suggests that adenine binding has emerged multiple times in evolution.
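The pipeline described above detects the hydrogen bonds that mediate protein-adenine contacts. The abstract does not give the detection criteria, so the sketch below is only an assumed first-pass screen: it flags nitrogen/oxygen pairs between ligand and protein whose heavy-atom distance is at most 3.5 Å, a commonly used cutoff. Full hydrogen-bond detection would also check hydrogen positions and donor-H-acceptor angles. The `Atom` class, the `candidate_hbonds` function and the toy coordinates are hypothetical.

```python
import math
from dataclasses import dataclass

# Hypothetical minimal atom record; coordinates are in ångströms.
@dataclass(frozen=True)
class Atom:
    name: str     # atom name, e.g. "N6" (adenine) or "OD1" (Asp side chain)
    element: str  # element symbol; only "N" and "O" are screened here
    xyz: tuple    # (x, y, z) Cartesian coordinates

POLAR = {"N", "O"}

def candidate_hbonds(ligand_atoms, protein_atoms, max_dist=3.5):
    """Distance-only screen for ligand-protein hydrogen-bond candidates.

    Returns (ligand atom name, protein atom name, distance) for every N/O pair
    within max_dist. Angles and hydrogen positions are ignored.
    """
    hits = []
    for la in ligand_atoms:
        if la.element not in POLAR:
            continue
        for pa in protein_atoms:
            if pa.element not in POLAR:
                continue
            d = math.dist(la.xyz, pa.xyz)  # Euclidean distance (Python 3.8+)
            if d <= max_dist:
                hits.append((la.name, pa.name, round(d, 2)))
    return hits

# Toy coordinates, invented for illustration only.
adenine = [Atom("N6", "N", (0.0, 0.0, 0.0)), Atom("N1", "N", (2.3, 0.0, 0.0))]
protein = [
    Atom("OD1", "O", (0.0, 2.9, 0.0)),
    Atom("CB", "C", (0.0, 1.5, 0.0)),  # carbon: skipped by the element filter
    Atom("O", "O", (6.0, 0.0, 0.0)),
]

print(candidate_hbonds(adenine, protein))  # prints [('N6', 'OD1', 2.9)]
```

Raising `max_dist` to 4.0 would also admit the two N1 contacts at about 3.70 Å, which shows how sensitive such contact counts are to the chosen cutoff.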
Affiliation(s)
- Aya Narunsky
- Department of Biochemistry and Molecular Biology, George S. Wise Faculty of Life Sciences, Tel Aviv University, 69978 Ramat Aviv, Israel
- Amit Kessel
- Department of Biochemistry and Molecular Biology, George S. Wise Faculty of Life Sciences, Tel Aviv University, 69978 Ramat Aviv, Israel
- Ron Solan
- Department of Biochemistry and Molecular Biology, George S. Wise Faculty of Life Sciences, Tel Aviv University, 69978 Ramat Aviv, Israel
- Vikram Alva
- Department of Protein Evolution, Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany
- Rachel Kolodny
- Department of Computer Science, University of Haifa, Mount Carmel, 3498838 Haifa, Israel
- Nir Ben-Tal
- Department of Biochemistry and Molecular Biology, George S. Wise Faculty of Life Sciences, Tel Aviv University, 69978 Ramat Aviv, Israel
24
Preiner M, Asche S, Becker S, Betts HC, Boniface A, Camprubi E, Chandru K, Erastova V, Garg SG, Khawaja N, Kostyrka G, Machné R, Moggioli G, Muchowska KB, Neukirchen S, Peter B, Pichlhöfer E, Radványi Á, Rossetto D, Salditt A, Schmelling NM, Sousa FL, Tria FDK, Vörös D, Xavier JC. The Future of Origin of Life Research: Bridging Decades-Old Divisions. Life (Basel) 2020; 10:E20. [PMID: 32110893 PMCID: PMC7151616 DOI: 10.3390/life10030020] [Citation(s) in RCA: 51] [Impact Index Per Article: 12.8] [Received: 01/31/2020] [Revised: 02/19/2020] [Accepted: 02/21/2020] [Indexed: 12/12/2022]
Abstract
Research on the origin of life is highly heterogeneous. After a peculiar historical development, it still includes strongly opposed views which potentially hinder progress. In the 1st Interdisciplinary Origin of Life Meeting, early-career researchers gathered to explore the commonalities between theories and approaches, critical divergence points, and expectations for the future. We find that even though classical approaches and theories (e.g. bottom-up and top-down, RNA world vs. metabolism-first) have been prevalent in origin of life research, they are ceasing to be mutually exclusive and they can and should feed integrating approaches. Here we focus on pressing questions and recent developments that bridge the classical disciplines and approaches, and highlight expectations for future endeavours in origin of life research.
Affiliation(s)
- Martina Preiner
- Institute of Molecular Evolution, University of Düsseldorf, 40225 Düsseldorf, Germany
- Silke Asche
- School of Chemistry, University of Glasgow, Glasgow G128QQ, UK
- Sidney Becker
- Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK
- Holly C. Betts
- School of Earth Sciences, University of Bristol, Bristol BS8 1RL, UK
- Adrien Boniface
- Environmental Microbial Genomics, Laboratoire Ampère, Ecole Centrale de Lyon, Université de Lyon, 69130 Ecully, France
- Eloi Camprubi
- Origins Center, Department of Earth Sciences, Utrecht University, 3584 CB Utrecht, The Netherlands
- Kuhan Chandru
- Space Science Center (ANGKASA), Institute of Climate Change, Level 3, Research Complex, National University of Malaysia, UKM Bangi 43600, Selangor, Malaysia
- Department of Physical Chemistry, University of Chemistry and Technology, Prague, Technicka 5, 16628 Prague 6–Dejvice, Czech Republic
- Valentina Erastova
- UK Centre for Astrobiology, School of Chemistry, University of Edinburgh, Edinburgh EH9 3FJ, UK
- Sriram G. Garg
- Institute of Molecular Evolution, University of Düsseldorf, 40225 Düsseldorf, Germany
- Nozair Khawaja
- Institut für Geologische Wissenschaften, Freie Universität Berlin, 12249 Berlin, Germany
- Rainer Machné
- Institute of Synthetic Microbiology, University of Düsseldorf, 40225 Düsseldorf, Germany
- Quantitative and Theoretical Biology, University of Düsseldorf, 40225 Düsseldorf, Germany
- Giacomo Moggioli
- School of Biological and Chemical Sciences, Queen Mary University of London, London E1 4DQ, UK
- Kamila B. Muchowska
- Université de Strasbourg, CNRS, ISIS, 8 allée Gaspard Monge, 67000 Strasbourg, France
- Sinje Neukirchen
- Archaea Biology and Ecogenomics Division, University of Vienna, 1090 Vienna, Austria
- Benedikt Peter
- Cellular and Molecular Biophysics, Max Planck Institute of Biochemistry, 82152 Martinsried, Germany
- Edith Pichlhöfer
- Archaea Biology and Ecogenomics Division, University of Vienna, 1090 Vienna, Austria
- Ádám Radványi
- Department of Plant Systematics, Ecology and Theoretical Biology, Eötvös Loránd University, Pázmány Péter sétány 1/C, 1117 Budapest, Hungary
- Institute of Evolution, MTA Centre for Ecological Research, Klebelsberg Kuno u. 3., H-8237 Tihany, Hungary
- Daniele Rossetto
- Department of Cellular, Computational and Integrative Biology (CIBIO), University of Trento, 38123 Trento, Italy
- Annalena Salditt
- Systems Biophysics, Physics Department, Ludwig-Maximilians-Universität München, 80799 Munich, Germany
- Nicolas M. Schmelling
- Institute of Synthetic Microbiology, University of Düsseldorf, 40225 Düsseldorf, Germany
- Cluster of Excellence on Plant Sciences (CEPLAS), University of Cologne, 50674 Cologne, Germany
- Filipa L. Sousa
- Archaea Biology and Ecogenomics Division, University of Vienna, 1090 Vienna, Austria
- Fernando D. K. Tria
- Institute of Molecular Evolution, University of Düsseldorf, 40225 Düsseldorf, Germany
- Dániel Vörös
- Department of Plant Systematics, Ecology and Theoretical Biology, Eötvös Loránd University, Pázmány Péter sétány 1/C, 1117 Budapest, Hungary
- Institute of Evolution, MTA Centre for Ecological Research, Klebelsberg Kuno u. 3., H-8237 Tihany, Hungary
- Joana C. Xavier
- Institute of Molecular Evolution, University of Düsseldorf, 40225 Düsseldorf, Germany
25
Hernandez-Guerrero R, Galán-Vásquez E, Pérez-Rueda E. The protein architecture in Bacteria and Archaea identifies a set of promiscuous and ancient domains. PLoS One 2019; 14:e0226604. [PMID: 31856202 PMCID: PMC6922389 DOI: 10.1371/journal.pone.0226604] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/02/2019] [Accepted: 11/29/2019] [Indexed: 11/19/2022]
Abstract
In this work, we describe a systematic comparative genomic analysis of promiscuous domains in genomes of Bacteria and Archaea. A quantitative measure of domain promiscuity, the weighted domain architecture score (WDAS), was applied to 1317 domains in 1320 genomes of Bacteria and Archaea. A functional analysis associated with the WDAS per genome showed that 18 of 50 functional categories were significantly enriched in the promiscuous domains; in particular, small-molecule binding domains, transferase domains, DNA binding domains (transcription factors), and signal transduction domains were identified as promiscuous. In contrast, non-promiscuous domains were associated with 6 of 50 functional categories, and the category Function unknown was enriched. In addition, the WDASs of 52 domains correlated with genome size, i.e., WDAS values decreased as the genome size increased, suggesting that the number of combinations at larger domains increases, including domains in the superfamilies Winged helix-turn-helix and P-loop-containing nucleoside triphosphate hydrolases. Finally, based on classification of the domains according to their ancestry, we determined that the 52 promiscuous domains are also ancient and abundant among all the genomes, in contrast to the non-promiscuous domains. In summary, we consider that the association between these two classes of protein domains (promiscuous and non-promiscuous) provides bacterial and archaeal cells with the ability to respond to diverse environmental challenges.
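The abstract reports functional categories as significantly enriched among promiscuous domains but does not name the statistical test. A common choice for this kind of over-representation question is a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test); the sketch below assumes that test and is not taken from the paper. The function name `hypergeom_upper_tail` and the toy counts are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(k, n, K, N):
    """One-sided enrichment p-value P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: total number of domains analysed
    K: domains annotated to the functional category
    n: domains classed as promiscuous
    k: promiscuous domains that fall in the category
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("counts must satisfy 0 <= k <= n <= N and 0 <= K <= N")
    # Sum the probabilities of drawing i category members among n draws, for i >= k.
    # math.comb returns 0 when the lower argument exceeds the upper one.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)

# Toy example (invented numbers): 10 domains, 4 in the category,
# 3 promiscuous, 2 of which are in the category.
p = hypergeom_upper_tail(2, 3, 4, 10)
print(round(p, 4))  # prints 0.3333
```

When many categories are tested (50 in this study), the p-values would also need a multiple-testing correction, for example Bonferroni (multiplying each p-value by the number of tests, capped at 1).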
Affiliation(s)
- Rafael Hernandez-Guerrero
- Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Unidad Académica Yucatán, Mérida, Yucatán, México
- Edgardo Galán-Vásquez
- Departamento de Ingeniería de Sistemas Computacionales y Automatización, Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Ciudad Universitaria, Universidad Nacional Autónoma de México, Ciudad de México, México
- Ernesto Pérez-Rueda
- Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Unidad Académica Yucatán, Mérida, Yucatán, México
- Centro de Genómica y Bioinformática, Facultad de Ciencias, Universidad Mayor, Santiago, Chile
26
Pollack JD, Gerard D, Makhatadze GI, Pearl DK. Evolutionary conservation and structural localizations suggest a physical trace of metabolism’s progressive geochronological emergence. J Biomol Struct Dyn 2019; 38:3700-3719. [DOI: 10.1080/07391102.2019.1679666]
Affiliation(s)
- J. Dennis Pollack
- Department of Molecular Virology, Immunology and Medical Genetics, College of Medicine, The Ohio State University, Columbus, Ohio, USA
- David Gerard
- Department of Mathematics and Statistics, American University, Washington, DC, USA
- George I. Makhatadze
- Department of Biological Sciences, Rensselaer Polytechnic Institute, Troy, New York, USA
- Dennis K. Pearl
- Department of Statistics, Penn State University, University Park, Pennsylvania, USA
27
Jin C, Cukier RI. Machine learning can be used to distinguish protein families and generate new proteins belonging to those families. J Chem Phys 2019; 151:175102. [PMID: 31703505 DOI: 10.1063/1.5126225]
Abstract
Proteins are classified into families based on evolutionary relationships and common structure-function characteristics. The availability of large data sets of gene-derived protein sequences drives this classification. Sequence space is exponentially large, making it difficult to characterize family differences. In this work, we show that Machine Learning (ML) methods can be trained to distinguish between protein families. A number of supervised ML algorithms are explored to this end. The most accurate is a Long Short-Term Memory (LSTM) classification method that accounts for the sequence context of the amino acids. We study sequences for a number of protein families for which there are sufficient data to be used in ML. By splitting the data into training and testing sets, we find that this LSTM classifier can be trained to successfully classify the test sequences for all pairs of the families. We also investigate whether the addition of structural information increases the accuracy of the binary comparisons. It does, but because much less structural than sequence information is available, the quality of the training degrades. Another variety of LSTM, LSTM_wordGen, a context-dependent word generation algorithm, is used to generate new protein sequences based on seed sequences for the families considered here. Using the original sequences as training data and the generated sequences as test data, the LSTM classification method classifies the generated sequences almost as accurately as it classifies the true family members. Thus, in principle, we have generated new members of these protein families.
Affiliation(s)
- Chi Jin
- Department of Chemistry, Michigan State University, East Lansing, Michigan 48824, USA
- Robert I Cukier
- Department of Chemistry, Michigan State University, East Lansing, Michigan 48824, USA
28
Mughal F, Caetano-Anollés G. MANET 3.0: Hierarchy and modularity in evolving metabolic networks. PLoS One 2019; 14:e0224201. [PMID: 31648227 PMCID: PMC6812854 DOI: 10.1371/journal.pone.0224201] [Received: 05/14/2019] [Accepted: 10/08/2019]
Abstract
Enzyme recruitment is a fundamental evolutionary driver of modern metabolism. We see evidence of recruitment at work in the metabolic Molecular Ancestry Networks (MANET) database, an online resource that integrates data from KEGG, SCOP and structural phylogenomic reconstruction. The database, which was introduced in 2006, traces the deep history of the structural domains of enzymes in metabolic pathways. Here we release version 3.0 of MANET, which updates data from KEGG and SCOP, links enzyme and PDB information with PDBsum, and traces evolutionary information of domains defined at fold family level of SCOP classification in metabolic subnetwork diagrams. Compared to SCOP folds used in the previous versions, fold families are cohesive units of functional similarity that are highly conserved at sequence level and offer a 10-fold increase of data entries. We surveyed enzymatic, functional and catalytic site distributions among superkingdoms showing that ancient enzymatic innovations followed a biphasic temporal pattern of diversification typical of module innovation. We grouped enzymatic activities of MANET into a hierarchical system of subnetworks and mesonetworks matching KEGG classification. The evolutionary growth of these modules of metabolic activity was studied using bipartite networks and their one-mode projections at enzyme, subnetwork and mesonetwork levels of organization. Evolving metabolic networks revealed patterns of enzyme sharing that transcended mesonetwork boundaries and supported the patchwork model of metabolic evolution. We also explored the scale-freeness, randomness and small-world properties of evolving networks as possible organizing principles of network growth and diversification. The network structure shows an increase in hierarchical modularity and scale-free behavior as metabolic networks unfold in evolutionary time. Remarkably, this evolutionary constraint on structure was stronger at lower levels of metabolic organization. 
Evolving metabolic structure reveals a 'principle of granularity', an evolutionary increase of the cohesiveness of lower-level parts of a hierarchical system. MANET is available at http://manet.illinois.edu.
Affiliation(s)
- Fizza Mughal
- Illinois Informatics Institute, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Gustavo Caetano-Anollés
- Illinois Informatics Institute, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
29
Milner-White EJ. Protein three-dimensional structures at the origin of life. Interface Focus 2019; 9:20190057. [PMID: 31641431 DOI: 10.1098/rsfs.2019.0057] [Accepted: 09/10/2019]
Abstract
Proteins are relatively easy to synthesize compared to nucleic acids, and it is likely that there existed a stage prior to the RNA world which can be called the protein world. Some of the three-dimensional (3D) peptide structures in these proteins have, we argue, been conserved since then and may constitute the oldest biological relics in existence. We focus on 3D peptide motifs consisting of up to eight or so amino acid residues. The best known of these is the 'nest', a three- to seven-residue protein motif, which has the function of binding anionic atoms or groups of atoms. Ten per cent of amino acids in typical proteins belong to a nest, so it is a common motif. A five-residue nest is found as part of the well-known P-loop, a recurring feature of many ATP- or GTP-binding proteins, where it has the function of binding the phosphate part of these ligands. A synthetic hexapeptide, ser-gly-ala-gly-lys-thr, designed to resemble the P-loop, has been shown to bind inorganic phosphate. Another type of nest binds iron-sulfur centres. A range of other simple motifs occur with various intriguing 3D structures: some bind cations or form channels that transport potassium ions, and other peptides form catalytically active haem-like or sheet structures with certain transition metals. Amyloid peptides are also discussed. It now seems that the earliest polypeptides were far from being functionless stretches, and had many of the properties, both binding and catalytic, that might be expected to encourage and stabilize simple life forms in the hydrothermal vents of ocean depths.
Affiliation(s)
- E James Milner-White
- Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow G128QQ, UK
30
Gangele K, Jamsandekar M, Mishra A, Poluri KM. Unraveling the evolutionary origin of ELR motif using fish CXC chemokine CXCL8. Fish Shellfish Immunol 2019; 93:17-27. [PMID: 31310848 DOI: 10.1016/j.fsi.2019.07.034] [Received: 05/24/2019] [Revised: 07/12/2019] [Accepted: 07/12/2019]
Abstract
Chemokines are chemotactic proteins involved in host defense through the migration of immune-regulatory cells to the site of infection. Interleukin-8 (CXCL8/IL8) is the most studied "ELR" CXC chemokine, a neutrophil-activating chemokine (NAC) that regulates neutrophil trafficking during infection and inflammation by binding to its cognate G-protein-coupled receptors CXCR1/CXCR2. The "ELR" motif of NAC chemokines is essential for CXCR1/CXCR2 receptor activation. To understand the evolutionary origin of the "ELR" motif in CXC chemokines, a thorough evolutionary study of the CXCL8 gene from various fishes and primates was performed. Phylogenetic analysis revealed that the CXCL8 gene can be classified into four distinct lineages (CXCL8-L1a, CXCL8-L1b, CXCL8-L2, and CXCL8-L3), where CXCL8-L1a is the fastest-evolving lineage and CXCL8-L3 is the slowest. Selection analysis suggested that the branches containing the "ELR/DLR" motif (gadoid and coelacanth) are positively selected. The probable evolutionary trend of the "ELR" motif suggested that this motif in ancestral CXCL8 evolved from the GGR motif of lamprey (Agnatha), followed by duplication giving rise to the two main CXCL8 motifs: "NXH" in the L3 lineage and "ELR/DLR" in the L1a/L1b lineages. Although structural analysis suggested that the overall topology of the CXCL8 proteins is similar, differences do exist in individual structural elements among members of different lineages. Functional distance analysis suggested that the CXCL8-L3 lineage is more distant from the inferred ancestor than the CXCL8-L1a and L1b lineages. Functional divergence analysis between lineages suggested that most of the selected residues are important for receptor or glycosaminoglycan binding. Such functional diversification can be attributed to the novel set of functions adopted by CXCL8 in various species.
Affiliation(s)
- Krishnakant Gangele
- Department of Biotechnology, Indian Institute of Technology Roorkee, Roorkee, 247667, Uttarakhand, India
- Minal Jamsandekar
- Department of Biotechnology, Indian Institute of Technology Roorkee, Roorkee, 247667, Uttarakhand, India
- Amit Mishra
- Cellular and Molecular Neurobiology Unit, Indian Institute of Technology Jodhpur, Jodhpur, 342011, Rajasthan, India
- Krishna Mohan Poluri
- Department of Biotechnology, Indian Institute of Technology Roorkee, Roorkee, 247667, Uttarakhand, India.
31
Caetano-Anollés G, Aziz MF, Mughal F, Gräter F, Koç I, Caetano-Anollés K, Caetano-Anollés D. Emergence of Hierarchical Modularity in Evolving Networks Uncovered by Phylogenomic Analysis. Evol Bioinform Online 2019; 15:1176934319872980. [PMID: 31523127 PMCID: PMC6728656 DOI: 10.1177/1176934319872980] [Received: 08/06/2019] [Accepted: 08/08/2019]
Abstract
Networks describe how parts associate with each other to form integrated systems which often have modular and hierarchical structure. In biology, network growth involves two processes, one that unifies and the other that diversifies. Here, we propose a biphasic (bow-tie) theory of module emergence. In the first phase, parts are weakly linked and associate variously. As they diversify, they compete with each other and are often selected for performance. The emerging interactions constrain their structure and associations. This causes parts to self-organize into modules with tight linkage. In the second phase, variants of the modules diversify and become new parts for a new generative cycle of higher-level organization. The paradigm predicts the rise of hierarchical modularity in evolving networks at different timescales and complexity levels. Remarkably, phylogenomic analyses uncover this emergence in the rewiring of metabolomic and transcriptome-informed metabolic networks, the nanosecond dynamics of proteins, and evolving networks of metabolism, elementary functionomes, and protein domain organization.
Affiliation(s)
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, C.R. Woese Institute for Genomic Biology, and Illinois Informatics Institute, University of Illinois, Urbana, IL, USA
- M Fayez Aziz
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, C.R. Woese Institute for Genomic Biology, and Illinois Informatics Institute, University of Illinois, Urbana, IL, USA
- Fizza Mughal
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, C.R. Woese Institute for Genomic Biology, and Illinois Informatics Institute, University of Illinois, Urbana, IL, USA
- Frauke Gräter
- Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Ibrahim Koç
- Department of Molecular Biology and Genetics, Gebze Technical University, Gebze, Turkey
- Kelsey Caetano-Anollés
- Division of Biomedical Informatics, College of Medicine, Seoul National University, Seoul, Republic of Korea
32
Solis AD. Reduced alphabet of prebiotic amino acids optimally encodes the conformational space of diverse extant protein folds. BMC Evol Biol 2019; 19:158. [PMID: 31362700 PMCID: PMC6668081 DOI: 10.1186/s12862-019-1464-6] [Received: 11/12/2018] [Accepted: 06/19/2019]
Abstract
Background: There is wide agreement that only a subset of the twenty standard amino acids existed prebiotically in sufficient concentrations to form functional polypeptides. We ask how this subset, postulated as {A,D,E,G,I,L,P,S,T,V}, could have formed structures stable enough to found metabolic pathways. Inspired by alphabet reduction experiments, we undertook a computational analysis to measure the structural coding behavior of sequences simplified by reduced alphabets. We sought to discern characteristics of the prebiotic set that would endow it with unique properties relevant to structure, stability, and folding.
Results: Drawing on a large dataset of single-domain proteins, we employed an information-theoretic measure to assess how well the prebiotic amino acid set preserves fold information against all other possible ten-amino acid sets. An extensive virtual mutagenesis procedure revealed that the prebiotic set excellently preserves sequence-dependent information regarding both backbone conformation and tertiary contact matrix of proteins. We observed that information retention is fold-class dependent: the prebiotic set sufficiently encodes the structure space of α/β and α + β folds, and to a lesser extent, of all-α and all-β folds. The prebiotic set appeared insufficient to encode the small proteins. Assessing how well the prebiotic set discriminates native vs. incorrect sequence-structure matches, we found that α/β and α + β folds exhibit more pronounced energy gaps with the prebiotic set than with nearly all alternatives.
Conclusions: The prebiotic set optimally encodes local backbone structures that appear in the folded environment and near-optimally encodes the tertiary contact matrix of extant proteins. The fold-class-specific patterns observed from our structural analysis confirm the postulated timeline of fold appearance in proteogenesis derived from proteomic sequence analyses. Polypeptides arising in a prebiotic environment will likely form α/β and α + β-like folds if any at all. We infer that the progressive expansion of the alphabet allowed the increased conformational stability and functional specificity of later folds, including all-α, all-β, and small proteins. Our results suggest that prebiotic sequences are amenable to mutations that significantly lower native conformational energies and increase discrimination amidst incorrect folds. This property may have assisted the genesis of functional proto-enzymes prior to the expansion of the full amino acid alphabet.
Affiliation(s)
- Armando D Solis
- Biological Sciences Department, New York City College of Technology (City Tech), The City University of New York (CUNY), 285 Jay Street, Brooklyn, NY, 11201, USA.
33
The Hsp70 Chaperone System Stabilizes a Thermo-sensitive Subproteome in E. coli. Cell Rep 2019; 28:1335-1345.e6. [DOI: 10.1016/j.celrep.2019.06.081] [Received: 03/11/2019] [Revised: 06/01/2019] [Accepted: 06/21/2019]
34
Identification of functional signatures in the metabolism of the three cellular domains of life. PLoS One 2019; 14:e0217083. [PMID: 31136618 PMCID: PMC6538242 DOI: 10.1371/journal.pone.0217083] [Received: 01/14/2019] [Accepted: 05/04/2019]
Abstract
In order to identify common and specific enzymatic activities associated with the metabolism of the three cellular domains of life, the conservation and variation of enzyme content among Bacteria, Archaea, and Eukarya were evaluated. To this end, the content of enzymes belonging to a particular pathway, and their abundance and distribution, were assessed in 1507 organisms annotated and deposited in the KEGG database. In addition, we evaluated consecutive enzymatic reaction pairs obtained from metabolic pathway reactions and transformed into sequences of enzymatic reactions, with catalytic activities encoded as Enzyme Commission numbers and linked by a substrate. The two analyses are complementary: the first considers individual reactions associated with each organism and metabolic map, and the second evaluates the functional associations between pairs of consecutive reactions. From these comparisons, we found a set of five enzymatic reactions that were widely distributed in all the organisms and are considered here as universal to Bacteria, Archaea, and Eukarya. In addition, 132 pairs out of 3151 reactions were identified as significant, but only 5 of them were widely distributed across all the taxonomic divisions. However, these universal reactions are not widely distributed along the metabolic maps, suggesting their dispensability to all metabolic processes. Finally, we found that universal reactions are also associated with ancestral domains, such as those related to phosphorus-containing groups with a phosphate group as acceptor or those related to the ribulose-phosphate binding barrel, triosephosphate isomerase, and D-ribose-5-phosphate isomerase (RpiA) lid domain, among others. Therefore, we consider that this analysis provides clues about the functional constraints associated with the repertoire of enzymatic functions per organism.
35
Colson P, Levasseur A, La Scola B, Sharma V, Nasir A, Pontarotti P, Caetano-Anollés G, Raoult D. Ancestrality and Mosaicism of Giant Viruses Supporting the Definition of the Fourth TRUC of Microbes. Front Microbiol 2018; 9:2668. [PMID: 30538677 PMCID: PMC6277510 DOI: 10.3389/fmicb.2018.02668] [Received: 05/22/2018] [Accepted: 10/18/2018]
Abstract
Giant viruses of amoebae were discovered in 2003. Since then, their diversity has greatly expanded. They were suggested to form a fourth branch of life, collectively named ‘TRUC’ (for “Things Resisting Uncompleted Classifications”) alongside Bacteria, Archaea, and Eukarya. Their origin and ancestrality remain controversial. Here, we specify the evolution and definition of giant viruses. Phylogenetic and phenetic analyses of informational gene repertoires of giant viruses and selected bacteria, archaea and eukaryota were performed, including structural phylogenomics based on protein structural domains grouped into 289 universal fold superfamilies (FSFs). Hierarchical clustering analysis was performed based on a binary presence/absence matrix constructed using 727 informational COGs from cellular organisms. The presence/absence of ‘universal’ FSF domains was used to generate an unrooted maximum parsimony phylogenomic tree. Comparison of the gene content of a giant virus with those of a bacterium, an archaeon, and a eukaryote with small genomes was also performed. Overall, both cladistic analyses based on gene sequences of very central and ancient proteins and on highly conserved protein fold structures as well as phenetic analyses were congruent regarding the delineation of a fourth branch of microbes composed of giant viruses. Giant viruses appeared as a basal group in the tree of all proteomes. A pangenome and core genome determined for Rickettsia bellii (bacteria), Methanomassiliicoccus luminyensis (archaeon), Encephalitozoon intestinalis (eukaryote), and Tupanvirus (giant virus) showed a substantial proportion of Tupanvirus genes that overlap with those of the cellular microbes. In addition, a substantial genome mosaicism was observed, with 51, 11, 8, and 0.2% of Tupanvirus genes best matching with viruses, eukaryota, bacteria, and archaea, respectively. Finally, we found that genes themselves may be subject to lateral sequence transfers.
In summary, our data highlight the quantum leap between classical and giant viruses. Phylogenetic and phyletic analyses and the study of protein fold superfamilies confirm previous evidence of the existence of a fourth TRUC of life that includes giant viruses, and highlight its ancestrality and mosaicism. They also point out that the best evolutionary representations for giant viruses and cellular microorganisms are rhizomes, and that sequence transfers rather than gene transfers have to be considered.
Affiliation(s)
- Philippe Colson
- Aix-Marseille Université, Institut de Recherche pour le Développement (IRD), Assistance Publique - Hôpitaux de Marseille (AP-HM); Microbes, Evolution, Phylogeny and Infection (MEΦI); Institut Hospitalo-Universitaire (IHU) - Méditerranée Infection, Marseille, France
- Anthony Levasseur
- Aix-Marseille Université, Institut de Recherche pour le Développement (IRD), Assistance Publique - Hôpitaux de Marseille (AP-HM); Microbes, Evolution, Phylogeny and Infection (MEΦI); Institut Hospitalo-Universitaire (IHU) - Méditerranée Infection, Marseille, France
- Bernard La Scola
- Aix-Marseille Université, Institut de Recherche pour le Développement (IRD), Assistance Publique - Hôpitaux de Marseille (AP-HM); Microbes, Evolution, Phylogeny and Infection (MEΦI); Institut Hospitalo-Universitaire (IHU) - Méditerranée Infection, Marseille, France
- Vikas Sharma
- Aix-Marseille Université, Institut de Recherche pour le Développement (IRD), Assistance Publique - Hôpitaux de Marseille (AP-HM); Microbes, Evolution, Phylogeny and Infection (MEΦI); Institut Hospitalo-Universitaire (IHU) - Méditerranée Infection, Marseille, France
- Centre National de la Recherche Scientifique, Marseille, France
- Arshan Nasir
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois Urbana-Champaign, Urbana, IL, United States
- Department of Biosciences, COMSATS University Islamabad, Islamabad, Pakistan
- Pierre Pontarotti
- Aix-Marseille Université, Institut de Recherche pour le Développement (IRD), Assistance Publique - Hôpitaux de Marseille (AP-HM); Microbes, Evolution, Phylogeny and Infection (MEΦI); Institut Hospitalo-Universitaire (IHU) - Méditerranée Infection, Marseille, France
- Centre National de la Recherche Scientifique, Marseille, France
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois Urbana-Champaign, Urbana, IL, United States
- Didier Raoult
- Aix-Marseille Université, Institut de Recherche pour le Développement (IRD), Assistance Publique - Hôpitaux de Marseille (AP-HM); Microbes, Evolution, Phylogeny and Infection (MEΦI); Institut Hospitalo-Universitaire (IHU) - Méditerranée Infection, Marseille, France
36
Caetano-Anollés G, Nasir A, Kim KM, Caetano-Anollés D. Rooting Phylogenies and the Tree of Life While Minimizing Ad Hoc and Auxiliary Assumptions. Evol Bioinform Online 2018; 14:1176934318805101. [PMID: 30364468 PMCID: PMC6196624 DOI: 10.1177/1176934318805101] [Received: 01/25/2018] [Accepted: 09/05/2018]
Abstract
Phylogenetic methods unearth evolutionary history when supported by three starting points of reason: (1) the continuity axiom requires the existence of a "model" of evolutionary change, (2) the singularity axiom defines the historical ground plan (phylogeny) in which biological entities (taxa) evolve, and (3) the memory axiom demands identification of biological attributes (characters) with historical information. Axiom consequences are interlinked, making the retrodiction enterprise an endeavor of reciprocal fulfillment. In particular, establishing the direction of evolutionary change (character polarization) roots phylogenies and enables testing the existence of historical memory (homology). Unfortunately, the rooting of phylogenies, especially of the "tree of life," generally follows narratives instead of integrating empirical and theoretical knowledge of retrodictive exploration. This stems mostly from a focus on molecular sequence analysis and uncertainties about rooting methods. Here, we review available rooting criteria, highlighting the need to minimize both ad hoc and auxiliary assumptions, especially argumentative ad hocness. We show that while the outgroup comparison method has been widely adopted, the generality criterion of nesting and additive phylogenetic change embodied in Weston's rule offers the most powerful rooting approach. We also propose a change of focus, from phylogenies that describe the evolution of biological systems to those that describe the evolution of parts of those systems. This weakens violations of character independence, helps formalize the generality criterion of rooting, and provides new ways to study the problem of evolution.
Affiliation(s)
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Arshan Nasir
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Department of Biosciences, COMSATS University Islamabad, Islamabad, Pakistan
- Kyung Mo Kim
- Division of Polar Life Sciences, Korea Polar Research Institute, Incheon, Republic of Korea
- Derek Caetano-Anollés
- Department of Evolutionary Genetics, Max-Planck-Institut für Evolutionsbiologie, Plön, Germany
37
Razban RM, Gilson AI, Durfee N, Strobelt H, Dinkla K, Choi JM, Pfister H, Shakhnovich EI. ProteomeVis: a web app for exploration of protein properties from structure to sequence evolution across organisms' proteomes. Bioinformatics 2018; 34:3557-3565. [PMID: 29741573 PMCID: PMC6184454 DOI: 10.1093/bioinformatics/bty370] [Received: 12/01/2017] [Revised: 03/27/2018] [Accepted: 05/03/2018]
Abstract
Motivation: Protein evolution spans time scales and its effects span the length of an organism. A web app named ProteomeVis is developed to provide a comprehensive view of protein evolution in the Saccharomyces cerevisiae and Escherichia coli proteomes. ProteomeVis interactively creates protein chain graphs, where edges between nodes represent structure and sequence similarities within user-defined ranges, to study the long time scale effects of protein structure evolution. The short time scale effects of protein sequence evolution are studied by sequence evolutionary rate (ER) correlation analyses with protein properties that span from the molecular to the organismal level.
Results: We demonstrate the utility and versatility of ProteomeVis by investigating the distribution of edges per node in organismal protein chain universe graphs (oPCUGs) and putative ER determinants. S. cerevisiae and E. coli oPCUGs are scale-free with scaling constants of 1.79 and 1.56, respectively. Both scaling constants can be explained by a previously reported theoretical model describing protein structure evolution. Protein abundance most strongly correlates with ER among properties in ProteomeVis, with Spearman correlations of -0.49 (P-value < 10^-10) and -0.46 (P-value < 10^-10) for S. cerevisiae and E. coli, respectively. This result is consistent with previous reports that found protein expression to be the most important ER determinant.
Availability and implementation: ProteomeVis is freely accessible at http://proteomevis.chem.harvard.edu.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Rostam M Razban
- Department of Chemistry & Chemical Biology, Harvard University, Cambridge, MA, USA
- Amy I Gilson
- Department of Chemistry & Chemical Biology, Harvard University, Cambridge, MA, USA
- Niamh Durfee
- Department of Chemistry & Chemical Biology, Harvard University, Cambridge, MA, USA
- Hendrik Strobelt
- School of Engineering & Applied Sciences, Harvard University, Cambridge, MA, USA
- Kasper Dinkla
- School of Engineering & Applied Sciences, Harvard University, Cambridge, MA, USA
- Jeong-Mo Choi
- Department of Chemistry & Chemical Biology, Harvard University, Cambridge, MA, USA
- Hanspeter Pfister
- School of Engineering & Applied Sciences, Harvard University, Cambridge, MA, USA
- Eugene I Shakhnovich
- Department of Chemistry & Chemical Biology, Harvard University, Cambridge, MA, USA
38
Caetano-Anollés D, Caetano-Anollés K, Caetano-Anollés G. Evolution of macromolecular structure: a 'double tale' of biological accretion and diversification. Sci Prog 2018; 101:360-383. [PMID: 30296968 PMCID: PMC10365222 DOI: 10.3184/003685018x15379391431599]
Abstract
The evolution of structure in biology is driven by accretion and diversification. Accretion brings together disparate parts to form bigger wholes. Diversification provides opportunities for growth and innovation. Here, we review patterns and processes that are responsible for a 'double tale' of accretion and diversification at various levels of complexity, from proteins and nucleic acids to high-rise building structures in cities. Parts are at first weakly linked and associate variously. As they diversify, they compete with each other and are selected for performance. The emerging interactions constrain their structure and associations. This causes parts to self-organise into modules with tight linkage. In a second phase, variants of the modules evolve and become new parts for a new generative cycle of higher-level organisation. Evolutionary genomics and network biology support the 'double tale' of structural module creation and validate an evolutionary principle of maximum abundance that drives the gain and loss of modules.
Affiliation(s)
- Derek Caetano-Anollés
- Department of Evolutionary Genetics of the Max-Planck Institute for Evolutionary Biology, Plön, Germany. Developmental Biology from the University of Illinois, Urbana-Champaign
- Kelsey Caetano-Anollés
- Division of Biomedical Informatics of Seoul National University College of Medicine, Republic of Korea. Animal Sciences from the University of Illinois, Urbana-Champaign
- Gustavo Caetano-Anollés
- Department of Crop Sciences and Affiliate of the C.R. Woese Institute for Genomic Biology at the University of Illinois, Urbana-Champaign. University of La Plata in Argentina

39
Iyer MS, Joshi AG, Sowdhamini R. Genome-wide survey of remote homologues for protein domain superfamilies of known structure reveals unequal distribution across structural classes. Mol Omics 2018; 14:266-280. [PMID: 29971307 DOI: 10.1039/c8mo00008e] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/08/2023]
Abstract
Domains are the basic building blocks of proteins and can combine to give rise to different domain architectures. Annotation of domains in a sequence is the first step towards understanding biological function. Since the number of folds is limited and evolutionarily related proteins have similar structures, function can be inferred through remote homology. Computational sequence searches for remote homologues were performed on the genomes of ∼160 000 different organisms, starting from nearly 11 000 superfamily queries of known structure. Case studies revealed that most of the associated domains are involved in the same biological process. Using all the proteins predicted to have at least one structural domain, a coverage of 61% of Pfam families was achieved, which is higher than that of existing methods (43.36% by SIFTS). Taxonomic analysis of the proteins revealed 493 superfamilies present in all the major kingdoms of life and a few lateral gene transfers between viruses and cellular organisms. The distribution of remote homologues across different classes, folds and superfamilies was studied and revealed that sequences are unequally distributed across structural classes. Finally, domain architectures were computed for the homologues, and these data were compiled for each superfamily and organism.
Affiliation(s)
- Meenakshi S Iyer
- National Centre for Biological Sciences (TIFR), GKVK Campus, Bellary Road, Bangalore, Karnataka 560 065, India.

40
Medvedev KE, Kinch LN, Grishin NV. Functional and evolutionary analysis of viral proteins containing a Rossmann-like fold. Protein Sci 2018; 27:1450-1463. [PMID: 29722076 PMCID: PMC6153405 DOI: 10.1002/pro.3438] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 03/08/2018] [Revised: 04/30/2018] [Accepted: 05/01/2018] [Indexed: 11/17/2022]
Abstract
Viruses are the most abundant life form and infect practically all organisms. Consequently, these obligate parasites are a major cause of human suffering and economic loss. The Rossmann-like fold is the most populated fold among α/β-folds in the Protein Data Bank, and proteins containing a Rossmann-like fold constitute 22% of all known protein 3D structures. Thus, analysis of viral proteins containing Rossmann-like domains could provide an understanding of viral biology and evolution and could suggest possible targets for antiviral therapy. We provide a functional and evolutionary analysis of viral proteins containing a Rossmann-like fold found in the evolutionary classification of protein domains (ECOD) database developed in our lab. We identified 81 protein families of bacterial, archaeal, and eukaryotic viruses in light of their evolution-based ECOD classification and Pfam taxonomy. We defined their functional significance using enzymatic EC number assignments as well as domain-level family annotations.
Affiliation(s)
- Kirill E. Medvedev
- Departments of Biophysics and Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas
- Lisa N. Kinch
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas
- Nick V. Grishin
- Departments of Biophysics and Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas

41
Increase in soluble protein oligomers triggers the innate immune system promoting inflammation and vascular dysfunction in the pathogenesis of sepsis. Clin Sci (Lond) 2018; 132:1433-1438. [PMID: 30021912 DOI: 10.1042/cs20180368] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 05/01/2018] [Revised: 06/25/2018] [Accepted: 06/26/2018] [Indexed: 12/18/2022]
Abstract
Sepsis is a profoundly morbid and life-threatening condition, and an increasingly alarming burden on modern healthcare economies. Patients with septic shock exhibit persistent hypotension despite adequate volume resuscitation requiring pharmacological vasoconstrictors, but the molecular mechanisms of this phenomenon remain unclear. The accumulation of misfolded proteins is linked to numerous diseases, and it has been observed that soluble oligomeric protein intermediates are the primary cytotoxic species in these conditions. Oligomeric protein assemblies have been shown to bind and activate a variety of pattern recognition receptors (PRRs) including formyl peptide receptor (FPR). While inhibition of endoplasmic reticulum (ER) stress and stabilization of protein homeostasis have been promising lines of inquiry regarding sepsis therapy, little attention has been given to the potential effects that the accumulation of misfolded proteins may have in driving sepsis pathogenesis. Here we propose that in sepsis, there is an accumulation of toxic misfolded proteins in the form of soluble protein oligomers (SPOs) that contribute to the inflammation and vascular dysfunction observed in sepsis via the activation of one or more PRRs including FPR. Our laboratory has shown increased levels of SPOs in the heart and intrarenal arteries of septic mice. We have also observed that exposure of resistance arteries and vascular smooth muscle cells to SPOs is associated with increased mitogen-activated protein kinase (MAPK) signaling including phosphorylated extracellular signal-regulated kinase (p-ERK) and p-P38 MAPK pathways, and that this response is abolished with the knockout of FPR. This hypothesis has promising clinical implications as it proposes a novel mechanism that can be exploited as a therapeutic target in sepsis.

42
Staley JT, Caetano-Anollés G. Archaea-First and the Co-Evolutionary Diversification of Domains of Life. Bioessays 2018; 40:e1800036. [PMID: 29944192 DOI: 10.1002/bies.201800036] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 02/19/2018] [Revised: 05/12/2018] [Indexed: 12/13/2022]
Abstract
The origins and evolution of the Archaea, Bacteria, and Eukarya remain controversial. Phylogenomic-wide studies of molecular features that are evolutionarily conserved, such as protein structural domains, suggest that Archaea is the first domain of life to diversify from a stem line of descent. This line embodies the last universal common ancestor of cellular life. Here, we propose that ancestors of Euryarchaeota co-evolved with those of Bacteria prior to the diversification of Eukarya. This co-evolutionary scenario is supported by comparative genomic and phylogenomic analyses of the distributions of fold families of domains in the proteomes of free-living organisms, which show horizontal gene recruitments and informational process homologies. It also benefits from the molecular study of cell physiologies responsible for membrane phospholipids, methanogenesis, methane oxidation, cell division, gas vesicles, and the cell wall. Our theory, however, challenges popular cell fusion and two-domains-of-life scenarios derived from sequence analysis, demanding phylogenetic reconciliation. Also see the video abstract here: https://youtu.be/9yVWn_Q9faY.
Affiliation(s)
- James T Staley
- Department of Microbiology and Astrobiology Program, University of Washington, Seattle, WA, 98195, USA
- Gustavo Caetano-Anollés
- Department of Crop Sciences, C. R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA

43
Kaiser F, Bittrich S, Salentin S, Leberecht C, Haupt VJ, Krautwurst S, Schroeder M, Labudde D. Backbone Brackets and Arginine Tweezers delineate Class I and Class II aminoacyl tRNA synthetases. PLoS Comput Biol 2018; 14:e1006101. [PMID: 29659563 PMCID: PMC5919687 DOI: 10.1371/journal.pcbi.1006101] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 10/16/2017] [Revised: 04/26/2018] [Accepted: 03/20/2018] [Indexed: 12/22/2022]
Abstract
The origin of the machinery that realizes protein biosynthesis in all organisms is still unclear. Key components of this machinery are the aminoacyl tRNA synthetases (aaRS), which ligate tRNAs to amino acids while consuming ATP. Sequence analyses revealed that these enzymes can be divided into two complementary classes. The two classes differ significantly at the sequence and structural level, feature different reaction mechanisms, and occur in diverse oligomerization states. The one unifying aspect of both classes is their function of binding ATP. We identified Backbone Brackets and Arginine Tweezers as the most compact ATP binding motifs characteristic of each Class. Geometric analysis shows a structural rearrangement of the Backbone Brackets upon ATP binding, indicating a general mechanism of all Class I structures. Regarding the origin of aaRS, the Rodin-Ohno hypothesis states that the peculiar nature of the two aaRS classes is the result of their primordial forms, called Protozymes, being encoded on opposite strands of the same gene. Backbone Brackets and Arginine Tweezers were traced back to the proposed Protozymes and their more efficient successors, the Urzymes. Both structural motifs can be observed as pairs of residues in contemporary structures, and it seems that the time of their addition, indicated by their placement in the ancient aaRS, coincides with the evolutionary trace of Proto- and Urzymes. Aminoacyl tRNA synthetases (aaRS) are primordial enzymes essential for interpretation and transfer of genetic information. Understanding the origin of the peculiarities observed with aaRS can explain what constituted the earliest life forms and how the genetic code was established. The increasing number of experimentally determined three-dimensional structures of aaRS opens up new avenues for high-throughput analyses of molecular mechanisms. In this study, we present an exhaustive structural analysis of ATP binding motifs.
We unveil an oppositional implementation of enzyme substrate binding in each aaRS Class. While Class I binds via interactions mediated by backbone hydrogen bonds, Class II uses a pair of arginine residues to establish salt bridges to its ATP ligand. We show how nature realized the binding of the same ligand species with completely different mechanisms. In addition, we demonstrate that sequence or even structure analysis for conserved residues may miss important functional aspects which can only be revealed by ligand interaction studies. Additionally, the placement of those key residues in the structure supports a popular hypothesis, which states that prototypic aaRS were once coded on complementary strands of the same gene.
Affiliation(s)
- Florian Kaiser
- University of Applied Sciences Mittweida, Mittweida, Germany
- Biotechnology Center (BIOTEC), TU Dresden, Dresden, Germany
- Sebastian Bittrich
- University of Applied Sciences Mittweida, Mittweida, Germany
- Biotechnology Center (BIOTEC), TU Dresden, Dresden, Germany
- Christoph Leberecht
- University of Applied Sciences Mittweida, Mittweida, Germany
- Biotechnology Center (BIOTEC), TU Dresden, Dresden, Germany
- Dirk Labudde
- University of Applied Sciences Mittweida, Mittweida, Germany

44
Nardo AE, Añón MC, Parisi G. Large-scale mapping of bioactive peptides in structural and sequence space. PLoS One 2018; 13:e0191063. [PMID: 29351315 PMCID: PMC5774755 DOI: 10.1371/journal.pone.0191063] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 10/26/2017] [Accepted: 12/27/2017] [Indexed: 12/11/2022]
Abstract
The health-enhancing potential of bioactive peptides (BPs) has driven interest in food proteins as well as in the development of predictive methods. Research in this area has been especially active in using BPs as components of functional foods. Apparently, BPs do not have a given biological function in the containing proteins and they do not evolve under independent evolutionary constraints. In this work we performed a large-scale mapping of BPs in sequence and structural space. Using well-curated BPs deposited in the BIOPEP database, we searched for exact matches in non-redundant sequence databases. Proteins containing BPs were used in fold-recognition methods to predict the corresponding folds, and BP occurrences were mapped. We found that the fold distribution of BP occurrences possibly reflects sequence relative abundance in databases. However, we also found that proteins with five or more BPs in their sequences correspond to well-populated protein folds, called superfolds. Also, we found that in well-populated superfamilies, BPs tend to adopt similar locations in the protein fold, suggesting the existence of hotspots. We think that our results could contribute to the development of new bioinformatics pipelines to improve BP detection.
Affiliation(s)
- Agustina E. Nardo
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, CONICET, Bernal, Argentina
- Centro de Investigación y Desarrollo en Criotecnología de Alimentos, Facultad de Ciencia Exactas, Universidad Nacional de la Plata - Comisión de Investigaciones Científicas - CONICET, La Plata, Argentina
- M. Cristina Añón
- Centro de Investigación y Desarrollo en Criotecnología de Alimentos, Facultad de Ciencia Exactas, Universidad Nacional de la Plata - Comisión de Investigaciones Científicas - CONICET, La Plata, Argentina
- Gustavo Parisi
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, CONICET, Bernal, Argentina

45
Wang ZC, Li YK, He SG, Bierbaum VM. Reactivity of amino acid anions with nitrogen and oxygen atoms. Phys Chem Chem Phys 2018; 20:4990-4996. [DOI: 10.1039/c7cp07886b] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 11/21/2022]
Abstract
Gas-phase reaction of deprotonated tyrosine with a ground state O atom generates five ionic products.
Affiliation(s)
- Zhe-Chen Wang
- Department of Chemistry and Biochemistry, University of Colorado, Boulder, Colorado 80309, USA
- Ya-Ke Li
- Institute of Chemistry, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences
- Sheng-Gui He
- Institute of Chemistry, Chinese Academy of Sciences, Beijing, China
- Veronica M. Bierbaum
- Department of Chemistry and Biochemistry, University of Colorado, Boulder, Colorado 80309, USA

46
Moelling K, Broecker F, Russo G, Sunagawa S. RNase H As Gene Modifier, Driver of Evolution and Antiviral Defense. Front Microbiol 2017; 8:1745. [PMID: 28959243 PMCID: PMC5603734 DOI: 10.3389/fmicb.2017.01745] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.7] [Received: 07/26/2017] [Accepted: 08/28/2017] [Indexed: 12/21/2022]
Abstract
Retroviral infections are 'mini-symbiotic' events supplying recipient cells with sequences for viral replication, including the reverse transcriptase (RT) and ribonuclease H (RNase H). These proteins and other viral or cellular sequences can provide novel cellular functions including immune defense mechanisms. Their high error rate renders RT-RNases H drivers of evolutionary innovation. Integrated retroviruses and the related transposable elements (TEs) have existed for at least 150 million years, constitute up to 80% of eukaryotic genomes and are also present in prokaryotes. Endogenous retroviruses regulate host genes, have provided novel genes including the syncytins that mediate maternal-fetal immune tolerance and can be experimentally rendered infectious again. The RT and the RNase H are among the most ancient and abundant protein folds. RNases H may have evolved from ribozymes, related to viroids, early in the RNA world, forming ribosomes, RNA replicases and polymerases. Basic RNA-binding peptides enhance ribozyme catalysis. RT and ribozymes or RNases H are present today in bacterial group II introns, the precedents of TEs. Thousands of unique RTs and RNases H are present in eukaryotes, bacteria, and viruses. These enzymes mediate viral and cellular replication, antiviral defense in eukaryotes and prokaryotes, splicing, R-loop resolution, and DNA repair. RNase H-like activities are also required for the activity of small regulatory RNAs. The retroviral replication components share striking similarities with the RNA-induced silencing complex (RISC), the prokaryotic CRISPR-Cas machinery, eukaryotic V(D)J recombination and interferon systems. Viruses supply antiviral defense tools to cellular organisms. TEs are the evolutionary origin of siRNA and miRNA genes that, through RISC, counteract detrimental activities of TEs and chromosomal instability. Moreover, piRNAs, implicated in transgenerational inheritance, suppress TEs in germ cells.
Thus, virtually all known immune defense mechanisms against viruses, phages, TEs, and extracellular pathogens require RNase H-like enzymes. Analogous to the prokaryotic CRISPR-Cas anti-phage defense, possibly originating from TEs termed casposons, endogenized retroviruses (ERVs) and amplified TEs can be regarded as related forms of inheritable immunity in eukaryotes. This survey suggests that RNase H-like activities of retroviruses, TEs, and phages have built up innate and adaptive immune systems throughout all domains of life.
Affiliation(s)
- Karin Moelling
- Institute of Medical Microbiology, University of Zurich, Zurich, Switzerland
- Max Planck Institute for Molecular Genetics, Berlin, Germany
- Felix Broecker
- Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Giancarlo Russo
- Functional Genomics Center Zurich, ETH Zurich/University of Zurich, Zurich, Switzerland
- Shinichi Sunagawa
- Department of Biology, Institute of Microbiology, ETH Zurich, Zurich, Switzerland

47
Rivera-Gómez N, Martínez-Núñez MA, Pastor N, Rodriguez-Vazquez K, Perez-Rueda E. Dissecting the protein architecture of DNA-binding transcription factors in bacteria and archaea. MICROBIOLOGY-SGM 2017; 163:1167-1178. [PMID: 28777072 DOI: 10.1099/mic.0.000504] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 11/18/2022]
Abstract
Gene regulation at the transcriptional level is a central process in all organisms, in which DNA-binding transcription factors (TFs) play a fundamental role. This class of proteins binds specifically to DNA sequences through the DNA-binding domain (DBD), activating or repressing gene expression as a function of the cell's metabolic status, operator context and ligand-binding status, among other factors. In addition, TFs may contain partner domains (PaDos), which are involved in ligand binding and protein-protein interactions. In this work, we systematically evaluated the distribution, abundance and domain organization of DNA-binding TFs in 799 non-redundant bacterial and archaeal genomes. We found that the distributions of the DBDs and their corresponding PaDos correlated with the size of the genome. We also identified specific combinations between the DBDs and their corresponding PaDos. Within each class of DBDs there are differences in the actual angle formed at the dimerization interface, responding to the presence/absence of ligands and/or crystallization conditions, setting the orientation of the resulting helices and wings facing the DNA. Our results highlight the importance of PaDos as central elements that enhance the diversity of regulatory functions in all bacterial and archaeal organisms, and also demonstrate the role of PaDos in sensing diverse signal compounds. The highly specific interactions between DBDs and PaDos observed in this work, together with our structural analysis highlighting the difficulty in predicting both inter-domain geometry and quaternary structure, suggest that these systems appeared once and evolved with diverse duplication events in all the analysed organisms.
Affiliation(s)
- Nancy Rivera-Gómez
- Centro de Investigaciones en Biotecnología, Universidad Autónoma del Estado de Morelos, Cuernavaca, México
- Mario Alberto Martínez-Núñez
- Laboratorio de Estudios Ecogenómicos, Facultad de Ciencias, Unidad Académica de Ciencias y Tecnología de Yucatán, Universidad Nacional Autónoma de México, Mérida, Yucatán, México
- Nina Pastor
- Centro de Investigación en Dinámica Celular, IICBA, Universidad Autónoma del Estado de Morelos, Av. Universidad 1001, Col. Chamilpa, Cuernavaca, Morelos 62209, México
- Katya Rodriguez-Vazquez
- Departamento de Ingeniería de Sistemas Computacionales y Automatización. Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas. Ciudad Universitaria, Universidad Nacional Autónoma de México, México, D.F, México
- Ernesto Perez-Rueda
- Departamento de Ingeniería Celular y Biocatálisis, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
- Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Mérida, Yucatán, México

48
Koç I, Caetano-Anollés G. The natural history of molecular functions inferred from an extensive phylogenomic analysis of gene ontology data. PLoS One 2017; 12:e0176129. [PMID: 28467492 PMCID: PMC5414959 DOI: 10.1371/journal.pone.0176129] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 08/02/2016] [Accepted: 04/05/2017] [Indexed: 11/18/2022]
Abstract
The origin and natural history of molecular functions hold the key to the emergence of cellular organization and modern biochemistry. Here we use a genomic census of Gene Ontology (GO) terms to reconstruct phylogenies at the three highest (1, 2 and 3) and the lowest (terminal) levels of the hierarchy of molecular functions, which reflect the broadest and the most specific GO definitions, respectively. These phylogenies define evolutionary timelines of functional innovation. We analyzed 249 free-living organisms comprising the three superkingdoms of life, Archaea, Bacteria, and Eukarya. Phylogenies indicate that catalytic, binding and transport functions were the oldest, suggesting a 'metabolism-first' origin scenario for biochemistry. Metabolism made use of increasingly complicated organic chemistry. Primordial features of ancient molecular functions and functional recruitments were further distilled by studying the oldest child terms of the oldest level 1 GO definitions. Network analyses showed the existence of an hourglass pattern of enzyme recruitment in the molecular functions of the directed acyclic graph of molecular functions. Older high-level molecular functions were thoroughly recruited at younger lower levels, while very young high-level functions were used throughout the timeline. This pattern repeated in every one of the three mappings, which gave a criss-cross pattern. The timelines and their mappings were remarkable. They revealed the progressive evolutionary development of functional toolkits, starting with the early rise of metabolic activities, followed chronologically by the rise of macromolecular biosynthesis, the establishment of controlled interactions with the environment and self, adaptation to oxygen, and enzyme coordinated regulation, and ending with the rise of structural and cellular complexity. This historical account holds important clues for dissection of the emergence of biocomplexity and life.
Affiliation(s)
- Ibrahim Koç
- Molecular Biology and Genetics, Gebze Technical University, Kocaeli, Turkey
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, United States of America
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, United States of America

49
What Froze the Genetic Code? Life (Basel) 2017; 7:life7020014. [PMID: 28379164 PMCID: PMC5492136 DOI: 10.3390/life7020014] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 03/02/2017] [Revised: 03/27/2017] [Accepted: 04/03/2017] [Indexed: 11/16/2022]
Abstract
The frozen accident theory of the Genetic Code was a proposal by Francis Crick that attempted to explain the universal nature of the Genetic Code and the fact that it only contains information for twenty amino acids. Fifty years later, it is clear that variations to the universal Genetic Code exist in nature and that translation is not limited to twenty amino acids. However, given the astonishing diversity of life on Earth, and the extended evolutionary time that has taken place since the emergence of the extant Genetic Code, the idea that the translation apparatus is for the most part immobile remains true. Here, we will offer a potential explanation of why the code has remained mostly stable for over three billion years, and discuss some of the mechanisms that allow species to overcome the intrinsic functional limitations of the protein synthesis machinery.

50
Voet ARD, Simoncini D, Tame JRH, Zhang KYJ. Evolution-Inspired Computational Design of Symmetric Proteins. Methods Mol Biol 2017; 1529:309-322. [PMID: 27914059 DOI: 10.1007/978-1-4939-6637-0_16] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 06/06/2023]
Abstract
Monomeric proteins with a number of identical repeats creating symmetrical structures are potentially very valuable building blocks with a variety of bionanotechnological applications. As such proteins do not occur naturally, the emerging field of computational protein design serves as an excellent tool to create them from nonsymmetrical templates. Existing pseudo-symmetrical proteins are believed to have evolved from oligomeric precursors by duplication and fusion of identical repeats. Here we describe a computational workflow to reverse-engineer this evolutionary process in order to create stable proteins consisting of identical sequence repeats.
Affiliation(s)
- Arnout R D Voet
- Laboratory for Biomolecular Modelling and Design, KU Leuven, Celestijnenlaan 200G, Leuven, 3000, Belgium.
- David Simoncini
- Structural Bioinformatics Team, Division of Structural and Synthetic Biology, Center for Life Science Technologies, RIKEN, 1-7-22 Suehiro, Yokohama, Kanagawa, 230-0045, Japan
- MIAT, UR-875, INRA, F-31320, Castanet Tolosan, France
- Jeremy R H Tame
- Drug Design Laboratory, Graduate School of Medical Life Science, Yokohama City University, 1-7-29 Suehiro, Yokohama, Kanagawa, 230-0045, Japan
- Kam Y J Zhang
- Structural Bioinformatics Team, Division of Structural and Synthetic Biology, Center for Life Science Technologies, 1-7-22 Suehiro, Yokohama, Kanagawa, 230-0045, Japan