1
Caetano-Anollés K, Aziz MF, Mughal F, Caetano-Anollés G. On Protein Loops, Prior Molecular States and Common Ancestors of Life. J Mol Evol 2024:10.1007/s00239-024-10167-y. [PMID: 38652291 DOI: 10.1007/s00239-024-10167-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2024] [Accepted: 03/22/2024] [Indexed: 04/25/2024]
Abstract
The principle of continuity demands the existence of prior molecular states and common ancestors responsible for extant macromolecular structure. Here, we focus on the emergence and evolution of loop prototypes, the elemental architects of protein domain structure. Phylogenomic reconstruction spanning superkingdoms and viruses generated an evolutionary chronology of prototypes with six distinct evolutionary phases defining a most parsimonious evolutionary progression of cellular life. Each phase was marked by strategic prototype accumulation shaping the structures and functions of common ancestors. The last universal common ancestor (LUCA) of cells and viruses and the last universal cellular ancestor (LUCellA) defined stem lines that were structurally and functionally complex. The evolutionary saga highlighted transformative forces. LUCA lacked biosynthetic ribosomal machinery, while the pivotal LUCellA lacked essential DNA biosynthesis and modern transcription. Early proteins therefore relied on RNA for genetic information storage but appeared initially decoupled from it, hinting at transformative shifts of genetic processing. Urancestral loop types suggest advanced folding designs were present at an early evolutionary stage. An exploration of loop geometric properties revealed gradual replacement of prototypes with α-helix and β-strand bracing structures over time, paving the way for the dominance of other loop types. AlphaFold2-generated atomic models of prototype accretion described patterns of fold emergence. Our findings favor a 'processual' model of evolving stem lines aligned with Woese's vision of a communal world. This model prompts discussing the 'problem of ancestors' and the challenges that lie ahead for research in taxonomy, evolution and complexity.
Affiliation(s)
- Kelsey Caetano-Anollés
  - Evolutionary Bioinformatics Laboratory, Department of Crop Sciences and Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
  - Callout Biotech, Albuquerque, NM, 87112, USA
- M Fayez Aziz
  - Evolutionary Bioinformatics Laboratory, Department of Crop Sciences and Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
- Fizza Mughal
  - Evolutionary Bioinformatics Laboratory, Department of Crop Sciences and Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
- Gustavo Caetano-Anollés
  - Evolutionary Bioinformatics Laboratory, Department of Crop Sciences and Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
2
Aziz MF, Mughal F, Caetano-Anollés G. Tracing the birth of structural domains from loops during protein evolution. Sci Rep 2023; 13:14688. [PMID: 37673948 PMCID: PMC10482863 DOI: 10.1038/s41598-023-41556-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/25/2022] [Accepted: 08/28/2023] [Indexed: 09/08/2023] Open
Abstract
The structures and functions of proteins are embedded into the loop scaffolds of structural domains. Their origin and evolution remain mysterious. Here, we use a novel graph-theoretical approach to describe how modular and non-modular loop prototypes combine to form folded structures in protein domain evolution. Phylogenomic data-driven chronologies reoriented a bipartite network of loops and domains (and its projections) into 'waterfalls' depicting an evolving 'elementary functionome' (EF). Two primordial waves of functional innovation involving founder 'p-loop' and 'winged-helix' domains were accompanied by an ongoing emergence and reuse of structural and functional novelty. Metabolic pathways expanded before translation functionalities. A dual hourglass recruitment pattern transferred scale-free properties from loop to domain components of the EF network in generative cycles of hierarchical modularity. Modeling the evolutionary emergence of the oldest P-loop and winged-helix domains with AlphaFold2 uncovered rapid convergence towards folded structure, suggesting that a folding vocabulary exists in loops for protein fold repurposing and design.
Affiliation(s)
- M Fayez Aziz
  - Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, IL, 61801, USA
- Fizza Mughal
  - Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, IL, 61801, USA
- Gustavo Caetano-Anollés
  - Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, IL, 61801, USA
  - C.R. Woese Institute for Genomic Biology, University of Illinois, Urbana, IL, 61801, USA
3
Caetano-Anollés G. Agency in evolution of biomolecular communication. Ann N Y Acad Sci 2023; 1525:88-103. [PMID: 37219369 DOI: 10.1111/nyas.15005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2023]
Abstract
Biomolecular communication demands that interactions between parts of a molecular system act as scaffolds for message transmission. It also requires an organized system of signs (a communicative agency) for creating and transmitting meaning. The emergence of agency, the capacity to act in a given context and generate end-directed behaviors, has baffled evolutionary biologists for centuries. Here, I explore its emergence with knowledge grounded in over two decades of evolutionary genomic and bioinformatic exploration. Biphasic processes of growth and diversification exist that generate hierarchy and modularity in biological systems at widely ranging time scales. Similarly, a biphasic process exists in communication that constructs a message before it can be transmitted for interpretation. Transmission dissipates matter-energy and information and involves computation. Agency emerges when molecular machinery generates hierarchical layers of vocabularies in an entangled communication network clustered around the universal Turing machine of the ribosome. Computations canalize biological systems to perform biological functions in a dissipative quest to structure long-lived occurrents. This occurs within the confines of a "triangle of persistence" that maximizes invariance with trade-offs between economy, flexibility, and robustness. Thus, learning from previous historical and circumstantial experiences unifies modules in a hierarchy that expands the agency of systems.
Affiliation(s)
- Gustavo Caetano-Anollés
  - Evolutionary Bioinformatics Laboratory, Department of Crop Sciences and C. R. Woese Institute for Genomic Biology, University of Illinois, Urbana, Illinois, USA
4
Prebiotic Synthesis of ATP: A Terrestrial Volcanism-Dependent Pathway. Life (Basel) 2023; 13:life13030731. [PMID: 36983886 PMCID: PMC10053121 DOI: 10.3390/life13030731] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2023] [Revised: 02/27/2023] [Accepted: 03/06/2023] [Indexed: 03/12/2023] Open
Abstract
Adenosine triphosphate (ATP) is a multifunctional small molecule, necessary for all modern Earth life, which must be a component of the last universal common ancestor (LUCA). However, the relatively complex structure of ATP causes doubts about its accessibility on prebiotic Earth. In this paper, based on previous studies on the synthesis of ATP components, a plausible prebiotic pathway yielding this key molecule is constructed, which relies on terrestrial volcanism to provide the required materials and suitable conditions.
5
The Legend of ATP: From Origin of Life to Precision Medicine. Metabolites 2022; 12:metabo12050461. [PMID: 35629965 PMCID: PMC9148104 DOI: 10.3390/metabo12050461] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/10/2022] [Revised: 05/19/2022] [Accepted: 05/19/2022] [Indexed: 02/05/2023] Open
Abstract
Adenosine triphosphate (ATP) may be the most important biological small molecule. Since it was discovered in 1929, ATP has been regarded as life’s energy reservoir. However, this compound means more to life. Its legend starts at the dawn of life and lasts to this day. ATP must be the basic component of ancient ribozymes and may facilitate the origin of structured proteins. In the existing organisms, ATP continues to construct ribonucleic acid (RNA) and work as a protein cofactor. ATP also functions as a biological hydrotrope, which may keep macromolecules soluble in the primitive environment and can regulate phase separation in modern cells. These functions are involved in the pathogenesis of aging-related diseases and breast cancer, providing clues to discovering anti-aging agents and precision medicine tactics for breast cancer.
6
Caetano-Anollés G, Aziz MF, Mughal F, Caetano-Anollés D. Tracing protein and proteome history with chronologies and networks: folding recapitulates evolution. Expert Rev Proteomics 2021; 18:863-880. [PMID: 34628994 DOI: 10.1080/14789450.2021.1992277] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/24/2023]
Abstract
INTRODUCTION: While the origin and evolution of proteins remain mysterious, advances in evolutionary genomics and systems biology are facilitating the historical exploration of the structure, function and organization of proteins and proteomes. Molecular chronologies are series of time events describing the history of biological systems and subsystems and the rise of biological innovations. Together with time-varying networks, these chronologies provide a window into the past. AREAS COVERED: Here, we review molecular chronologies and networks built with modern methods of phylogeny reconstruction. We discuss how chronologies of structural domain families uncover the explosive emergence of metabolism, the late rise of translation, the co-evolution of ribosomal proteins and rRNA, and the late development of the ribosomal exit tunnel; events that coincided with a tendency to shorten folding time. Evolving networks described the early emergence of domains and a late 'big bang' of domain combinations. EXPERT OPINION: Two processes, folding and recruitment, appear central to the evolutionary progression. The former increases protein persistence. The latter fosters diversity. Chronologically, protein evolution mirrors folding by combining supersecondary structures into domains, developing translation machinery to facilitate folding speed and stability, and enhancing structural complexity by establishing long-distance interactions in novel structural and architectural designs.
Affiliation(s)
- Gustavo Caetano-Anollés
  - Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, Illinois, USA
  - C. R. Woese Institute for Genomic Biology, University of Illinois, Urbana, Illinois, USA
- M Fayez Aziz
  - Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, Illinois, USA
- Fizza Mughal
  - Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, Illinois, USA
- Derek Caetano-Anollés
  - Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
7
Freire MÁ. Short non-coded peptides interacting with cofactors facilitated the integration of early chemical networks. Biosystems 2021; 211:104547. [PMID: 34547425 DOI: 10.1016/j.biosystems.2021.104547] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2021] [Revised: 08/28/2021] [Accepted: 09/15/2021] [Indexed: 11/02/2022]
Abstract
Independently developed iron-sulphur/thioester- and phosphate-driven chemical reactions would have set up two distinct reaction networks prior to coupling in a proto-metabolic system supporting a minimal organisation closure. Each chemical system assisted initially by simple catalysts and then by more complex cofactors would have provided the precursors of the small metabolites and monomer units along with their respective polymers through dehydrating template-independent assemblies. For example, acylation reactions mediated by activated thioester groups produced peptides, fatty acids and polyhydroxyalkanoates, while phosphorylation reactions by phosphorylating agents allowed the synthesis of polysaccharides, polyribonucleotides and polyphosphates. Here, we address how these independent chemical systems might fit together and shaped a proto-metabolic system, focusing specifically on cofactors as molecular fossils of metabolism. As a result, the proposed overview suggests that non-coded peptides capable of binding a variety of ligands, but in particular with a redox active versatility and/or group transfer potential could have facilitated the chemical connections that led to a minimal closure with a proto-metabolism. Later developments would have made it possible to establish a cellular organisation with more complex and interdependent metabolic pathways.
Affiliation(s)
- Miguel Ángel Freire
  - Instituto Multidisciplinario de Biología Vegetal (IMBIV), CONICET, Universidad Nacional de Córdoba (UNC). Facultad de Ciencias Exactas, Físicas y Naturales. Av. Vélez Sarsfield 299, CC 495, 5000, Córdoba, Argentina
8
Caetano-Anollés G. The Compressed Vocabulary of Microbial Life. Front Microbiol 2021; 12:655990. [PMID: 34305827 PMCID: PMC8292947 DOI: 10.3389/fmicb.2021.655990] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/19/2021] [Accepted: 04/27/2021] [Indexed: 12/22/2022] Open
Abstract
Communication is an undisputed central activity of life that requires an evolving molecular language. It conveys meaning through messages and vocabularies. Here, I explore the existence of a growing vocabulary in the molecules and molecular functions of the microbial world. There are clear correspondences between the lexicon, syntax, semantics, and pragmatics of language organization and the module, structure, function, and fitness paradigms of molecular biology. These correspondences are constrained by universal laws and engineering principles. Macromolecular structure, for example, follows quantitative linguistic patterns arising from statistical laws that are likely universal, including the Zipf's law, a special case of the scale-free distribution, the Heaps' law describing sublinear growth typical of economies of scales, and the Menzerath-Altmann's law, which imposes size-dependent patterns of decreasing returns. Trade-off solutions between principles of economy, flexibility, and robustness define a "triangle of persistence" describing the impact of the environment on a biological system. The pragmatic landscape of the triangle interfaces with the syntax and semantics of molecular languages, which together with comparative and evolutionary genomic data can explain global patterns of diversification of cellular life. The vocabularies of proteins (proteomes) and functions (functionomes) revealed a significant universal lexical core supporting a universal common ancestor, an ancestral evolutionary link between Bacteria and Eukarya, and distinct reductive evolutionary strategies of language compression in Archaea and Bacteria. A "causal" word cloud strategy inspired by the dependency grammar paradigm used in catenae unfolded the evolution of lexical units associated with Gene Ontology terms at different levels of ontological abstraction. 
While Archaea holds the smallest, oldest, and most homogeneous vocabulary of all superkingdoms, Bacteria heterogeneously apportions a more complex vocabulary, and Eukarya pushes functional innovation through mechanisms of flexibility and robustness.
Affiliation(s)
- Gustavo Caetano-Anollés
  - Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, and C. R. Woese Institute for Genomic Biology, University of Illinois, Urbana, IL, United States
9
Abstract
Domains are the structural, functional and evolutionary units of proteins. They combine to form multidomain proteins. The evolutionary history of this molecular combinatorics has been studied with phylogenomic methods. Here, we construct networks of domain organization and explore their evolution. A time series of networks revealed two ancient waves of structural novelty arising from ancient 'p-loop' and 'winged helix' domains and a massive 'big bang' of domain organization. The evolutionary recruitment of domains was highly modular, hierarchical and ongoing. Domain rearrangements elicited non-random and scale-free network structure. Comparative analyses of preferential attachment, randomness and modularity showed yin-and-yang complementary transition and biphasic patterns along the structural chronology. Remarkably, the evolving networks highlighted a central evolutionary role of cofactor-supporting structures of non-ribosomal peptide synthesis pathways, likely crucial to the early development of the genetic code. Some highly modular domains featured dual response regulation in two-component signal transduction systems with DNA-binding activity linked to transcriptional regulation of responses to environmental change. Interestingly, hub domains across the evolving networks shared the historical role of DNA binding and editing, an ancient protein function in molecular evolution. Our investigation unfolds historical source-sink patterns of evolutionary recruitment that further our understanding of protein architectures and functions.
10
Sun F, Caetano-Anollés G. Menzerath-Altmann's Law of Syntax in RNA Accretion History. Life (Basel) 2021; 11:489. [PMID: 34071925 PMCID: PMC8228408 DOI: 10.3390/life11060489] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/05/2021] [Revised: 05/25/2021] [Accepted: 05/26/2021] [Indexed: 01/13/2023] Open
Abstract
RNA evolves by adding substructural parts to growing molecules. Molecular accretion history can be dissected with phylogenetic methods that exploit structural and functional evidence. Here, we explore the statistical behaviors of lengths of double-stranded and single-stranded segments of growing tRNA, 5S rRNA, RNase P RNA, and rRNA molecules. The reconstruction of character state changes along branches of phylogenetic trees of molecules and trees of substructures revealed strong pushes towards an economy of scale. In addition, statistically significant negative correlations and strong associations between the average lengths of helical double-stranded stems and their time of origin (age) were identified with the Pearson's correlation and Spearman's rho methods. The ages of substructures were derived directly from published rooted trees of substructures. A similar negative correlation was detected in unpaired segments of rRNA but not for the other molecules studied. These results suggest a principle of diminishing returns in RNA accretion history. We show this principle follows a tendency of substructural parts to decrease their size when molecular systems enlarge that follows the Menzerath-Altmann's law of language in full generality and without interference from the details of molecular growth.
Affiliation(s)
- Fengjie Sun
  - School of Science and Technology, Georgia Gwinnett College, Lawrenceville, GA 30043, USA
  - Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, IL 61801, USA
- Gustavo Caetano-Anollés
  - Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, IL 61801, USA
11
Diene SM, Pinault L, Armstrong N, Azza S, Keshri V, Khelaifia S, Chabrière E, Caetano-Anolles G, Rolain JM, Pontarotti P, Raoult D. Dual RNase and β-lactamase Activity of a Single Enzyme Encoded in Archaea. Life (Basel) 2020; 10:life10110280. [PMID: 33202677 PMCID: PMC7697635 DOI: 10.3390/life10110280] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 09/22/2020] [Revised: 11/10/2020] [Accepted: 11/12/2020] [Indexed: 01/11/2023] Open
Abstract
β-lactam antibiotics have a well-known activity that disturbs bacterial cell wall biosynthesis, and they may be cleaved by β-lactamases. However, these drugs are not active against archaea, which are naturally resistant because their cell wall lacks the β-lactam target. Here, we describe that annotation of genes as β-lactamases in Archaea on the basis of homologous genes is a remnant of identification of the original activities of this group of enzymes, which in fact have multiple functions, including nuclease, ribonuclease, β-lactamase, or glyoxalase, which may have specialized over time. We expressed a class B β-lactamase enzyme from Methanosarcina barkeri that digests penicillin G. Moreover, while weak glyoxalase activity was detected, a significant ribonuclease activity on bacterial and synthetic RNAs was demonstrated. The β-lactamase activity was inhibited by a β-lactamase inhibitor (sulbactam), but its RNase activity was not. This gene appears to have been transferred to the Flavobacteriaceae group, especially the Elizabethkingia genus, in which the expressed gene shows a more specialized activity on thienamycin, but no glyoxalase activity. The expressed class C-like β-lactamase gene, from Methanosarcina sp., also shows hydrolysis activity on nitrocefin and is more closely related to DD-peptidase enzymes. Our findings highlight the need to redefine the nomenclature of β-lactamase enzymes and the specification of multipotent enzymes in different ways in Archaea and bacteria over time.
Affiliation(s)
- Seydina M. Diene
  - MEPHI, IHU-Mediterranee Infection, Aix Marseille University, 19-21 Bd Jean Moulin, 13005 Marseille, France; (S.M.D.); (V.K.); (E.C.); (J.-M.R.)
- Lucile Pinault
  - Assistance Publique-Hôpitaux de Marseille (AP-HM), IHU-Méditerranée Infection, 13005 Marseille, France; (L.P.); (N.A.); (S.A.)
- Nicholas Armstrong
  - Assistance Publique-Hôpitaux de Marseille (AP-HM), IHU-Méditerranée Infection, 13005 Marseille, France; (L.P.); (N.A.); (S.A.)
- Said Azza
  - Assistance Publique-Hôpitaux de Marseille (AP-HM), IHU-Méditerranée Infection, 13005 Marseille, France; (L.P.); (N.A.); (S.A.)
- Vivek Keshri
  - MEPHI, IHU-Mediterranee Infection, Aix Marseille University, 19-21 Bd Jean Moulin, 13005 Marseille, France; (S.M.D.); (V.K.); (E.C.); (J.-M.R.)
- Eric Chabrière
  - MEPHI, IHU-Mediterranee Infection, Aix Marseille University, 19-21 Bd Jean Moulin, 13005 Marseille, France; (S.M.D.); (V.K.); (E.C.); (J.-M.R.)
- Gustavo Caetano-Anolles
  - Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Jean-Marc Rolain
  - MEPHI, IHU-Mediterranee Infection, Aix Marseille University, 19-21 Bd Jean Moulin, 13005 Marseille, France; (S.M.D.); (V.K.); (E.C.); (J.-M.R.)
  - Assistance Publique-Hôpitaux de Marseille (AP-HM), IHU-Méditerranée Infection, 13005 Marseille, France; (L.P.); (N.A.); (S.A.)
- Pierre Pontarotti
  - MEPHI, IHU-Mediterranee Infection, Aix Marseille University, 19-21 Bd Jean Moulin, 13005 Marseille, France; (S.M.D.); (V.K.); (E.C.); (J.-M.R.)
  - CNRS, 13005 Marseille, France
- Didier Raoult
  - MEPHI, IHU-Mediterranee Infection, Aix Marseille University, 19-21 Bd Jean Moulin, 13005 Marseille, France; (S.M.D.); (V.K.); (E.C.); (J.-M.R.)
  - Assistance Publique-Hôpitaux de Marseille (AP-HM), IHU-Méditerranée Infection, 13005 Marseille, France; (L.P.); (N.A.); (S.A.)
  - IHU-Méditerranée Infection, 13005 Marseille, France
  - Correspondence: Tel.: +33-4-1373-2401
12
De Tullio MC. Is ascorbic acid a key signaling molecule integrating the activities of 2-oxoglutarate-dependent dioxygenases? Shifting the paradigm. Environmental and Experimental Botany 2020; 178:104173. [DOI: 10.1016/j.envexpbot.2020.104173] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 09/02/2023]
13
Chu XY, Zhang HY. Cofactors as Molecular Fossils To Trace the Origin and Evolution of Proteins. Chembiochem 2020; 21:3161-3168. [PMID: 32515532 DOI: 10.1002/cbic.202000027] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 01/19/2020] [Revised: 06/03/2020] [Indexed: 12/16/2022]
Abstract
Due to their early origin and extreme conservation, cofactors are valuable molecular fossils for tracing the origin and evolution of proteins. First, as the order of protein folds binding with cofactors roughly coincides with protein-fold chronology, cofactors are considered to have facilitated the origin of primitive proteins by selecting them from pools of random amino acid sequences. Second, in the subsequent evolution of proteins, cofactors still played an important role. More interestingly, as metallic cofactors evolved with geochemical variations, some geochemical events left imprints in the chronology of protein architecture; this provides further evidence supporting the coevolution of biochemistry and geochemistry. In this paper, we attempt to review the molecular fossils used in tracing the origin and evolution of proteins, with a special focus on cofactors.
Affiliation(s)
- Xin-Yi Chu
  - Hubei Key Laboratory of Agricultural Bioinformatics College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Hong-Yu Zhang
  - Hubei Key Laboratory of Agricultural Bioinformatics College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
14
Emergence of light-driven protometabolism on recruitment of a photocatalytic cofactor by a self-replicator. Nat Chem 2020; 12:603-607. [PMID: 32591744 DOI: 10.1038/s41557-020-0494-4] [Citation(s) in RCA: 48] [Impact Index Per Article: 12.0] [Received: 05/15/2019] [Accepted: 05/28/2020] [Indexed: 12/14/2022]
Abstract
Establishing how life can emerge from inanimate matter is among the grand challenges of contemporary science. Chemical systems that capture life's essential characteristics (replication, metabolism and compartmentalization) offer a route to understanding this momentous process. The synthesis of life, whether based on canonical biomolecules or fully synthetic molecules, requires the functional integration of these three characteristics. Here we show how a system of fully synthetic self-replicating molecules, on recruiting a cofactor, acquires the ability to transform thiols in its environment into disulfide precursors from which the molecules can replicate. The binding of replicator and cofactor enhances the activity of the latter in oxidizing thiols into disulfides through photoredox catalysis and thereby accelerates replication by increasing the availability of the disulfide precursors. This positive feedback marks the emergence of light-driven protometabolism in a system that bears no resemblance to canonical biochemistry and constitutes a major step towards the highly challenging aim of creating a new and completely synthetic form of life.
15
Mittal A, Changani AM, Taparia S, Goel D, Parihar A, Singh I. Structural disorder originates beyond narrow stoichiometric margins of amino acids in naturally occurring folded proteins. J Biomol Struct Dyn 2020; 39:2364-2375. [PMID: 32238088 DOI: 10.1080/07391102.2020.1751299] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/24/2022]
Abstract
Rigorous analyses of Euclidean distances between non-peptide bonded residues in structures of several thousand naturally occurring folded proteins yielded a surprising "margin of life" for percentage occurrence of individual amino acids in naturally occurring folded proteins. On one hand, the concept of "margin of life", referring to lower than expected variances in average stoichiometric occurrences of individual amino acids in folded proteins, remains unchallenged since its discovery a decade ago. On the other hand, within this past decade there has been a strong emergence of a gradual paradigm shift in biology, from sequence-structure-function in proteins to sequence-disorder-function, fuelled by discoveries on functional implications of intrinsically disordered proteins (primary sequences that do not form stable structures). Thus the applicability of "margin of life" to peptide-bonded residues in all known natural proteins, adopting stable structures vis-à-vis intrinsically disordered, needs to be explored. Therefore in this work, we analyze compositions of the complete naturally occurring primary sequence space (over 560000 sequences) after dividing it into mutually exclusive subsets of structured and intrinsically disordered proteins along with a subset without any structural information. While finding that occurrence of different peptides (up to pentapeptides) is a direct consequence of the relative occurrences of their constituting residues in folded proteins, we report that structural disorder in natural proteins originates beyond the narrow stoichiometric margins of amino acids found in structured proteins. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Aditya Mittal
  - Kusuma School of Biological Sciences, Indian Institute of Technology Delhi (IIT Delhi), New Delhi, India
  - Supercomputing Facility for Bioinformatics & Computational Biology, Indian Institute of Technology Delhi (IIT Delhi), New Delhi, India
- Sakshi Taparia
  - Department of Mathematics (Bachelors program in Mathematics & Computing), Indian Institute of Technology Delhi (IIT Delhi), New Delhi, India
- Deepanshu Goel
  - Department of Biochemical Engineering and Biotechnology (Bachelors program), Indian Institute of Technology Delhi (IIT Delhi), New Delhi, India
- Animesh Parihar
  - Department of Biochemical Engineering and Biotechnology (Bachelors program), Indian Institute of Technology Delhi (IIT Delhi), New Delhi, India
- Ishan Singh
  - Department of Computer Science & Engineering (Bachelors program Computer Science), Indian Institute of Technology Delhi (IIT Delhi), New Delhi, India
16
|
Abstract
How proteins evolved to recognize and bind their ligands is a key mystery in protein function evolution. To explore this mystery, we study how proteins bind adenine, an ancient fragment. We characterize physicochemical patterns of protein–adenine interactions and link these to proteins’ evolutionary origins. In conflict with previous findings, we see that all of adenine’s hydrogen donors and acceptors have been used to bind proteins, and that adenine binding is likely to have emerged multiple times in evolution. To identify adenine-binding sites of shared origin, we use “themes”: short amino acid segments suggested to constitute evolutionary building blocks. We detect specific themes that are engaged in adenine binding; the detection of these in a protein’s sequence might reveal its function. Proteins’ interactions with ancient ligands may reveal how molecular recognition emerged and evolved. We explore how proteins recognize adenine: a planar rigid fragment found in the most common and ancient ligands. We have developed a computational pipeline that extracts protein–adenine complexes from the Protein Data Bank, structurally superimposes their adenine fragments, and detects the hydrogen bonds mediating the interaction. Our analysis extends the known motifs of protein–adenine interactions in the Watson–Crick edge of adenine and shows that all of adenine’s edges may contribute to molecular recognition. We further show that, on the proteins' side, binding is often mediated by specific amino acid segments (“themes”) that recur across different proteins, such that different proteins use the same themes when binding the same adenine-containing ligands. We identify numerous proteins that feature these themes and are thus likely to bind adenine-containing ligands. Our analysis suggests that adenine binding has emerged multiple times in evolution.
|
17
|
Laffont C, Arnoux P. The ancient roots of nicotianamine: diversity, role, regulation and evolution of nicotianamine-like metallophores. Metallomics 2020; 12:1480-1493. [PMID: 33084706 DOI: 10.1039/d0mt00150c] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/08/2023]
Abstract
Nicotianamine (NA) is a metabolite synthesized by all plants, in which it is involved in the homeostasis of different micronutrients such as iron, nickel or zinc. In some plants it also serves as a precursor of phytosiderophores, which are used for extracellular iron scavenging. Previous studies have also established the presence of NA in filamentous fungi and some mosses, whereas an analogue of NA was inferred in an archaeon. More recently, opine-type metallophores with homology to NA were uncovered in bacteria, especially in human pathogens such as Staphylococcus aureus, Pseudomonas aeruginosa or Yersinia pestis, synthesizing respectively staphylopine, pseudopaline and yersinopine. Here, we review the current state of knowledge regarding the discovery, biosynthesis, function and regulation of these metallophores. We also discuss the genomic environment of the cntL gene, which is homologous to the plant NA synthase (NAS) gene, and plays a central role in the synthesis of NA-like metallophores. This reveals a large diversity of biosynthetic, export and import pathways. Using sequence similarity networks, we uncovered that these metallophores are widespread in numerous bacteria thriving in very different environments, such as those living at the host-pathogen interface, but also in the soil. We additionally established a phylogeny of the NAS/cntL gene and, as a result, we propose that this gene is an ancient gene and NA, or its derivatives, is an ancient metallophore that played a prominent role in metal acquisition or metal resistance. Indeed, our phylogenetic analysis suggests an evolutionary model where the possibility to synthesize this metallophore was present early in the appearance of life, although it was later lost by most living microorganisms, unless facing metal starvation such as at the host-pathogen interface or in some soils. 
According to our model, NA then re-emerged as a central metabolite for metal homeostasis in fungi, mosses and all known higher plants.
Affiliation(s)
- Clémentine Laffont
- Aix Marseille Univ, CEA, CNRS, BIAM, Saint Paul-Lez-Durance, F-13108, France.
- Pascal Arnoux
- Aix Marseille Univ, CEA, CNRS, BIAM, Saint Paul-Lez-Durance, F-13108, France.
|
18
|
Pollack JD, Gerard D, Makhatadze GI, Pearl DK. Evolutionary conservation and structural localizations suggest a physical trace of metabolism’s progressive geochronological emergence. J Biomol Struct Dyn 2019; 38:3700-3719. [DOI: 10.1080/07391102.2019.1679666] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/25/2022]
Affiliation(s)
- J. Dennis Pollack
- Department of Molecular Virology, Immunology and Medical Genetics, College of Medicine, The Ohio State University, Columbus, Ohio, USA
- David Gerard
- Department of Mathematics and Statistics, American University, Washington, DC, USA
- George I. Makhatadze
- Department of Biological Sciences, Rensselaer Polytechnic Institute, Troy, New York, USA
- Dennis K. Pearl
- Department of Statistics, Penn State University, University Park, Pennsylvania, USA
|
19
|
|
20
|
Mughal F, Caetano-Anollés G. MANET 3.0: Hierarchy and modularity in evolving metabolic networks. PLoS One 2019; 14:e0224201. [PMID: 31648227 PMCID: PMC6812854 DOI: 10.1371/journal.pone.0224201] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/14/2019] [Accepted: 10/08/2019] [Indexed: 11/30/2022] Open
Abstract
Enzyme recruitment is a fundamental evolutionary driver of modern metabolism. We see evidence of recruitment at work in the metabolic Molecular Ancestry Networks (MANET) database, an online resource that integrates data from KEGG, SCOP and structural phylogenomic reconstruction. The database, which was introduced in 2006, traces the deep history of the structural domains of enzymes in metabolic pathways. Here we release version 3.0 of MANET, which updates data from KEGG and SCOP, links enzyme and PDB information with PDBsum, and traces evolutionary information of domains defined at fold family level of SCOP classification in metabolic subnetwork diagrams. Compared to SCOP folds used in the previous versions, fold families are cohesive units of functional similarity that are highly conserved at sequence level and offer a 10-fold increase of data entries. We surveyed enzymatic, functional and catalytic site distributions among superkingdoms showing that ancient enzymatic innovations followed a biphasic temporal pattern of diversification typical of module innovation. We grouped enzymatic activities of MANET into a hierarchical system of subnetworks and mesonetworks matching KEGG classification. The evolutionary growth of these modules of metabolic activity was studied using bipartite networks and their one-mode projections at enzyme, subnetwork and mesonetwork levels of organization. Evolving metabolic networks revealed patterns of enzyme sharing that transcended mesonetwork boundaries and supported the patchwork model of metabolic evolution. We also explored the scale-freeness, randomness and small-world properties of evolving networks as possible organizing principles of network growth and diversification. The network structure shows an increase in hierarchical modularity and scale-free behavior as metabolic networks unfold in evolutionary time. Remarkably, this evolutionary constraint on structure was stronger at lower levels of metabolic organization. 
Evolving metabolic structure reveals a 'principle of granularity', an evolutionary increase of the cohesiveness of lower-level parts of a hierarchical system. MANET is available at http://manet.illinois.edu.
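The bipartite networks and one-mode projections mentioned in this abstract can be sketched as follows. This is an illustrative sketch only, not MANET code: the function name, the subnetwork labels (`sub_a`, `sub_b`, `sub_c`) and the enzyme labels (`E1` to `E3`) are hypothetical, and the edge weight is assumed to be the number of enzymes two subnetworks share.

```python
from collections import defaultdict
from itertools import combinations


def one_mode_projection(memberships):
    """Project a bipartite subnetwork-enzyme graph onto subnetworks.

    memberships maps each subnetwork to the set of enzymes it contains.
    Returns {(sub_x, sub_y): number of shared enzymes} for pairs sharing any.
    """
    # Invert the mapping: for each enzyme, which subnetworks use it?
    by_enzyme = defaultdict(set)
    for sub, enzymes in memberships.items():
        for enzyme in enzymes:
            by_enzyme[enzyme].add(sub)

    # Every pair of subnetworks sharing an enzyme gains one unit of weight.
    weights = defaultdict(int)
    for subs in by_enzyme.values():
        for a, b in combinations(sorted(subs), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Hypothetical toy data, for illustration only.
memberships = {
    "sub_a": {"E1", "E2"},
    "sub_b": {"E2", "E3"},
    "sub_c": {"E1", "E2"},
}
projection = one_mode_projection(memberships)
print(projection)
```

In such a projection, heavily weighted edges between subnetworks are one simple way to see enzyme sharing across module boundaries.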
Affiliation(s)
- Fizza Mughal
- Illinois Informatics Institute, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Gustavo Caetano-Anollés
- Illinois Informatics Institute, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
|
21
|
Lancet D, Segrè D, Kahana A. Twenty Years of "Lipid World": A Fertile Partnership with David Deamer. Life (Basel) 2019; 9:E77. [PMID: 31547028 PMCID: PMC6958426 DOI: 10.3390/life9040077] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/16/2019] [Revised: 09/08/2019] [Accepted: 09/10/2019] [Indexed: 12/17/2022] Open
Abstract
"The Lipid World" was published in 2001, stemming from a highly effective collaboration with David Deamer during a sabbatical year 20 years ago at the Weizmann Institute of Science in Israel. The present review paper highlights the benefits of this scientific interaction and assesses the impact of the lipid world paper on the present understanding of the possible roles of amphiphiles and their assemblies in the origin of life. The lipid world is defined as a putative stage in the progression towards life's origin, during which diverse amphiphiles or other spontaneously aggregating small molecules could have concurrently played multiple key roles, including compartment formation, the appearance of mutually catalytic networks, molecular information processing, and the rise of collective self-reproduction and compositional inheritance. This review brings back into a broader perspective some key points originally made in the lipid world paper, stressing the distinction between the widely accepted role of lipids in forming compartments and their expanded capacities as delineated above. In the light of recent advancements, we discussed the topical relevance of the lipid worldview as an alternative to broadly accepted scenarios, and the need for further experimental and computer-based validation of the feasibility and implications of the individual attributes of this point of view. Finally, we point to possible avenues for exploring transition paths from small molecule-based noncovalent structures to more complex biopolymer-containing proto-cellular systems.
Affiliation(s)
- Doron Lancet
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 7610010, Israel.
- Daniel Segrè
- Bioinformatics Program, Department of Biology, Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA.
- Amit Kahana
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 7610010, Israel.
|
22
|
Caetano-Anollés G, Aziz MF, Mughal F, Gräter F, Koç I, Caetano-Anollés K, Caetano-Anollés D. Emergence of Hierarchical Modularity in Evolving Networks Uncovered by Phylogenomic Analysis. Evol Bioinform Online 2019; 15:1176934319872980. [PMID: 31523127 PMCID: PMC6728656 DOI: 10.1177/1176934319872980] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/06/2019] [Accepted: 08/08/2019] [Indexed: 01/15/2023] Open
Abstract
Networks describe how parts associate with each other to form integrated systems which often have modular and hierarchical structure. In biology, network growth involves two processes, one that unifies and the other that diversifies. Here, we propose a biphasic (bow-tie) theory of module emergence. In the first phase, parts are at first weakly linked and associate variously. As they diversify, they compete with each other and are often selected for performance. The emerging interactions constrain their structure and associations. This causes parts to self-organize into modules with tight linkage. In the second phase, variants of the modules diversify and become new parts for a new generative cycle of higher level organization. The paradigm predicts the rise of hierarchical modularity in evolving networks at different timescales and complexity levels. Remarkably, phylogenomic analyses uncover this emergence in the rewiring of metabolomic and transcriptome-informed metabolic networks, the nanosecond dynamics of proteins, and evolving networks of metabolism, elementary functionomes, and protein domain organization.
Affiliation(s)
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, C.R. Woese Institute for Genomic Biology, and Illinois Informatics Institute, University of Illinois, Urbana, IL, USA
- M Fayez Aziz
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, C.R. Woese Institute for Genomic Biology, and Illinois Informatics Institute, University of Illinois, Urbana, IL, USA
- Fizza Mughal
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, C.R. Woese Institute for Genomic Biology, and Illinois Informatics Institute, University of Illinois, Urbana, IL, USA
- Frauke Gräter
- Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Ibrahim Koç
- Department of Molecular Biology and Genetics, Gebze Technical University, Gebze, Turkey
- Kelsey Caetano-Anollés
- Division of Biomedical Informatics, College of Medicine, Seoul National University, Seoul, Republic of Korea
|
23
|
Abstract
Nonribosomal peptides are assemblages, including antibiotics, of canonical amino acids and other molecules. β-lactam antibiotics act on bacterial cell walls and can be cleaved by β-lactamases. β-lactamase activity in humans has been neglected, even though eighteen enzymes have already been annotated as such in the human genome. Their hydrolysis activities on antibiotics have not been previously investigated. Here, we report that human cells were able to digest penicillin and that this activity was inhibited by a β-lactamase inhibitor, sulbactam. Penicillin degradation in human cells was microbiologically demonstrated on Pneumococcus. We expressed MBLAC2, a human β-lactamase known as an exosome biogenesis enzyme. It cleaved penicillin and was inhibited by sulbactam. Finally, β-lactamases are widely distributed, archaic, and have a wide spectrum, including digesting anticancer agents and β-lactams, which can then be used as nutriments. The evidence of this other role of MBLAC2 as a bona fide β-lactamase allows for reassessment of the roles of β-lactams and β-lactamases in humans.
|
24
|
Maraldi NM. In search of a primitive signaling code. Biosystems 2019; 183:103984. [PMID: 31201829 DOI: 10.1016/j.biosystems.2019.103984] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/04/2019] [Revised: 06/11/2019] [Accepted: 06/12/2019] [Indexed: 10/26/2022]
Abstract
Cells must have been preceded by simpler chemical systems (protocells) that had the capacity for a spontaneous self-assembly process and the ability to confine chemical reaction networks together with a form of information. The presence of lipid molecules in the early Earth conditions is sufficient to ensure the occurrence of spontaneous self-assembly processes, not defined by genetic information, but related to their chemical amphiphilic nature. Ribozymes are plausible molecules for early life, being the first small polynucleotides made up of random oligomers or formed by non-enzymatic template copying. Compartmentalization represents a strategy for the evolution of ribozymes; the attachment of ribozymes to surfaces, such as those formed by lipid micellar aggregates, may be particularly relevant if the surface itself catalyzes RNA polymerization. It is conceivable that the transition from pre-biotic molecular aggregates to cellular life required the coevolution of the RNA world, capable of synthesizing specific, instead of statistical proteins, and of the Lipid world, with a transition from micellar aggregates to semipermeable vesicles. Small molecules available in the prebiotic inventory might promote RNA stability and the evolution of hydrophobic micellar aggregates into membrane-delimited vesicles. 
The transition from ribozymes catalyzing the assembly of statistical polypeptides to the synthesis of proteins required the appearance of the genetic code; the transition from hydrophobic platforms favoring the stability of ribozymes and of nascent polypeptides to the selective transport of reagents through a membrane required the appearance of the signal transduction code. A further integration between the RNA and Lipid worlds can be advanced, taking into account the emerging roles of phospholipid aggregates not only in ensuring stability to ribozymes by compartmentalization, but also in a crucial step of evolution through natural selection mechanisms, based on signal transduction pathways that convert environmental changes into biochemical responses that could vary according to the context. Here I present evidence of traces of the evolution of a signal transduction system in extant cells, which utilize a phosphoinositide signaling system located both at the nucleoplasmic level and at the plasma membrane, based on the very same molecules but responding to different rules. The model proposed here is based on the following assumptions on the biomolecules of extant organisms: i) amphiphiles can be converted into structured aggregates by hydrophobic forces, thus giving rise to functional platforms for the interaction of other biomolecules and to their compartmentalization; ii) fundamental biochemical pathways, including protein synthesis, can be sustained by natural ribozymes of ancient origin; iii) ribozymes and nucleotide-derived coenzymes could have existed long before protein enzymes emerged; iv) signaling molecules, derived both from phospholipids and from RNAs, could have guided the evolution of complex metabolic processes before the emergence of proteins.
Affiliation(s)
- Nadir M Maraldi
- Department of Biomedical and Neuromotor Sciences, University of Bologna, Italy
|
25
|
Tian T, Chu XY, Yang Y, Zhang X, Liu YM, Gao J, Ma BG, Zhang HY. Phosphates as Energy Sources to Expand Metabolic Networks. Life (Basel) 2019; 9:life9020043. [PMID: 31121973 PMCID: PMC6617280 DOI: 10.3390/life9020043] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/18/2019] [Revised: 05/18/2019] [Accepted: 05/21/2019] [Indexed: 11/29/2022] Open
Abstract
Phosphates are essential for modern metabolisms. A recent study reported a phosphate-free metabolic network and suggested that thioesters, rather than phosphates, could alleviate thermodynamic bottlenecks of network expansion. As a result, it was considered that a phosphorus-independent metabolism could exist before the phosphate-based genetic coding system. To explore the origin of phosphorus-dependent metabolism, the present study constructs a protometabolic network that contains phosphates prebiotically available using computational systems biology approaches. It is found that some primitive phosphorylated intermediates could greatly alleviate thermodynamic bottlenecks of network expansion. Moreover, the phosphorus-dependent metabolic network exhibits several ancient features. Taken together, it is concluded that phosphates played a role as important as that of thioesters during the origin and evolution of metabolism. Both phosphorus and sulfur are speculated to be critical to the origin of life.
Affiliation(s)
- Tian Tian
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China.
- Xin-Yi Chu
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China.
- Yi Yang
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China.
- Xuan Zhang
- Beijing National Center for Molecular Sciences, Institute of Theoretical and Computational Chemistry, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China.
- Ye-Mao Liu
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China.
- Jun Gao
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China.
- Bin-Guang Ma
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China.
- Hong-Yu Zhang
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China.
|
26
|
Colson P, Levasseur A, La Scola B, Sharma V, Nasir A, Pontarotti P, Caetano-Anollés G, Raoult D. Ancestrality and Mosaicism of Giant Viruses Supporting the Definition of the Fourth TRUC of Microbes. Front Microbiol 2018; 9:2668. [PMID: 30538677 PMCID: PMC6277510 DOI: 10.3389/fmicb.2018.02668] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/22/2018] [Accepted: 10/18/2018] [Indexed: 12/20/2022] Open
Abstract
Giant viruses of amoebae were discovered in 2003. Since then, their diversity has greatly expanded. They were suggested to form a fourth branch of life, collectively named ‘TRUC’ (for “Things Resisting Uncompleted Classifications”) alongside Bacteria, Archaea, and Eukarya. Their origin and ancestrality remain controversial. Here, we specify the evolution and definition of giant viruses. Phylogenetic and phenetic analyses of informational gene repertoires of giant viruses and selected bacteria, archaea and eukaryota were performed, including structural phylogenomics based on protein structural domains grouped into 289 universal fold superfamilies (FSFs). Hierarchical clustering analysis was performed based on a binary presence/absence matrix constructed using 727 informational COGs from cellular organisms. The presence/absence of ‘universal’ FSF domains was used to generate an unrooted maximum parsimony phylogenomic tree. Comparison of the gene content of a giant virus with those of a bacterium, an archaeon, and a eukaryote with small genomes was also performed. Overall, both cladistic analyses based on gene sequences of very central and ancient proteins and on highly conserved protein fold structures as well as phenetic analyses were congruent regarding the delineation of a fourth branch of microbes comprised by giant viruses. Giant viruses appeared as a basal group in the tree of all proteomes. A pangenome and core genome determined for Rickettsia bellii (bacteria), Methanomassiliicoccus luminyensis (archaeon), Encephalitozoon intestinalis (eukaryote), and Tupanvirus (giant virus) showed a substantial proportion of Tupanvirus genes that overlap with those of the cellular microbes. In addition, a substantial genome mosaicism was observed, with 51, 11, 8, and 0.2% of Tupanvirus genes best matching with viruses, eukaryota, bacteria, and archaea, respectively. Finally, we found that genes themselves may be subject to lateral sequence transfers. 
In summary, our data highlight the quantum leap between classical and giant viruses. Phylogenetic and phyletic analyses and the study of protein fold superfamilies confirm previous evidence of the existence of a fourth TRUC of life that includes giant viruses, and highlight its ancestrality and mosaicism. They also point out that best evolutionary representations for giant viruses and cellular microorganisms are rhizomes, and that sequence transfers rather than gene transfers have to be considered.
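The hierarchical clustering of a binary presence/absence matrix described in this abstract can be sketched as below. This is a toy illustration, not the authors' pipeline: the abstract does not state the distance measure or linkage rule, so Jaccard distance and average linkage are assumptions, and all names (`agglomerate`, `virus_1`, `cell_1`, the feature labels) are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |intersection| / |union| for two sets of present features."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def agglomerate(profiles):
    """Average-linkage agglomerative clustering (assumed linkage rule).

    profiles maps each genome name to the set of features present in it.
    Returns the list of merges as (members_a, members_b, distance).
    """
    clusters = [frozenset([name]) for name in sorted(profiles)]
    merges = []

    def link(c1, c2):
        # Mean pairwise distance between members of the two clusters.
        pairs = [(x, y) for x in c1 for y in c2]
        total = sum(jaccard_distance(profiles[x], profiles[y]) for x, y in pairs)
        return total / len(pairs)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: link(*pair))
        merges.append((sorted(a), sorted(b), link(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


# Hypothetical presence/absence profiles, for illustration only.
profiles = {
    "virus_1": {"f1", "f2", "f3"},
    "virus_2": {"f1", "f2"},
    "cell_1": {"f3", "f4", "f5"},
    "cell_2": {"f4", "f5"},
}
merges = agglomerate(profiles)
for members_a, members_b, dist in merges:
    print(members_a, members_b, round(dist, 3))
```

With real data, each profile would hold the informational COGs (or fold superfamilies) detected in one genome, and the merge order would give the clustering tree.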
Affiliation(s)
- Philippe Colson
- Aix-Marseille Université, Institut de Recherche pour le Développement (IRD), Assistance Publique - Hôpitaux de Marseille (AP-HM); Microbes, Evolution, Phylogeny and Infection (MEΦI); Institut Hospitalo-Universitaire (IHU) - Méditerranée Infection, Marseille, France
- Anthony Levasseur
- Aix-Marseille Université, Institut de Recherche pour le Développement (IRD), Assistance Publique - Hôpitaux de Marseille (AP-HM); Microbes, Evolution, Phylogeny and Infection (MEΦI); Institut Hospitalo-Universitaire (IHU) - Méditerranée Infection, Marseille, France
- Bernard La Scola
- Aix-Marseille Université, Institut de Recherche pour le Développement (IRD), Assistance Publique - Hôpitaux de Marseille (AP-HM); Microbes, Evolution, Phylogeny and Infection (MEΦI); Institut Hospitalo-Universitaire (IHU) - Méditerranée Infection, Marseille, France
- Vikas Sharma
- Aix-Marseille Université, Institut de Recherche pour le Développement (IRD), Assistance Publique - Hôpitaux de Marseille (AP-HM); Microbes, Evolution, Phylogeny and Infection (MEΦI); Institut Hospitalo-Universitaire (IHU) - Méditerranée Infection, Marseille, France; Centre National de la Recherche Scientifique, Marseille, France
- Arshan Nasir
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois Urbana-Champaign, Urbana, IL, United States; Department of Biosciences, COMSATS University Islamabad, Islamabad, Pakistan
- Pierre Pontarotti
- Aix-Marseille Université, Institut de Recherche pour le Développement (IRD), Assistance Publique - Hôpitaux de Marseille (AP-HM); Microbes, Evolution, Phylogeny and Infection (MEΦI); Institut Hospitalo-Universitaire (IHU) - Méditerranée Infection, Marseille, France; Centre National de la Recherche Scientifique, Marseille, France
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois Urbana-Champaign, Urbana, IL, United States
- Didier Raoult
- Aix-Marseille Université, Institut de Recherche pour le Développement (IRD), Assistance Publique - Hôpitaux de Marseille (AP-HM); Microbes, Evolution, Phylogeny and Infection (MEΦI); Institut Hospitalo-Universitaire (IHU) - Méditerranée Infection, Marseille, France
|
27
|
Abstract
Previously we reported [A. Wynveen et al., Phys. Rev. E 89, 022725 (2014), doi:10.1103/PhysRevE.89.022725] that requiring that the systems regarded as lifelike be out of chemical equilibrium in a model of abstracted polymers undergoing ligation and scission first introduced by Kauffman [S. A. Kauffman, The Origins of Order (Oxford University Press, New York, 1993), Chap. 7] implied that lifelike systems were most probable when the reaction network was sparse. The model was entirely statistical and took no account of the bond energies or other energetic constraints. Here we report results of an extension of the model to include effects of a finite bonding energy in the model. We studied two conditions: (1) A food set is continuously replenished and the total polymer population is constrained but the system is otherwise isolated and (2) in addition to the constraints in (1) the system is in contact with a finite-temperature heat bath. In each case, detailed balance in the dynamics is guaranteed during the computations by continuous recomputation of a temperature [in case (1)] and of the chemical potential (in both cases) toward which the system is driven by the dynamics. In the isolated case, the probability of reaching a metastable nonequilibrium state in this model depends significantly on the composition of the food set, and the nonequilibrium states satisfying the lifelike condition turn out to be at energies and particle numbers consistent with an equilibrium state at high negative temperature. As a function of the sparseness of the reaction network, the lifelike probability is nonmonotonic, as in our previous model, but the maximum probability occurs when the network is less sparse. 
In the case of contact with a thermal bath at a positive ambient temperature, we identify two types of metastable nonequilibrium states, termed locally and thermally alive, and locally dead and thermally alive, and evaluate their likelihood of appearance, finding maxima at an optimal temperature and an optimal degree of sparseness in the network. We use a Euclidean metric in the space of polymer populations to distinguish these states from one another and from fully equilibrated states. The metric can be used to characterize the degree and type of chemical equilibrium in observed systems, as we illustrate for the proteome of the ribosome.
Affiliation(s)
- B F Intoy
- School of Physics and Astronomy, University of Minnesota, Minneapolis, Minnesota 55455, USA
- J W Halley
- School of Physics and Astronomy, University of Minnesota, Minneapolis, Minnesota 55455, USA
|
28
|
Kubyshkin V, Budisa N. Synthetic alienation of microbial organisms by using genetic code engineering: Why and how? Biotechnol J 2017; 12. [PMID: 28671771 DOI: 10.1002/biot.201600097] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2017] [Revised: 05/19/2017] [Accepted: 05/31/2017] [Indexed: 12/31/2022]
Abstract
The main goal of synthetic biology (SB) is the creation of biodiversity applicable for biotechnological needs, while xenobiology (XB) aims to expand the framework of natural chemistries with the non-natural building blocks in living cells to accomplish artificial biodiversity. Protein and proteome engineering, which overcome limitation of the canonical amino acid repertoire of 20 (+2) prescribed by the genetic code by using non-canonic amino acids (ncAAs), is one of the main focuses of XB research. Ideally, estranging the genetic code from its current form via systematic introduction of ncAAs should enable the development of bio-containment mechanisms in synthetic cells potentially endowing them with a "genetic firewall" i.e. orthogonality which prevents genetic information transfer to natural systems. Despite rapid progress over the past two decades, it is not yet possible to completely alienate an organism that would use and maintain different genetic code associations permanently. In order to engineer robust bio-contained life forms, the chemical logic behind the amino acid repertoire establishment should be considered. Starting from recent proposal of Hartman and Smith about the genetic code establishment in the RNA world, here the authors mapped possible biotechnological invasion points for engineering of bio-contained synthetic cells equipped with non-canonical functionalities.
Affiliation(s)
- Vladimir Kubyshkin
- Biocatalysis group, Institute of Chemistry, Technical University of Berlin, Germany
- Nediljko Budisa
- Biocatalysis group, Institute of Chemistry, Technical University of Berlin, Germany
|
29
|
Koç I, Caetano-Anollés G. The natural history of molecular functions inferred from an extensive phylogenomic analysis of gene ontology data. PLoS One 2017; 12:e0176129. [PMID: 28467492 PMCID: PMC5414959 DOI: 10.1371/journal.pone.0176129] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/02/2016] [Accepted: 04/05/2017] [Indexed: 11/18/2022] Open
Abstract
The origin and natural history of molecular functions hold the key to the emergence of cellular organization and modern biochemistry. Here we use a genomic census of Gene Ontology (GO) terms to reconstruct phylogenies at the three highest (1, 2 and 3) and the lowest (terminal) levels of the hierarchy of molecular functions, which reflect the broadest and the most specific GO definitions, respectively. These phylogenies define evolutionary timelines of functional innovation. We analyzed 249 free-living organisms comprising the three superkingdoms of life, Archaea, Bacteria, and Eukarya. Phylogenies indicate catalytic, binding and transport functions were the oldest, suggesting a 'metabolism-first' origin scenario for biochemistry. Metabolism made use of increasingly complicated organic chemistry. Primordial features of ancient molecular functions and functional recruitments were further distilled by studying the oldest child terms of the oldest level 1 GO definitions. Network analyses showed the existence of an hourglass pattern of enzyme recruitment in the molecular functions of the directed acyclic graph of molecular functions. Older high-level molecular functions were thoroughly recruited at younger lower levels, while very young high-level functions were used throughout the timeline. This pattern repeated in every one of the three mappings, which gave a criss-cross pattern. The timelines and their mappings were remarkable. They revealed the progressive evolutionary development of functional toolkits, starting with the early rise of metabolic activities, followed chronologically by the rise of macromolecular biosynthesis, the establishment of controlled interactions with the environment and self, adaptation to oxygen, and enzyme coordinated regulation, and ending with the rise of structural and cellular complexity. This historical account holds important clues for dissection of the emergence of biocomplexity and life.
Affiliation(s)
- Ibrahim Koç
- Molecular Biology and Genetics, Gebze Technical University, Kocaeli, Turkey
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, United States of America
| | - Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, United States of America
| |
|
30
|
Jakubowski H. Homocysteine Editing, Thioester Chemistry, Coenzyme A, and the Origin of Coded Peptide Synthesis. Life (Basel) 2017; 7:life7010006. [PMID: 28208756 PMCID: PMC5370406 DOI: 10.3390/life7010006] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/03/2017] [Accepted: 02/03/2017] [Indexed: 12/22/2022] Open
Abstract
Aminoacyl-tRNA synthetases (AARSs) have evolved “quality control” mechanisms which prevent tRNA aminoacylation with non-protein amino acids, such as homocysteine, homoserine, and ornithine, and thus their access to the Genetic Code. Of the ten AARSs that possess editing function, five edit homocysteine: Class I MetRS, ValRS, IleRS, LeuRS, and Class II LysRS. Studies of their editing function reveal that catalytic modules of these AARSs have a thiol-binding site that confers the ability to catalyze the aminoacylation of coenzyme A, pantetheine, and other thiols. Other AARSs also catalyze aminoacyl-thioester synthesis. Amino acid selectivity of AARSs in the aminoacyl thioesters formation reaction is relaxed, characteristic of primitive amino acid activation systems that may have originated in the Thioester World. With homocysteine and cysteine as thiol substrates, AARSs support peptide bond synthesis. Evolutionary origin of these activities is revealed by genomic comparisons, which show that AARSs are structurally related to proteins involved in coenzyme A/sulfur metabolism and non-coded peptide bond synthesis. These findings suggest that the extant AARSs descended from ancestral forms that were involved in non-coded Thioester-dependent peptide synthesis, functionally similar to the present-day non-ribosomal peptide synthetases.
Affiliation(s)
- Hieronim Jakubowski
- Department of Microbiology, Biochemistry and Molecular Genetics, New Jersey Medical School, Rutgers University, Newark, NJ 07103, USA.
- Department of Biochemistry and Biotechnology, University of Life Sciences, Poznan 60-632, Poland.
| |
|
31
|
Caetano-Anollés D, Caetano-Anollés G. Piecemeal Buildup of the Genetic Code, Ribosomes, and Genomes from Primordial tRNA Building Blocks. Life (Basel) 2016; 6:life6040043. [PMID: 27918435 PMCID: PMC5198078 DOI: 10.3390/life6040043] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2016] [Revised: 11/21/2016] [Accepted: 11/29/2016] [Indexed: 01/10/2023] Open
Abstract
The origin of biomolecular machinery likely centered around an ancient and central molecule capable of interacting with emergent macromolecular complexity. tRNA is the oldest and most central nucleic acid molecule of the cell. Its co-evolutionary interactions with aminoacyl-tRNA synthetase protein enzymes define the specificities of the genetic code and those with the ribosome their accurate biosynthetic interpretation. Phylogenetic approaches that focus on molecular structure allow reconstruction of evolutionary timelines that describe the history of RNA and protein structural domains. Here we review phylogenomic analyses that reconstruct the early history of the synthetase enzymes and the ribosome, their interactions with RNA, and the inception of amino acid charging and codon specificities in tRNA that are responsible for the genetic code. We also trace the age of domains and tRNA onto ancient tRNA homologies that were recently identified in rRNA. Our findings reveal a timeline of recruitment of tRNA building blocks for the formation of a functional ribosome, which holds both the biocatalytic functions of protein biosynthesis and the ability to store genetic memory in primordial RNA genomic templates.
Affiliation(s)
- Derek Caetano-Anollés
- Department of Evolutionary Genetics, Max-Planck-Institut für Evolutionsbiologie, 24306 Plön, Germany.
| | - Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.
| |
|
32
|
Sequence analysis of the Hsp70 family in moss and evaluation of their functions in abiotic stress responses. Sci Rep 2016; 6:33650. [PMID: 27644410 PMCID: PMC5028893 DOI: 10.1038/srep33650] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/11/2016] [Accepted: 08/31/2016] [Indexed: 11/30/2022] Open
Abstract
The 70-kD heat shock proteins (Hsp70s) are highly conserved molecular chaperones that play essential roles in cellular processes including abiotic stress responses. Physcomitrella patens serves as a representative of the first terrestrial plants and can recover from serious dehydration. To assess the possible relationship between P. patens Hsp70s and dehydration tolerance, we analyzed the P. patens genome and found at least 21 genes encoding Hsp70s. Gene structure and motif composition were relatively conserved in each subfamily. The intron-exon structure of PpcpHsp70-2 was different from that of other PpcpHsp70s; this gene exhibits several forms of intron retention, indicating that introns may play important roles in regulating gene expression. We observed expansion of Hsp70s in P. patens, which may reflect adaptations related to development and dehydration tolerance, and results mainly from tandem and segmental duplications. Expression profiles of rice, Arabidopsis and P. patens Hsp70 genes revealed that more than half of the Hsp70 genes were responsive to ABA, salt and drought. The presence of overrepresented cis-elements (DOFCOREZM and GCCCORE) among stress-responsive Hsp70s suggests that they share a common regulatory pathway. Moss plants overexpressing PpcpHsp70-2 showed salt and dehydration tolerance, further supporting a role in adaptation to land. This work highlights directions for future functional analyses of Hsp70s.
|
33
|
Aziz MF, Caetano-Anollés K, Caetano-Anollés G. The early history and emergence of molecular functions and modular scale-free network behavior. Sci Rep 2016; 6:25058. [PMID: 27121452 PMCID: PMC4848518 DOI: 10.1038/srep25058] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/01/2015] [Accepted: 04/08/2016] [Indexed: 12/17/2022] Open
Abstract
The formation of protein structural domains requires that biochemical functions, defined by conserved amino acid sequence motifs, be embedded into a structural scaffold. Here we trace domain history onto a bipartite network of elementary functional loop sequences and domain structures defined at the fold superfamily level of SCOP classification. The resulting 'elementary functionome' network and its loop motif and structural domain graph projections create evolutionary 'waterfalls' describing the emergence of primordial functions. Waterfalls reveal how ancient loops are shared by domain structures in two initial waves of functional innovation that involve founder 'p-loop' and 'winged helix' domain structures. They also uncover a dynamics of modular motif embedding in domain structures that is ongoing, which transfers 'preferential' cooption properties of ancient loops to emerging domains. Remarkably, we find that the emergence of molecular functions induces hierarchical modularity and power law behavior in network evolution as the network of motifs and structures expand metabolic pathways and translation.
Affiliation(s)
- M Fayez Aziz
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, IL 61801, United States
| | - Kelsey Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, IL 61801, United States
| | - Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, IL 61801, United States
| |
|
34
|
Coevolution Theory of the Genetic Code at Age Forty: Pathway to Translation and Synthetic Life. Life (Basel) 2016; 6:life6010012. [PMID: 26999216 PMCID: PMC4810243 DOI: 10.3390/life6010012] [Citation(s) in RCA: 51] [Impact Index Per Article: 6.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/08/2016] [Revised: 02/26/2016] [Accepted: 03/04/2016] [Indexed: 11/17/2022] Open
Abstract
The origins of the components of genetic coding are examined in the present study. Genetic information arose from replicator induction by metabolite in accordance with the metabolic expansion law. Messenger RNA and transfer RNA stemmed from a template for binding the aminoacyl-RNA synthetase ribozymes employed to synthesize peptide prosthetic groups on RNAs in the Peptidated RNA World. Coevolution of the genetic code with amino acid biosynthesis generated tRNA paralogs that identify a last universal common ancestor (LUCA) of extant life close to Methanopyrus, which in turn points to archaeal tRNA introns as the most primitive introns and the anticodon usage of Methanopyrus as an ancient mode of wobble. The prediction of the coevolution theory of the genetic code that the code should be a mutable code has led to the isolation of optional and mandatory synthetic life forms with altered protein alphabets.
|
35
|
The TIM Barrel Architecture Facilitated the Early Evolution of Protein-Mediated Metabolism. J Mol Evol 2016; 82:17-26. [PMID: 26733481 PMCID: PMC4709378 DOI: 10.1007/s00239-015-9722-8] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/06/2015] [Accepted: 11/11/2015] [Indexed: 12/30/2022]
Abstract
The triosephosphate isomerase (TIM) barrel protein fold is a structurally repetitive architecture that is present in approximately 10 % of all enzymes. It is generally assumed that this ubiquity in modern proteomes reflects an essential historical role in early protein-mediated metabolism. Here, we provide quantitative and comparative analyses to support several hypotheses about the early importance of the TIM barrel architecture. An information theoretical analysis of protein structures supports the hypothesis that the TIM barrel architecture could arise more easily by duplication and recombination compared to other mixed α/β structures. We show that TIM barrel enzymes corresponding to the most taxonomically broad superfamilies also have the broadest range of functions, often aided by metal and nucleotide-derived cofactors that are thought to reflect an earlier stage of metabolic evolution. By comparison to other putatively ancient protein architectures, we find that the functional diversity of TIM barrel proteins cannot be explained simply by their antiquity. Instead, the breadth of TIM barrel functions can be explained, in part, by the incorporation of a broad range of cofactors, a trend that does not appear to be shared by proteins in general. These results support the hypothesis that the simple and functionally general TIM barrel architecture may have arisen early in the evolution of protein biosynthesis and provided an ideal scaffold to facilitate the metabolic transition from ribozymes, peptides, and geochemical catalysts to modern protein enzymes.
|
36
|
Kang SK, Chen BX, Tian T, Jia XS, Chu XY, Liu R, Dong PF, Yang QY, Zhang HY. ATP selection in a random peptide library consisting of prebiotic amino acids. Biochem Biophys Res Commun 2015; 466:400-5. [PMID: 26365351 DOI: 10.1016/j.bbrc.2015.09.038] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/27/2015] [Accepted: 09/08/2015] [Indexed: 01/02/2023]
Abstract
Based upon many theoretical findings on protein evolution, we proposed a ligand-selection model for the origin of proteins, in which the most ancient proteins originated from ATP selection in a pool of random peptides. To test this ligand-selection model, we constructed a random peptide library consisting of 15 types of prebiotic amino acids and then used cDNA display to perform six rounds of in vitro selection with ATP. By means of next-generation sequencing, the most prevalent sequence was defined. Biochemical and biophysical characterization of the selected peptide showed that it was stable and foldable and had ATP-hydrolysis activity as well.
Affiliation(s)
- Shou-Kai Kang
- Agricultural Bioinformatics Key Laboratory of Hubei Province, College of Informatics, Huazhong Agricultural University, Wuhan, Hubei 430070, PR China
| | - Bai-Xue Chen
- Agricultural Bioinformatics Key Laboratory of Hubei Province, College of Informatics, Huazhong Agricultural University, Wuhan, Hubei 430070, PR China
| | - Tian Tian
- Agricultural Bioinformatics Key Laboratory of Hubei Province, College of Informatics, Huazhong Agricultural University, Wuhan, Hubei 430070, PR China
| | - Xi-Shuai Jia
- Agricultural Bioinformatics Key Laboratory of Hubei Province, College of Informatics, Huazhong Agricultural University, Wuhan, Hubei 430070, PR China
| | - Xin-Yi Chu
- Agricultural Bioinformatics Key Laboratory of Hubei Province, College of Informatics, Huazhong Agricultural University, Wuhan, Hubei 430070, PR China
| | - Rong Liu
- Agricultural Bioinformatics Key Laboratory of Hubei Province, College of Informatics, Huazhong Agricultural University, Wuhan, Hubei 430070, PR China
| | - Peng-Fei Dong
- Agricultural Bioinformatics Key Laboratory of Hubei Province, College of Informatics, Huazhong Agricultural University, Wuhan, Hubei 430070, PR China
| | - Qing-Yong Yang
- Agricultural Bioinformatics Key Laboratory of Hubei Province, College of Informatics, Huazhong Agricultural University, Wuhan, Hubei 430070, PR China
| | - Hong-Yu Zhang
- Agricultural Bioinformatics Key Laboratory of Hubei Province, College of Informatics, Huazhong Agricultural University, Wuhan, Hubei 430070, PR China.
| |
|
37
|
Caetano-Anollés G, Caetano-Anollés D. Computing the origin and evolution of the ribosome from its structure - Uncovering processes of macromolecular accretion benefiting synthetic biology. Comput Struct Biotechnol J 2015; 13:427-47. [PMID: 27096056 PMCID: PMC4823900 DOI: 10.1016/j.csbj.2015.07.003] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/09/2015] [Revised: 07/16/2015] [Accepted: 07/19/2015] [Indexed: 12/11/2022] Open
Abstract
Accretion occurs pervasively in nature at widely different timeframes. The process also manifests in the evolution of macromolecules. Here we review recent computational and structural biology studies of evolutionary accretion that make use of the ideographic (historical, retrodictive) and nomothetic (universal, predictive) scientific frameworks. Computational studies uncover explicit timelines of accretion of structural parts in molecular repertoires and molecules. Phylogenetic trees of protein structural domains and proteomes and their molecular functions were built from a genomic census of millions of encoded proteins and associated terminal Gene Ontology terms. Trees reveal a ‘metabolic-first’ origin of proteins, the late development of translation, and a patchwork distribution of proteins in biological networks mediated by molecular recruitment. Similarly, the natural history of ancient RNA molecules inferred from trees of molecular substructures built from a census of molecular features shows patchwork-like accretion patterns. Ideographic analyses of ribosomal history uncover the early appearance of structures supporting mRNA decoding and tRNA translocation, the coevolution of ribosomal proteins and RNA, and a first evolutionary transition that brings ribosomal subunits together into a processive protein biosynthetic complex. Nomothetic structural biology studies of tertiary interactions and ancient insertions in rRNA complement these findings, once concentric layering assumptions are removed. Patterns of coaxial helical stacking reveal a frustrated dynamics of outward and inward ribosomal growth possibly mediated by structural grafting. The early rise of the ribosomal ‘turnstile’ suggests an evolutionary transition in natural biological computation. Results make explicit the need to understand processes of molecular growth and information transfer of macromolecules.
Affiliation(s)
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, 1101 W. Peabody Drive, Urbana, IL 61801, USA; C.R. Woese Institute for Genomic Biology, University of Illinois, Urbana, IL 61801, USA
| | - Derek Caetano-Anollés
- C.R. Woese Institute for Genomic Biology, University of Illinois, Urbana, IL 61801, USA
| |
|
38
|
Sikosek T, Chan HS. Biophysics of protein evolution and evolutionary protein biophysics. J R Soc Interface 2015; 11:20140419. [PMID: 25165599 DOI: 10.1098/rsif.2014.0419] [Citation(s) in RCA: 150] [Impact Index Per Article: 16.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/05/2023] Open
Abstract
The study of molecular evolution at the level of protein-coding genes often entails comparing large datasets of sequences to infer their evolutionary relationships. Despite the importance of a protein's structure and conformational dynamics to its function and thus its fitness, common phylogenetic methods embody minimal biophysical knowledge of proteins. To underscore the biophysical constraints on natural selection, we survey effects of protein mutations, highlighting the physical basis for marginal stability of natural globular proteins and how requirement for kinetic stability and avoidance of misfolding and misinteractions might have affected protein evolution. The biophysical underpinnings of these effects have been addressed by models with an explicit coarse-grained spatial representation of the polypeptide chain. Sequence-structure mappings based on such models are powerful conceptual tools that rationalize mutational robustness, evolvability, epistasis, promiscuous function performed by 'hidden' conformational states, resolution of adaptive conflicts and conformational switches in the evolution from one protein fold to another. Recently, protein biophysics has been applied to derive more accurate evolutionary accounts of sequence data. Methods have also been developed to exploit sequence-based evolutionary information to predict biophysical behaviours of proteins. The success of these approaches demonstrates a deep synergy between the fields of protein biophysics and protein evolution.
Affiliation(s)
- Tobias Sikosek
- Department of Biochemistry, University of Toronto, Toronto, Ontario, Canada M5S 1A8; Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada M5S 1A8; Department of Physics, University of Toronto, Toronto, Ontario, Canada M5S 1A8
| | - Hue Sun Chan
- Department of Biochemistry, University of Toronto, Toronto, Ontario, Canada M5S 1A8; Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada M5S 1A8; Department of Physics, University of Toronto, Toronto, Ontario, Canada M5S 1A8
| |
|
39
|
Nontemplate-driven polymers: clues to a minimal form of organization closure at the early stages of living systems. Theory Biosci 2015; 134:47-64. [DOI: 10.1007/s12064-015-0209-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/12/2014] [Accepted: 04/16/2015] [Indexed: 12/27/2022]
|
40
|
The place of RNA in the origin and early evolution of the genetic machinery. Life (Basel) 2014; 4:1050-91. [PMID: 25532530 PMCID: PMC4284482 DOI: 10.3390/life4041050] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/24/2014] [Revised: 12/02/2014] [Accepted: 12/09/2014] [Indexed: 11/17/2022] Open
Abstract
The extant genetic machinery revolves around three interrelated polymers: RNA, DNA and proteins. Two evolutionary views approach this vital connection from opposite perspectives. The RNA World theory posits that life began in a cold prebiotic broth of monomers with the de novo emergence of replicating RNA as functionally self-contained polymer and that subsequent evolution is characterized by RNA → DNA memory takeover and ribozyme → enzyme catalyst takeover. The FeS World theory posits that life began as an autotrophic metabolism in hot volcanic-hydrothermal fluids and evolved with organic products turning into ligands for transition metal catalysts thereby eliciting feedback and feed-forward effects. In this latter context it is posited that the three polymers of the genetic machinery essentially coevolved from monomers through oligomers to polymers, operating functionally first as ligands for ligand-accelerated transition metal catalysis with later addition of base stacking and base pairing, whereby the functional dichotomy between hereditary DNA with stability on geologic time scales and transient, catalytic RNA with stability on metabolic time scales existed since the dawn of the genetic machinery. Both approaches are assessed comparatively for chemical soundness.
|
41
|
Caetano-Anollés G, Mittenthal JE, Caetano-Anollés D, Kim KM. A calibrated chronology of biochemistry reveals a stem line of descent responsible for planetary biodiversity. Front Genet 2014; 5:306. [PMID: 25309572 PMCID: PMC4161044 DOI: 10.3389/fgene.2014.00306] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/26/2014] [Accepted: 08/18/2014] [Indexed: 11/13/2022] Open
Abstract
Time-calibrated phylogenomic trees of protein domain structure produce powerful chronologies describing the evolution of biochemistry and life. These timetrees are built from a genomic census of millions of encoded proteins using models of nested accumulation of molecules in evolving proteomes. Here we show that a primordial stem line of descent, a propagating series of pluripotent cellular entities, populates the deeper branches of the timetrees. The stem line produced for the first time cellular grades ~2.9 billion years (Gy)-ago, which slowly turned into lineages of superkingdom Archaea. Prompted by the rise of planetary oxygen and aerobic metabolism, the stem line also produced bacterial and eukaryal lineages. Superkingdom-specific domain repertoires emerged ~2.1 Gy-ago delimiting fully diversified Bacteria. Repertoires specific to Eukarya and Archaea appeared 300 million years later. Results reconcile reductive evolutionary processes leading to the early emergence of Archaea to superkingdom-specific innovations compatible with a tree of life rooted in Bacteria.
Affiliation(s)
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, IL, USA
| | - Jay E Mittenthal
- Department of Cell and Developmental Biology, University of Illinois, Urbana, IL, USA
| | - Derek Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, IL, USA
| | - Kyung Mo Kim
- Microbial Resource Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon, South Korea
| |
|
42
|
Caetano-Anollés G, Nasir A, Zhou K, Caetano-Anollés D, Mittenthal JE, Sun FJ, Kim KM. Archaea: the first domain of diversified life. Archaea 2014; 2014:590214. [PMID: 24987307 PMCID: PMC4060292 DOI: 10.1155/2014/590214] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/30/2013] [Revised: 02/15/2014] [Accepted: 03/25/2014] [Indexed: 01/23/2023]
Abstract
The study of the origin of diversified life has been plagued by technical and conceptual difficulties, controversy, and apriorism. It is now popularly accepted that the universal tree of life is rooted in the akaryotes and that Archaea and Eukarya are sister groups to each other. However, evolutionary studies have overwhelmingly focused on nucleic acid and protein sequences, which partially fulfill only two of the three main steps of phylogenetic analysis, formulation of realistic evolutionary models, and optimization of tree reconstruction. In the absence of character polarization, that is, the ability to identify ancestral and derived character states, any statement about the rooting of the tree of life should be considered suspect. Here we show that macromolecular structure and a new phylogenetic framework of analysis that focuses on the parts of biological systems instead of the whole provide both deep and reliable phylogenetic signal and enable us to put forth hypotheses of origin. We review over a decade of phylogenomic studies, which mine information in a genomic census of millions of encoded proteins and RNAs. We show how the use of process models of molecular accumulation that comply with Weston's generality criterion supports a consistent phylogenomic scenario in which the origin of diversified life can be traced back to the early history of Archaea.
Affiliation(s)
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, Institute for Genomic Biology and Illinois Informatics Institute, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
| | - Arshan Nasir
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, Institute for Genomic Biology and Illinois Informatics Institute, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
| | - Kaiyue Zhou
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, Institute for Genomic Biology and Illinois Informatics Institute, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
| | - Derek Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, Institute for Genomic Biology and Illinois Informatics Institute, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
| | - Jay E. Mittenthal
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, Institute for Genomic Biology and Illinois Informatics Institute, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
| | - Feng-Jie Sun
- School of Science and Technology, Georgia Gwinnett College, Lawrenceville, GA 30043, USA
| | - Kyung Mo Kim
- Microbial Resource Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon 305-806, Republic of Korea
| |
|
43
|
Egel R. Origins and emergent evolution of life: the colloid microsphere hypothesis revisited. Orig Life Evol Biosph 2014; 44:87-110. [PMID: 25208738 DOI: 10.1007/s11084-014-9363-8] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/23/2014] [Accepted: 08/14/2014] [Indexed: 11/28/2022]
Abstract
Self-replicating molecules, in particular RNA, have long been assumed as key to origins of life on Earth. This notion, however, is not very secure since the reduction of life's complexity to self-replication alone relies on thermodynamically untenable assumptions. Alternative, earlier hypotheses about peptide-dominated colloid self-assembly should be revived. Such macromolecular conglomerates presumably existed in a dynamic equilibrium between confluent growth in sessile films and microspheres detached in turbulent suspension. The first organic syntheses may have been driven by mineral-assisted photoactivation at terrestrial geothermal fields, allowing photo-dependent heterotrophic origins of life. Inherently endowed with rudimentary catalyst activities, mineral-associated organic microstructures can have evolved adaptively toward cooperative 'protolife' communities, in which 'protoplasmic continuity' was maintained throughout a graded series of 'proto-biofilms', 'protoorganisms' and 'protocells' toward modern life. The proneness of organic microspheres to merge back into the bulk of sessile films by spontaneous fusion can have made large populations promiscuous from the beginning, which was important for the speed of collective evolution early on. In this protein-centered scenario, the emergent coevolution of uncoded peptides, metabolic cofactors and oligoribonucleotides was primarily optimized for system-supporting catalytic capabilities arising from nonribosomal peptide synthesis and nonreplicative ribonucleotide polymerization, which in turn incorporated other reactive micromolecular organics as vitamins and cofactors into composite macromolecular colloid films and microspheres. Template-dependent replication and gene-encoded protein synthesis emerged as secondary means for further optimization of overall efficiency later on. Eventually, Darwinian speciation of cell-like lineages commenced after minimal gene sets had been bundled in transmissible genomes from multigenomic protoorganisms.
Affiliation(s)
- Richard Egel
- Department of Biology, University of Copenhagen Biocenter, Ole Maaløes Vej 5, DK-2200, Copenhagen, Denmark
|
44
|
Abstract
During the course of evolution, genomes acquire novel genetic elements as sources of functional and phenotypic diversity, including new genes that originated in recent evolution. In the past few years, substantial progress has been made in understanding the evolution and phenotypic effects of new genes. In particular, an emerging picture is that new genes, despite being present in the genomes of only a subset of species, can rapidly evolve indispensable roles in fundamental biological processes, including development, reproduction, brain function and behaviour. The molecular underpinnings of how new genes can develop these roles are starting to be characterized. These recent discoveries yield fresh insights into our broad understanding of biological diversity at refined resolution.
|
45
|
Comparative analysis of barophily-related amino acid content in protein domains of Pyrococcus abyssi and Pyrococcus furiosus. Archaea 2013; 2013:680436. [PMID: 24187517 PMCID: PMC3804272 DOI: 10.1155/2013/680436] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 07/04/2013] [Revised: 08/21/2013] [Accepted: 08/23/2013] [Indexed: 11/17/2022]
Abstract
Amino acid substitution patterns between the nonbarophilic Pyrococcus furiosus and its barophilic relative P. abyssi confirm that hydrostatic pressure asymmetry indices reflect the extent to which amino acids are preferred by barophilic archaeal organisms. Substitution patterns in entire protein sequences, shared protein domains defined at fold superfamily level, domains in homologous sequence pairs, and domains of very ancient and very recent origin now provide further clues about the environment that led to the genetic code and diversified life. The pyrococcal proteomes are very similar and share a very early ancestor. Relative amino acid abundance analyses showed that biases in the use of amino acids are due to their shared fold superfamilies. Within these repertoires, only two of the five amino acids that are preferentially barophilic, aspartic acid and arginine, displayed this preference significantly and consistently across structure and in domains appearing in the ancestor. The more primordial asparagine, lysine and threonine displayed a consistent preference for nonbarophily across structure and in the ancestor. Since barophilic preferences are already evident in ancient domains that are at least ~3 billion years old, we conclude that barophily is a very ancient trait that unfolded concurrently with genetic idiosyncrasies in convergence towards a universal code.
|
46
|
Caetano-Anollés G, Wang M, Caetano-Anollés D. Structural phylogenomics retrodicts the origin of the genetic code and uncovers the evolutionary impact of protein flexibility. PLoS One 2013; 8:e72225. [PMID: 23991065 PMCID: PMC3749098 DOI: 10.1371/journal.pone.0072225] [Citation(s) in RCA: 47] [Impact Index Per Article: 4.3] [Received: 03/28/2013] [Accepted: 07/07/2013] [Indexed: 11/18/2022] [Open Access]
Abstract
The genetic code shapes the genetic repository. Its origin has puzzled molecular scientists for over half a century and remains a long-standing mystery. Here we show that the origin of the genetic code is tightly coupled to the history of aminoacyl-tRNA synthetase enzymes and their interactions with tRNA. A timeline of evolutionary appearance of protein domain families derived from a structural census in hundreds of genomes reveals the early emergence of the 'operational' RNA code and the late implementation of the standard genetic code. The emergence of codon specificities and amino acid charging involved tight coevolution of aminoacyl-tRNA synthetases and tRNA structures as well as episodes of structural recruitment. Remarkably, amino acid and dipeptide compositions of single-domain proteins appearing before the standard code suggest archaic synthetases with structures homologous to catalytic domains of tyrosyl-tRNA and seryl-tRNA synthetases were capable of peptide bond formation and aminoacylation. Results reveal that genetics arose through coevolutionary interactions between polypeptides and nucleic acid cofactors as an exacting mechanism that favored flexibility and folding of the emergent proteins. These enhancements of phenotypic robustness were likely internalized into the emerging genetic system with the early rise of modern protein structure.
Affiliation(s)
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, Illinois, United States of America
- Minglei Wang
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, Illinois, United States of America
- Derek Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, Illinois, United States of America
|
47
|
Bukhari SA, Caetano-Anollés G. Origin and evolution of protein fold designs inferred from phylogenomic analysis of CATH domain structures in proteomes. PLoS Comput Biol 2013; 9:e1003009. [PMID: 23555236 PMCID: PMC3610613 DOI: 10.1371/journal.pcbi.1003009] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.4] [Received: 10/25/2012] [Accepted: 02/13/2013] [Indexed: 12/22/2022] [Open Access]
Abstract
The spatial arrangements of secondary structures in proteins, irrespective of their connectivity, depict the overall shape and organization of protein domains. These features have been used in the CATH and SCOP classifications to hierarchically partition fold space and define the architectural make up of proteins. Here we use phylogenomic methods and a census of CATH structures in hundreds of genomes to study the origin and diversification of protein architectures (A) and their associated topologies (T) and superfamilies (H). Phylogenies that describe the evolution of domain structures and proteomes were reconstructed from the structural census and used to generate timelines of domain discovery. Phylogenies of CATH domains at T and H levels of structural abstraction and associated chronologies revealed patterns of reductive evolution, the early rise of Archaea, three epochs in the evolution of the protein world, and patterns of structural sharing between superkingdoms. Phylogenies of proteomes confirmed the early appearance of Archaea. While these findings are in agreement with previous phylogenomic studies based on the SCOP classification, phylogenies unveiled sharing patterns between Archaea and Eukarya that are recent and can explain the canonical bacterial rooting typically recovered from sequence analysis. Phylogenies of CATH domains at A level uncovered general patterns of architectural origin and diversification. The tree of A structures showed that ancient structural designs such as the 3-layer (αβα) sandwich (3.40) or the orthogonal bundle (1.10) are comparatively simpler in their makeup and are involved in basic cellular functions. In contrast, modern structural designs such as prisms, propellers, 2-solenoid, super-roll, clam, trefoil and box are not widely distributed and were probably adopted to perform specialized functions. Our timelines therefore uncover a universal tendency towards protein structural complexity that is remarkable.
Affiliation(s)
- Syed Abbas Bukhari
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, Illinois, United States of America
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, Illinois, United States of America
|
48
|
Caetano-Anollés K, Caetano-Anollés G. Structural phylogenomics reveals gradual evolutionary replacement of abiotic chemistries by protein enzymes in purine metabolism. PLoS One 2013; 8:e59300. [PMID: 23516625 PMCID: PMC3596326 DOI: 10.1371/journal.pone.0059300] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 07/11/2012] [Accepted: 02/13/2013] [Indexed: 11/30/2022] [Open Access]
Abstract
The origin of metabolism has been linked to abiotic chemistries that existed in our planet at the beginning of life. While plausible chemical pathways have been proposed, including the synthesis of nucleobases, ribose and ribonucleotides, the cooption of these reactions by modern enzymes remains shrouded in mystery. Here we study the emergence of purine metabolism. The ages of protein domains derived from a census of fold family structure in hundreds of genomes were mapped onto enzymes in metabolic diagrams. We find that the origin of the nucleotide interconversion pathway benefited most parsimoniously from the prebiotic formation of adenine nucleosides. In turn, pathways of nucleotide biosynthesis, catabolism and salvage originated ∼300 million years later by concerted enzymatic recruitments and gradual replacement of abiotic chemistries. Remarkably, this process led to the emergence of the fully enzymatic biosynthetic pathway ∼3 billion years ago, concurrently with the appearance of a functional ribosome. The simultaneous appearance of purine biosynthesis and the ribosome probably fulfilled the expanding matter-energy and processing needs of genomic information.
Affiliation(s)
- Kelsey Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Chicago School of Professional Psychology, Chicago, Illinois, United States of America
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
|
49
|
Debès C, Wang M, Caetano-Anollés G, Gräter F. Evolutionary optimization of protein folding. PLoS Comput Biol 2013; 9:e1002861. [PMID: 23341762 PMCID: PMC3547816 DOI: 10.1371/journal.pcbi.1002861] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Received: 07/03/2012] [Accepted: 11/09/2012] [Indexed: 01/07/2023] [Open Access]
Abstract
Nature has shaped the make up of proteins since their appearance, 3.8 billion years ago. However, the fundamental drivers of structural change responsible for the extraordinary diversity of proteins have yet to be elucidated. Here we explore if protein evolution affects folding speed. We estimated folding times for the present-day catalog of protein domains directly from their size-modified contact order. These values were mapped onto an evolutionary timeline of domain appearance derived from a phylogenomic analysis of protein domains in 989 fully-sequenced genomes. Our results show a clear overall increase of folding speed during evolution, with known ultra-fast downhill folders appearing rather late in the timeline. Remarkably, folding optimization depends on secondary structure. While alpha-folds showed a tendency to fold faster throughout evolution, beta-folds exhibited a trend of folding time increase during the last 1.5 billion years that began during the “big bang” of domain combinations. As a consequence, these domain structures are on average slow folders today. Our results suggest that fast and efficient folding of domains shaped the universe of protein structure. This finding supports the hypothesis that optimization of the kinetic and thermodynamic accessibility of the native fold reduces protein aggregation propensities that hamper cellular functions.
Nature has come up with an enormous variety of protein three-dimensional structures, each of which is thought to be optimized for its specific function. A fundamental biological endeavor is to uncover the driving evolutionary forces for discovering and optimizing new folds. A long-standing hypothesis is that fold evolution obeys constraints to properly fold into native structure. We here test this hypothesis by analyzing trends of proteins to fold fast during evolution. Using phylogenomic and structural analyses, we observe an overall decrease in folding times between 3.8 and 1.5 billion years ago, which can be interpreted as an evolutionary optimization for rapid folding. This trend towards fast folding probably resulted in manifold advantages, including high protein accessibility for the cell and a reduction of protein aggregation during misfolding.
Affiliation(s)
- Cédric Debès
- Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Minglei Wang
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, Illinois, United States of America
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, Illinois, United States of America
- Frauke Gräter
- Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- CAS-MPG Partner Institute and Key Laboratory for Computational Biology, Shanghai, China
|
50
|
Caetano-Anollés G, Nasir A. Benefits of using molecular structure and abundance in phylogenomic analysis. Front Genet 2012; 3:172. [PMID: 22973296 PMCID: PMC3434437 DOI: 10.3389/fgene.2012.00172] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.0] [Received: 07/25/2012] [Accepted: 08/18/2012] [Indexed: 12/25/2022] [Open Access]
Affiliation(s)
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois Urbana-Champaign, IL, USA
|