1
Brown SM, Mayer-Bacon C, Freeland S. Xeno Amino Acids: A Look into Biochemistry as We Do Not Know It. Life (Basel) 2023; 13:2281. [PMID: 38137883 PMCID: PMC10744825 DOI: 10.3390/life13122281] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Revised: 11/18/2023] [Accepted: 11/20/2023] [Indexed: 12/24/2023] [Open access]
Abstract
Would another origin of life resemble Earth's biochemical use of amino acids? Here, we review current knowledge at three levels: (1) Could other classes of chemical structure serve as building blocks for biopolymer structure and catalysis? Amino acids now seem both readily available to, and a plausible chemical attractor for, life as we do not know it. Amino acids thus remain important and tractable targets for astrobiological research. (2) If amino acids are used, would we expect the same L-alpha-structural subclass used by life? Despite numerous ideas, it is not clear why life favors L-enantiomers. It seems clearer, however, why life on Earth uses the shortest possible (alpha-) amino acid backbone, and why each carries only one side chain. However, assertions that other backbones are physicochemically impossible have relaxed into arguments that they are disadvantageous. (3) Would we expect a similar set of side chains to those within the genetic code? Many plausible alternatives exist. Furthermore, evidence exists for both evolutionary advantage and physicochemical constraint as explanatory factors for those encoded by life. Overall, as focus shifts from amino acids as a chemical class to specific side chains used by post-LUCA biology, the probable role of physicochemical constraint diminishes relative to that of biological evolution. Exciting opportunities now present themselves for laboratory work and computing to explore how changing the amino acid alphabet alters the universe of protein folds. Near-term milestones include: (a) expanding evidence about amino acids as attractors within chemical evolution; (b) extending characterization of other backbones relative to biological proteins; and (c) merging computing and laboratory explorations of structures and functions unlocked by xeno peptides.
2
Tagami S. Why we are made of proteins and nucleic acids: Structural biology views on extraterrestrial life. Biophys Physicobiol 2023; 20:e200026. [PMID: 38496239 PMCID: PMC10941967 DOI: 10.2142/biophysico.bppb-v20.0026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Accepted: 05/29/2023] [Indexed: 03/19/2024] [Open access]
Abstract
Is it a miracle that life exists on the Earth, or is it a common phenomenon in the universe? If extraterrestrial organisms exist, what are they like? To answer these questions, we must understand what kinds of molecules could evolve into life, or in other words, what properties are generally required to perform biological functions and store genetic information. This review summarizes recent findings on simple ancestral proteins, outlines the basic knowledge in textbooks, and discusses the generally required properties for biological molecules from structural biology viewpoints (e.g., restriction of shapes, and types of intra- and intermolecular interactions), leading to the conclusion that proteins and nucleic acids are at least one of the simplest (and perhaps very common) forms of catalytic and genetic biopolymers in the universe. This review article is an extended version of the Japanese article, On the Origin of Life: Coevolution between RNA and Peptide, published in SEIBUTSU BUTSURI Vol. 61, p. 232-235 (2021).
Affiliation(s)
- Shunsuke Tagami
  - RIKEN Center for Biosystems Dynamics Research, Yokohama, Kanagawa 230-0045, Japan
3
Brown SM, Voráček V, Freeland S. What Would an Alien Amino Acid Alphabet Look Like and Why? Astrobiology 2023; 23:536-549. [PMID: 37022727 DOI: 10.1089/ast.2022.0107] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 05/04/2023]
Abstract
Life on Earth builds genetically encoded proteins by using a standard alphabet of just 20 L-α-amino acids, although many others were available to life's origins and early evolution. To better understand the causes of this foundational evolutionary outcome, we extend previous analyses which have identified a highly unusual distribution of biophysical properties within the set used by life. Specifically, we use a heuristic search algorithm to identify other sets of amino acids, from a library of plausible alternatives, that emulate life's signature. We find that a subset of amino acids seems predisposed to forming such sets. We present other examples of such alphabets under various assumptions, along with analysis and reasoning about why each might be simplistic. We do so to introduce the central, open question that remains: while fundamental biophysics related to protein folding can potentially reduce a library of 10⁵⁴ possible amino acid alphabets by 7 orders of magnitude, the framework of assumptions that does so leaves a further 10⁴⁵ possibilities. It is therefore tempting to ask what additional assumptions can further reduce these 45 orders of magnitude? We thus conclude with a focus on library and alphabet construction as a useful target for subsequent research that may help future science speak with more confidence about what an alien amino acid alphabet would look like and why.
Affiliation(s)
- Sean M Brown
  - Department of Biological Sciences, University of Maryland, Baltimore County, Maryland, USA
- Václav Voráček
  - Department of Computer Science, University of Tübingen, Tübingen, Germany
- Stephen Freeland
  - Department of Biological Sciences, University of Maryland, Baltimore County, Maryland, USA
4
Heames B, Buchel F, Aubel M, Tretyachenko V, Loginov D, Novák P, Lange A, Bornberg-Bauer E, Hlouchová K. Experimental characterization of de novo proteins and their unevolved random-sequence counterparts. Nat Ecol Evol 2023; 7:570-580. [PMID: 37024625 PMCID: PMC10089919 DOI: 10.1038/s41559-023-02010-2] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 01/14/2022] [Accepted: 02/10/2023] [Indexed: 04/08/2023]
Abstract
De novo gene emergence provides a route for new proteins to be formed from previously non-coding DNA. Proteins born in this way are considered random sequences and typically assumed to lack defined structure. While it remains unclear how likely a de novo protein is to assume a soluble and stable tertiary structure, intersecting evidence from random sequence and de novo-designed proteins suggests that native-like biophysical properties are abundant in sequence space. Taking putative de novo proteins identified in human and fly, we experimentally characterize a library of these sequences to assess their solubility and structure propensity. We compare this library to a set of synthetic random proteins with no evolutionary history. Bioinformatic prediction suggests that de novo proteins may have remarkably similar distributions of biophysical properties to unevolved random sequences of a given length and amino acid composition. However, upon expression in vitro, de novo proteins exhibit moderately higher solubility which is further induced by the DnaK chaperone system. We suggest that while synthetic random sequences are a useful proxy for de novo proteins in terms of structure propensity, de novo proteins may be better integrated in the cellular system than random expectation, given their higher solubility.
Affiliation(s)
- Brennen Heames
  - Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
- Filip Buchel
  - Department of Cell Biology, Charles University, BIOCEV, Prague, Czech Republic
  - Department of Biochemistry, Charles University, Prague, Czech Republic
- Margaux Aubel
  - Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
- Dmitry Loginov
  - Institute of Microbiology, Czech Academy of Sciences, Prague, Czech Republic
- Petr Novák
  - Institute of Microbiology, Czech Academy of Sciences, Prague, Czech Republic
- Andreas Lange
  - Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
- Erich Bornberg-Bauer
  - Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
  - Department of Protein Evolution, MPI for Developmental Biology, Tübingen, Germany
- Klára Hlouchová
  - Department of Cell Biology, Charles University, BIOCEV, Prague, Czech Republic
  - Institute of Organic Chemistry and Biochemistry, Czech Academy of Sciences, Prague, Czech Republic
5
Aubel M, Eicholt L, Bornberg-Bauer E. Assessing structure and disorder prediction tools for de novo emerged proteins in the age of machine learning. F1000Res 2023; 12:347. [PMID: 37113259 PMCID: PMC10126731 DOI: 10.12688/f1000research.130443.1] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Accepted: 03/17/2023] [Indexed: 03/31/2023] [Open access]
Abstract
Background: De novo protein coding genes emerge from scratch in the non-coding regions of the genome and have, by definition, no homology to other genes. Therefore, their encoded de novo proteins belong to the so-called "dark protein space". So far, only four de novo protein structures have been experimentally approximated. Low homology, presumed high disorder and limited structures result in low confidence structural predictions for de novo proteins in most cases. Here, we look at the most widely used structure and disorder predictors and assess their applicability for de novo emerged proteins. Since AlphaFold2 is based on the generation of multiple sequence alignments and was trained on solved structures of largely conserved and globular proteins, its performance on de novo proteins remains unknown. More recently, natural language models of proteins have been used for alignment-free structure predictions, potentially making them more suitable for de novo proteins than AlphaFold2.
Methods: We applied different disorder predictors (IUPred3 short/long, flDPnn) and structure predictors, AlphaFold2 on the one hand and language-based models (Omegafold, ESMfold, RGN2) on the other hand, to four de novo proteins with experimental evidence on structure. We compared the resulting predictions between the different predictors as well as to the existing experimental evidence.
Results: Results from IUPred, the most widely used disorder predictor, depend heavily on the choice of parameters and differ significantly from flDPnn, which was found to outperform most other predictors in a recent comparative assessment study. Similarly, different structure predictors yielded varying results and confidence scores for de novo proteins.
Conclusions: We suggest that, while in some cases protein language model based approaches might be more accurate than AlphaFold2, the structure prediction of de novo emerged proteins remains a difficult task for any predictor, be it disorder or structure.
Affiliation(s)
- Margaux Aubel
  - Institute for Evolution and Biodiversity, University of Muenster, Muenster, 48149, Germany
- Lars Eicholt
  - Institute for Evolution and Biodiversity, University of Muenster, Muenster, 48149, Germany
- Erich Bornberg-Bauer
  - Institute for Evolution and Biodiversity, University of Muenster, Muenster, 48149, Germany
  - Department of Protein Evolution, Max Planck Institute for Biology, Tuebingen, 72076, Germany
6
Makarov M, Sanchez Rocha AC, Krystufek R, Cherepashuk I, Dzmitruk V, Charnavets T, Faustino AM, Lebl M, Fujishima K, Fried SD, Hlouchova K. Early Selection of the Amino Acid Alphabet Was Adaptively Shaped by Biophysical Constraints of Foldability. J Am Chem Soc 2023; 145:5320-5329. [PMID: 36826345 PMCID: PMC10017022 DOI: 10.1021/jacs.2c12987] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 02/25/2023]
Abstract
Whereas modern proteins rely on a quasi-universal repertoire of 20 canonical amino acids (AAs), numerous lines of evidence suggest that ancient proteins relied on a limited alphabet of 10 "early" AAs and that the 10 "late" AAs were products of biosynthetic pathways. However, many nonproteinogenic AAs were also prebiotically available, which begs two fundamental questions: Why do we have the current modern amino acid alphabet and would proteins be able to fold into globular structures as well if different amino acids comprised the genetic code? Here, we experimentally evaluate the solubility and secondary structure propensities of several prebiotically relevant amino acids in the context of synthetic combinatorial 25-mer peptide libraries. The most prebiotically abundant linear aliphatic and basic residues were incorporated along with or in place of other early amino acids to explore these alternative sequence spaces. The results show that foldability was likely a critical factor in the selection of the canonical alphabet. Unbranched aliphatic amino acids were purged from the proteinogenic alphabet despite their high prebiotic abundance because they generate polypeptides that are oversolubilized and have low packing efficiency. Surprisingly, we find that the inclusion of a short-chain basic amino acid also decreases polypeptides' secondary structure potential, for which we suggest a biophysical model. Our results support the view that, despite lacking basic residues, the early canonical alphabet was remarkably adaptive at supporting protein folding and explain why basic residues were only incorporated at a later stage of protein evolution.
Affiliation(s)
- Mikhail Makarov
  - Department of Cell Biology, Faculty of Science, Charles University, BIOCEV, Prague 12843, Czech Republic
- Alma C Sanchez Rocha
  - Department of Cell Biology, Faculty of Science, Charles University, BIOCEV, Prague 12843, Czech Republic
- Robin Krystufek
  - Department of Physical Chemistry, Faculty of Science, Charles University, Prague 12843, Czech Republic
  - Institute of Organic Chemistry and Biochemistry, Czech Academy of Sciences, Prague 16610, Czech Republic
- Ivan Cherepashuk
  - Department of Cell Biology, Faculty of Science, Charles University, BIOCEV, Prague 12843, Czech Republic
- Volha Dzmitruk
  - Institute of Biotechnology of the Czech Academy of Sciences, BIOCEV, Vestec 25250, Czech Republic
- Tatsiana Charnavets
  - Institute of Biotechnology of the Czech Academy of Sciences, BIOCEV, Vestec 25250, Czech Republic
- Anneliese M Faustino
  - Department of Chemistry, Johns Hopkins University, Baltimore, Maryland 21218, United States
- Michal Lebl
  - Institute of Organic Chemistry and Biochemistry, Czech Academy of Sciences, Prague 16610, Czech Republic
- Kosuke Fujishima
  - Earth-Life Science Institute, Tokyo Institute of Technology, Tokyo 1528550, Japan
  - Graduate School of Media and Governance, Keio University, Fujisawa 2520882, Japan
- Stephen D Fried
  - Department of Chemistry, Johns Hopkins University, Baltimore, Maryland 21218, United States
  - T. C. Jenkins Department of Biophysics, Johns Hopkins University, Baltimore, Maryland 21218, United States
- Klara Hlouchova
  - Department of Cell Biology, Faculty of Science, Charles University, BIOCEV, Prague 12843, Czech Republic
  - Institute of Organic Chemistry and Biochemistry, Czech Academy of Sciences, Prague 16610, Czech Republic