1
|
Pereira de Araújo AF. Sequence-dependent and -independent information in a combined random energy model for protein folding and coding. Proteins 2024; 92:679-687. [PMID: 38158239 DOI: 10.1002/prot.26658] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2023] [Revised: 12/11/2023] [Accepted: 12/15/2023] [Indexed: 01/03/2024]
Abstract
Random energy models (REMs) provide a simple description of the energy landscapes that guide protein folding and evolution. The requirement of a large energy gap between the native structure and unfolded conformations, considered necessary for cooperative, protein-like, folding behavior, indicates that proteins differ markedly from random heteropolymers. It has been suggested, therefore, that natural selection might have acted to choose nonrandom amino acid sequences satisfying this particular condition, implying that a large fraction of possible, unselected random sequences, would not fold to any structure. From an informational perspective, however, this scenario could indicate that protein structures, regarded as messages to be transmitted through a communication channel, would not be efficiently encoded in amino acid sequences, regarded as the communication channel for this transmission, since a large fraction of possible channel states would not be used. Here, we use a combined REM for conformations and sequences, with previously estimated parameters for natural proteins, to explore an alternative possibility in which the appropriate shape of the landscape results mainly from the deviation from randomness of possible native structures instead of sequences. We observe that this situation emerges naturally if the distribution of conformational energies happens to arise from two independent contributions corresponding to sequence-dependent and -independent terms. This construction is consistent with the hypothesis of a protein burial folding code, with native structures being determined by a modest amount of sequence-dependent atomic burial information with sequence-independent constraints imposed by unspecific hydrogen bond formation. 
More generally, an appropriate combination of sequence-dependent and -independent information accommodates the possibility of an efficient structural encoding with the main physical requirement for folding, providing possible insight not only into the folding process but also into several aspects of sequence evolution such as neutral networks, conformational coverage, and de novo gene emergence.
Affiliation(s)
- Antônio F Pereira de Araújo
- Laboratório de Biofísica Teórica, Departamento de Biologia Celular, Universidade de Brasília, Brasília, Brazil
|
2
|
Cooley NP, Wright ES. Many purported pseudogenes in bacterial genomes are bona fide genes. BMC Genomics 2024; 25:365. [PMID: 38622536 PMCID: PMC11017572 DOI: 10.1186/s12864-024-10137-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Accepted: 02/17/2024] [Indexed: 04/17/2024] Open
Abstract
BACKGROUND Microbial genomes are largely comprised of protein coding sequences, yet some genomes contain many pseudogenes caused by frameshifts or internal stop codons. These pseudogenes are believed to result from gene degradation during evolution but could also be technical artifacts of genome sequencing or assembly. RESULTS Using a combination of observational and experimental data, we show that many putative pseudogenes are attributable to errors that are incorporated into genomes during assembly. Within 126,564 publicly available genomes, we observed that nearly identical genomes often substantially differed in pseudogene counts. Causal inference implicated assembler, sequencing platform, and coverage as likely causative factors. Reassembly of genomes from raw reads confirmed that each variable affects the number of putative pseudogenes in an assembly. Furthermore, simulated sequencing reads corroborated our observations that the quality and quantity of raw data can significantly impact the number of pseudogenes in an assembler-dependent fashion. The number of unexpected pseudogenes due to internal stops was highly correlated (R² = 0.96) with average nucleotide identity to the ground truth genome, implying relative pseudogene counts can be used as a proxy for overall assembly correctness. Applying our method to assemblies in RefSeq resulted in rejection of 3.6% of assemblies due to significantly elevated pseudogene counts. Reassembly from real reads obtained from high coverage genomes showed considerable variability in spurious pseudogenes beyond that observed with simulated reads, reinforcing the finding that high coverage is necessary to mitigate assembly errors. CONCLUSIONS Collectively, these results demonstrate that many pseudogenes in microbial genome assemblies are actually genes. 
Our results suggest that high read coverage is required for correct assembly and indicate an inflated number of pseudogenes due to internal stops is indicative of poor overall assembly quality.
Affiliation(s)
- Nicholas P Cooley
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
| | - Erik S Wright
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA.
- Center for Evolutionary Biology and Medicine, Pittsburgh, PA, USA.
|
3
|
Zhao Z, Hu Y, Hu Y, White AP, Wang Y. Features and algorithms: facilitating investigation of secreted effectors in Gram-negative bacteria. Trends Microbiol 2023; 31:1162-1178. [PMID: 37349207 DOI: 10.1016/j.tim.2023.05.011] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 03/15/2023] [Revised: 05/22/2023] [Accepted: 05/22/2023] [Indexed: 06/24/2023]
Abstract
Gram-negative bacteria deliver effector proteins through type III, IV, or VI secretion systems (T3SSs, T4SSs, and T6SSs) into host cells, causing infections and diseases. In general, effector proteins for each of these distinct secretion systems lack homology and are difficult to identify. Sequence analysis has disclosed many common features, helping us to understand the evolution, function, and secretion mechanisms of the effectors. In combination with various algorithms, the known common features have facilitated accurate prediction of new effectors. Ensemblers or integrated pipelines, which combine multiple computational models or modules with multidimensional features, achieve better prediction performance. Natural language processing (NLP) models also show merit: they could enable discovery of novel features and, in turn, facilitate more precise effector prediction, extending our knowledge about each secretion mechanism.
Affiliation(s)
- Ziyi Zhao
- Youth Innovation Team of Medical Bioinformatics, Shenzhen University Medical School, Shenzhen 518060, China
| | - Yixue Hu
- Youth Innovation Team of Medical Bioinformatics, Shenzhen University Medical School, Shenzhen 518060, China
| | - Yueming Hu
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, China
| | - Aaron P White
- Vaccine and Infectious Disease Organization (VIDO), University of Saskatchewan, Saskatoon, Saskatchewan, Canada
| | - Yejun Wang
- Youth Innovation Team of Medical Bioinformatics, Shenzhen University Medical School, Shenzhen 518060, China; Department of Cell Biology and Genetics, College of Basic Medicine, Shenzhen University Medical School, Shenzhen 518060, China.
|
4
|
Wei S, Yong B, Jiang H, An Z, Wang Y, Li B, Yang C, Zhu W, Chen Q, He C. A loss-of-function mutant allele of a glycosyl hydrolase gene has been co-opted for seed weight control during soybean domestication. JOURNAL OF INTEGRATIVE PLANT BIOLOGY 2023; 65:2469-2489. [PMID: 37635359 DOI: 10.1111/jipb.13559] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/15/2023] [Accepted: 08/28/2023] [Indexed: 08/29/2023]
Abstract
The resultant DNA from loss-of-function mutation can be recruited in biological evolution and development. Here, we present such a rare and potential case of "to gain by loss" as a neomorphic mutation during soybean domestication for increasing seed weight. Using a population derived from a chromosome segment substitution line of Glycine max (SN14) and Glycine soja (ZYD06), a quantitative trait locus (QTL) of 100-seed weight (qHSW) was mapped on chromosome 11, corresponding to a truncated β-1,3-glucosidase (βGlu) gene. The novel gene hsw results from a 14-bp deletion, causing a frameshift mutation and a premature stop codon in the βGlu. In contrast to HSW, the hsw completely lost βGlu activity and function but acquired a novel function to promote cell expansion, thus increasing seed weight. Overexpressing hsw instead of HSW produced large soybean seeds, and surprisingly, truncating hsw via gene editing further increased the seed size. We further found that the core 21-aa peptide of hsw and its variants acted as a promoter of seed size. Transcriptomic variation in these transgenic soybean lines substantiated the integration of hsw into cell and seed size control. Moreover, the hsw allele underwent selection and expansion during soybean domestication and improvement. Our work cloned a likely domesticated QTL controlling soybean seed weight, revealed a novel genetic variation and mechanism in soybean domestication, and provided new insight into crop domestication and breeding, and plant evolution.
Affiliation(s)
- Siming Wei
- State Key Laboratory of Plant Diversity and Specialty Crops/State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, the Chinese Academy of Sciences, Beijing, 100093, China
- China National Botanical Garden, Beijing, 100093, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Bin Yong
- State Key Laboratory of Plant Diversity and Specialty Crops/State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, the Chinese Academy of Sciences, Beijing, 100093, China
- China National Botanical Garden, Beijing, 100093, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Hongwei Jiang
- College of Agriculture, Northeast Agricultural University, Harbin, 150030, China
- Jilin Academy of Agricultural Sciences, Changchun, 130022, China
| | - Zhenghong An
- State Key Laboratory of Plant Diversity and Specialty Crops/State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, the Chinese Academy of Sciences, Beijing, 100093, China
- China National Botanical Garden, Beijing, 100093, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Yan Wang
- State Key Laboratory of Plant Diversity and Specialty Crops/State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, the Chinese Academy of Sciences, Beijing, 100093, China
- China National Botanical Garden, Beijing, 100093, China
| | - Bingbing Li
- State Key Laboratory of Plant Diversity and Specialty Crops/State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, the Chinese Academy of Sciences, Beijing, 100093, China
- China National Botanical Garden, Beijing, 100093, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Ce Yang
- State Key Laboratory of Plant Diversity and Specialty Crops/State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, the Chinese Academy of Sciences, Beijing, 100093, China
- China National Botanical Garden, Beijing, 100093, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Weiwei Zhu
- State Key Laboratory of Plant Diversity and Specialty Crops/State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, the Chinese Academy of Sciences, Beijing, 100093, China
- China National Botanical Garden, Beijing, 100093, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Qingshan Chen
- College of Agriculture, Northeast Agricultural University, Harbin, 150030, China
| | - Chaoying He
- State Key Laboratory of Plant Diversity and Specialty Crops/State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, the Chinese Academy of Sciences, Beijing, 100093, China
- China National Botanical Garden, Beijing, 100093, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
- The Innovative Academy of Seed Design, the Chinese Academy of Sciences, Beijing, 100101, China
|
5
|
Balbinott N, Margis R. The many faces of lysine acylation in proteins: Phytohormones as unexplored substrates. PLANT SCIENCE : AN INTERNATIONAL JOURNAL OF EXPERIMENTAL PLANT BIOLOGY 2023; 336:111866. [PMID: 37714383 DOI: 10.1016/j.plantsci.2023.111866] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2023] [Revised: 09/04/2023] [Accepted: 09/12/2023] [Indexed: 09/17/2023]
Abstract
Protein post-translational modification (PTM) is a ubiquitous process that occurs in most proteins. Lysine residues containing an ε-amino group are recognized as hotspots for the addition of different chemical groups. Lysine acetylation, extensively studied in histones, serves as an epigenetic hallmark capable of promoting changes in chromatin structure and availability. Acyl groups derived from molecules involved in carbohydrate and lipid metabolism, such as lactate, succinate and hydroxybutyrate, were identified as lysine modifications of histones and other proteins. Lysine-acyltransferases do not exhibit significant substrate specificity concerning acyl donors. Furthermore, plant hormones harboring acyl groups often form conjugates with free amino acids to regulate their activity and function during plant physiological processes and responses, a process mediated by GH3 enzymes. Besides forming low-molecular weight conjugates, auxins have been shown to covalently modify proteins in bean seeds. Aside from auxins, other phytohormones with acyl groups are unexplored potential substrates for post-translational acylation of proteins. Using MS data searches, we revealed various proteins with lysine residues linked to auxin, abscisic acid, gibberellic acid, jasmonic acid, and salicylic acid. These findings raise compelling questions about the ability of plant hormones harboring carboxyl groups to serve as new candidates for protein acylation and to act in protein PTM and modulation.
Affiliation(s)
- Natalia Balbinott
- Programa de Pós-graduação em Genética e Biologia Molecular (PPGBM), Departamento de Genética, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil
| | - Rogerio Margis
- Programa de Pós-graduação em Genética e Biologia Molecular (PPGBM), Departamento de Genética, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil; Programa de Pós-graduação em Biologia Celular e Molecular (PPGBCM), Centro de Biotecnologia, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil; Departamento de Biofísica, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil.
|
6
|
Ardern Z. Alternative Reading Frames are an Underappreciated Source of Protein Sequence Novelty. J Mol Evol 2023; 91:570-580. [PMID: 37326679 DOI: 10.1007/s00239-023-10122-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2022] [Accepted: 05/31/2023] [Indexed: 06/17/2023]
Abstract
Protein-coding DNA sequences can be translated into completely different amino acid sequences if the nucleotide triplets used are shifted by a non-triplet amount on the same DNA strand or by translating codons from the opposite strand. Such "alternative reading frames" of protein-coding genes are a major contributor to the evolution of novel protein products. Recent studies demonstrating this include examples across the three domains of cellular life and in viruses. These sequences increase the number of trials potentially available for the evolutionary invention of new genes and also have unusual properties which may facilitate gene origin. There is evidence that the structure of the standard genetic code contributes to the features and gene-likeness of some alternative frame sequences. These findings have important implications across diverse areas of molecular biology, including for genome annotation, structural biology, and evolutionary genomics.
|
7
|
N’Guessan A, Kailasam S, Mostefai F, Poujol R, Grenier JC, Ismailova N, Contini P, De Palma R, Haber C, Stadler V, Bourque G, Hussin JG, Shapiro BJ, Fritz JH, Piccirillo CA. Selection for immune evasion in SARS-CoV-2 revealed by high-resolution epitope mapping and sequence analysis. iScience 2023; 26:107394. [PMID: 37599818 PMCID: PMC10433132 DOI: 10.1016/j.isci.2023.107394] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/05/2022] [Revised: 02/10/2023] [Accepted: 07/10/2023] [Indexed: 08/22/2023] Open
Abstract
Here, we exploit a deep serological profiling strategy coupled with an integrated, computational framework for the analysis of SARS-CoV-2 humoral immune responses. Applying a high-density peptide array (HDPA) spanning the entire proteomes of SARS-CoV-2 and endemic human coronaviruses allowed us to identify B cell epitopes and relate them to their evolutionary and structural properties. We identify hotspots of pre-existing immunity and cross-reactive epitopes that contribute to increasing the overall humoral immune response to SARS-CoV-2. Using a public dataset of over 38,000 viral genomes from the early phase of the pandemic, capturing both inter- and within-host genetic viral diversity, we determined the evolutionary profile of epitopes and the differences across proteins, waves, and SARS-CoV-2 variants. Lastly, we show that mutations in spike and nucleocapsid epitopes are under stronger selection between than within patients, suggesting that most of the selective pressure for immune evasion occurs upon transmission between hosts.
Affiliation(s)
- Arnaud N’Guessan
- Department of Microbiology and Immunology, McGill University, Montréal, QC, Canada
- McGill Genome Centre, McGill University, Montréal, QC, Canada
| | - Senthilkumar Kailasam
- Canadian Center for Computational Genomics, Montréal, QC, Canada
- Department of Human Genetics, McGill University, Montréal, QC, Canada
- Dahdaleh Institute of Genomic Medicine (DIgM), McGill University, Montréal, QC, Canada
| | - Fatima Mostefai
- Research Centre, Montreal Heart Institute, Montreal, QC, Canada
- Département de Biochimie et Médecine Moléculaire, Université de Montréal, Montréal, QC, Canada
| | - Raphaël Poujol
- Research Centre, Montreal Heart Institute, Montreal, QC, Canada
| | | | - Nailya Ismailova
- Department of Microbiology and Immunology, McGill University, Montréal, QC, Canada
- McGill University Research Center on Complex Traits (MRCCT), McGill University, Montréal, QC, Canada
- Dahdaleh Institute of Genomic Medicine (DIgM), McGill University, Montréal, QC, Canada
| | - Paola Contini
- Department of Internal Medicine, University of Genoa and IRCCS IST-Ospedale San Martino, Genoa, Italy
| | - Raffaele De Palma
- Department of Internal Medicine, University of Genoa and IRCCS IST-Ospedale San Martino, Genoa, Italy
| | | | | | - Guillaume Bourque
- Canadian Center for Computational Genomics, Montréal, QC, Canada
- Department of Human Genetics, McGill University, Montréal, QC, Canada
- Dahdaleh Institute of Genomic Medicine (DIgM), McGill University, Montréal, QC, Canada
| | - Julie G. Hussin
- Research Centre, Montreal Heart Institute, Montreal, QC, Canada
- Département de Médecine, Université de Montréal, Montréal, QC, Canada
| | - B. Jesse Shapiro
- Department of Microbiology and Immunology, McGill University, Montréal, QC, Canada
- McGill Genome Centre, McGill University, Montréal, QC, Canada
- Dahdaleh Institute of Genomic Medicine (DIgM), McGill University, Montréal, QC, Canada
| | - Jörg H. Fritz
- Department of Microbiology and Immunology, McGill University, Montréal, QC, Canada
- McGill University Research Center on Complex Traits (MRCCT), McGill University, Montréal, QC, Canada
- Dahdaleh Institute of Genomic Medicine (DIgM), McGill University, Montréal, QC, Canada
| | - Ciriaco A. Piccirillo
- Department of Microbiology and Immunology, McGill University, Montréal, QC, Canada
- McGill University Research Center on Complex Traits (MRCCT), McGill University, Montréal, QC, Canada
- Infectious Diseases and Immunity in Global Health Program of the Research Institute of McGill Health Center, Montréal, QC, Canada
- Dahdaleh Institute of Genomic Medicine (DIgM), McGill University, Montréal, QC, Canada
|
8
|
Omachi Y, Saito N, Furusawa C. Rare-event sampling analysis uncovers the fitness landscape of the genetic code. PLoS Comput Biol 2023; 19:e1011034. [PMID: 37068098 PMCID: PMC10138212 DOI: 10.1371/journal.pcbi.1011034] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/16/2022] [Revised: 04/27/2023] [Accepted: 03/16/2023] [Indexed: 04/18/2023] Open
Abstract
The genetic code refers to a rule that maps 64 codons to 20 amino acids. Nearly all organisms, with few exceptions, share the same genetic code, the standard genetic code (SGC). While it remains unclear why this universal code has arisen and been maintained during evolution, it may have been preserved under selection pressure. Theoretical studies comparing the SGC and numerically created hypothetical random genetic codes have suggested that the SGC has been subject to strong selection pressure for being robust against translation errors. However, these prior studies have searched for random genetic codes in only a small subspace of the possible code space due to limitations in computation time. Thus, how the genetic code has evolved, and the characteristics of the genetic code fitness landscape, remain unclear. By applying multicanonical Monte Carlo, an efficient rare-event sampling method, we efficiently sampled random codes from a much broader random ensemble of genetic codes than in previous studies, estimating that only one out of every 10^20 random codes is more robust than the SGC. This estimate is significantly smaller than the previous estimate, one in a million. We also characterized the fitness landscape of the genetic code that has four major fitness peaks, one of which includes the SGC. Furthermore, genetic algorithm analysis revealed that evolution under such a multi-peaked fitness landscape could be strongly biased toward a narrow peak, in an evolutionary path-dependent manner.
Affiliation(s)
- Yuji Omachi
- Graduate School of Sciences, The University of Tokyo, Hongo, Tokyo, Japan
| | - Nen Saito
- Graduate School of Integrated Sciences for Life, Hiroshima University, Higashi-Hiroshima City, Hiroshima, Japan
- Exploratory Research Center on Life and Living Systems, National Institutes of Natural Sciences, Okazaki, Aichi, Japan
- Universal Biology Institute, The University of Tokyo, Hongo, Tokyo, Japan
| | - Chikara Furusawa
- Graduate School of Sciences, The University of Tokyo, Hongo, Tokyo, Japan
- Universal Biology Institute, The University of Tokyo, Hongo, Tokyo, Japan
- Center for Biosystems Dynamics Research, RIKEN, Suita, Osaka, Japan
|
9
|
Property based analysis: Optimality of RNY comma-free code versus circular code (X) after frameshift errors. GENE REPORTS 2022. [DOI: 10.1016/j.genrep.2022.101652] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
|
10
|
Savino S, Desmet T, Franceus J. Insertions and deletions in protein evolution and engineering. Biotechnol Adv 2022; 60:108010. [PMID: 35738511 DOI: 10.1016/j.biotechadv.2022.108010] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 05/05/2022] [Revised: 06/15/2022] [Accepted: 06/16/2022] [Indexed: 11/17/2022]
Abstract
Protein evolution or engineering studies are traditionally focused on amino acid substitutions and the way these contribute to fitness. Meanwhile, the insertion and deletion of amino acids is often overlooked, despite being one of the most common sources of genetic variation. Recent methodological advances and successful engineering stories have demonstrated that the time is ripe for greater emphasis on these mutations and their understudied effects. This review highlights the evolutionary importance and biotechnological relevance of insertions and deletions (indels). We provide a comprehensive overview of approaches that can be employed to include indels in random, (semi)-rational or computational protein engineering pipelines. Furthermore, we discuss the tolerance to indels at the structural level, address how domain indels can link the function of unrelated proteins, and feature studies that illustrate the surprising and intriguing potential of frameshift mutations.
Affiliation(s)
- Simone Savino
- Centre for Synthetic Biology (CSB), Department of Biotechnology, Ghent University, Coupure Links 653, 9000 Ghent, Belgium
| | - Tom Desmet
- Centre for Synthetic Biology (CSB), Department of Biotechnology, Ghent University, Coupure Links 653, 9000 Ghent, Belgium
| | - Jorick Franceus
- Centre for Synthetic Biology (CSB), Department of Biotechnology, Ghent University, Coupure Links 653, 9000 Ghent, Belgium.
|
11
|
Kosicki M, Allen F, Steward F, Tomberg K, Pan Y, Bradley A. Cas9-induced large deletions and small indels are controlled in a convergent fashion. Nat Commun 2022; 13:3422. [PMID: 35701408 PMCID: PMC9197861 DOI: 10.1038/s41467-022-30480-8] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 08/05/2020] [Accepted: 04/28/2022] [Indexed: 11/09/2022] Open
Abstract
Repair of Cas9-induced double-stranded breaks results primarily in formation of small insertions and deletions (indels), but can also cause potentially harmful large deletions. While mechanisms leading to the creation of small indels are relatively well understood, very little is known about the origins of large deletions. Using a library of clonal NGS-validated mouse embryonic stem cells deficient for 32 DNA repair genes, we have shown that large deletion frequency increases in cells impaired for non-homologous end joining and decreases in cells deficient for the central resection gene Nbn and the microhomology-mediated end joining gene Polq. Across deficient clones, increase in large deletion frequency was closely correlated with the increase in the extent of microhomology and the size of small indels, implying a continuity of repair processes across different genomic scales. Furthermore, by targeting diverse genomic sites, we identified examples of repair processes that were highly locus-specific, discovering a role for exonuclease Trex1. Finally, we present evidence that indel sizes increase with the overall efficiency of Cas9 mutagenesis. These findings may have impact on both basic research and clinical use of CRISPR-Cas9, in particular in conjunction with repair pathway modulation. The CRISPR/Cas9 system has revolutionized science and therapy, but the DNA damage it causes often goes beyond the desired 'precision editing'. Here, the authors identify general and target-specific DNA repair pathways responsible for unwanted mutagenesis.
Affiliation(s)
| | | | - Frances Steward
- The Cambridge Institute of Therapeutic Immunology and Infectious Disease (CITIID), Department of Medicine, University of Cambridge, Cambridge, UK
| | - Kärt Tomberg
- The Cambridge Institute of Therapeutic Immunology and Infectious Disease (CITIID), Department of Medicine, University of Cambridge, Cambridge, UK
| | - Yangyang Pan
- The Cambridge Institute of Therapeutic Immunology and Infectious Disease (CITIID), Department of Medicine, University of Cambridge, Cambridge, UK
| | - Allan Bradley
- The Cambridge Institute of Therapeutic Immunology and Infectious Disease (CITIID), Department of Medicine, University of Cambridge, Cambridge, UK.
|
12
|
Wang X, Dong Q, Chen G, Zhang J, Liu Y, Cai Y. Frameshift and wild-type proteins are often highly similar because the genetic code and genomes were optimized for frameshift tolerance. BMC Genomics 2022; 23:416. [PMID: 35655139 PMCID: PMC9164415 DOI: 10.1186/s12864-022-08435-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2021] [Accepted: 03/02/2022] [Indexed: 11/10/2022] Open
Abstract
Frameshift mutations have been considered of significant importance for the molecular evolution of proteins and their coding genes, while frameshift protein sequences encoded in the alternative reading frames of coding genes have been considered to be meaningless. However, functional frameshifts have been found to be widespread. It was puzzling how a frameshift protein kept its structure and functionality while substantial changes occurred in its primary amino-acid sequence. This study shows that the similarities among frameshifts and wild types are higher than random similarities and are determined at different levels. Frameshift substitutions are more conservative than random substitutions in the standard genetic code (SGC). The frameshift substitution score of the SGC ranks in the top 2.0-3.5% of alternative genetic codes, showing that the SGC is nearly optimal for frameshift tolerance. In many genes and certain genomes, frameshift-resistant codons and codon pairs appear more frequently than expected, suggesting that frameshift tolerance is achieved through not only the optimality of the genetic code but, more importantly, the further optimization of a specific gene or genome through the usage of codons and codon pairs, which sheds light on the role of frameshift mutations in molecular and genomic evolution.
|
13
|
Kreitmeier M, Ardern Z, Abele M, Ludwig C, Scherer S, Neuhaus K. Spotlight on alternative frame coding: Two long overlapping genes in Pseudomonas aeruginosa are translated and under purifying selection. iScience 2022; 25:103844. [PMID: 35198897 PMCID: PMC8850804 DOI: 10.1016/j.isci.2022.103844] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/10/2021] [Revised: 10/14/2021] [Accepted: 01/27/2022] [Indexed: 12/13/2022] Open
Abstract
The existence of overlapping genes (OLGs) with significant coding overlaps revolutionizes our understanding of genomic complexity. We report two exceptionally long (957 nt and 1536 nt), evolutionarily novel, translated antisense open reading frames (ORFs) embedded within annotated genes in the pathogenic Gram-negative bacterium Pseudomonas aeruginosa. Both OLG pairs show sequence features consistent with being genes and transcriptional signals in RNA sequencing. Translation of both OLGs was confirmed by ribosome profiling and mass spectrometry. Quantitative proteomics of samples taken during different phases of growth revealed regulation of protein abundances, implying biological functionality. Both OLGs are taxonomically restricted, and likely arose by overprinting within the genus. Evidence for purifying selection further supports functionality. The OLGs reported here, designated olg1 and olg2, are the longest yet proposed in prokaryotes and are among the best attested in terms of translation and evolutionary constraint. These results highlight a potentially large unexplored dimension of prokaryotic genomes.
Highlights
- Two novel, very long, overlapping genes were found in Pseudomonas aeruginosa
- Both overlapping genes, olg1 and olg2, are transcribed, translated, and regulated
- Mass spectrometry verifies translation of the overlapping and their mother genes
- Both overlapping genes are taxonomically restricted, but under purifying selection
Affiliation(s)
- Michaela Kreitmeier
  - Chair for Microbial Ecology, TUM School of Life Sciences, Technische Universität München, Weihenstephaner Berg 3, 85354 Freising, Germany
- Zachary Ardern
  - Chair for Microbial Ecology, TUM School of Life Sciences, Technische Universität München, Weihenstephaner Berg 3, 85354 Freising, Germany
  - Wellcome Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Miriam Abele
  - Bavarian Center for Biomolecular Mass Spectrometry (BayBioMS), TUM School of Life Sciences, Technische Universität München, Gregor-Mendel-Strasse 4, 85354 Freising, Germany
- Christina Ludwig
  - Bavarian Center for Biomolecular Mass Spectrometry (BayBioMS), TUM School of Life Sciences, Technische Universität München, Gregor-Mendel-Strasse 4, 85354 Freising, Germany
- Siegfried Scherer
  - Chair for Microbial Ecology, TUM School of Life Sciences, Technische Universität München, Weihenstephaner Berg 3, 85354 Freising, Germany
- Klaus Neuhaus
  - Core Facility Microbiome, ZIEL - Institute for Food & Health, Technische Universität München, Weihenstephaner Berg 3, 85354 Freising, Germany
14
Biba D, Klink G, Bazykin G. Pairs of mutually compensatory frameshifting mutations contribute to protein evolution. Mol Biol Evol 2022; 39:6524633. [PMID: 35137193 PMCID: PMC8935012 DOI: 10.1093/molbev/msac031] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/22/2022]
Abstract
Insertions and deletions of lengths not divisible by 3 in protein-coding sequences cause frameshifts that usually induce premature stop codons and may carry a high fitness cost. However, this cost can be partially offset by a second compensatory indel restoring the reading frame. The role of such pairs of compensatory frameshifting mutations (pCFMs) in evolution has not been studied systematically. Here, we use whole-genome alignments of protein-coding genes of 100 vertebrate species, and of 122 insect species, studying the prevalence of pCFMs in their divergence. We detect a total of 624 candidate pCFM genes; six of them pass stringent quality filtering, including three human genes: RAB36, ARHGAP6, and NCR3LG1. In some instances, amino acid substitutions closely predating or following pCFMs restored the biochemical similarity of the frameshifted segment to the ancestral amino acid sequence, possibly reducing or negating the fitness cost of the pCFM. Typically, however, the biochemical similarity of the frameshifted sequence to the ancestral one was not higher than the similarity of a random sequence of a protein-coding gene to its frameshifted version, indicating that pCFMs can uncover radically novel regions of protein space. In total, pCFMs represent an appreciable and previously overlooked source of novel variation in amino acid sequences.
Affiliation(s)
- Dmitry Biba
  - Center of Life Sciences, Skolkovo Institute of Science and Technology, Moscow, 121205, Russia - Moscow, Oblast
- Galya Klink
  - Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevitch Institute), Moscow, 127051, Russia
- Georgii Bazykin
  - Center of Life Sciences, Skolkovo Institute of Science and Technology, Moscow, 121205, Russia - Moscow, Oblast
  - Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevitch Institute), Moscow, 127051, Russia
15
Papadopoulos C, Chevrollier N, Lopes A. Exploring the Peptide Potential of Genomes. Methods Mol Biol 2022; 2405:63-82. [PMID: 35298808 DOI: 10.1007/978-1-0716-1855-4_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Recent studies attribute a central role to the noncoding genome in the emergence of novel genes. The widespread transcription of noncoding regions and the pervasive translation of the resulting RNAs offer organisms a vast reservoir of novel peptides. Although the majority of these peptides are expected to be deleterious or neutral, and therefore to be degraded right away or to be short-lived in evolutionary history, some of them can confer an advantage to the organism. The latter can be further subjected to natural selection and be established as novel genes. In any case, characterizing the structural properties of these pervasively translated peptides is crucial for understanding (1) their impact on the cell and (2) how some of these peptides, derived from presumed noncoding regions, can give rise to structured and functional de novo proteins. Therefore, we present a protocol that aims to explore the potential of a genome to produce novel peptides. It consists of annotating all the open reading frames (ORFs) of a genome (i.e., coding and noncoding ones) and characterizing the fold potential and other structural properties of their corresponding potential peptides. Here, we apply our protocol to a small genome and show how to apply it to very large genomes. Finally, we present a case study that probes the fold potential of a set of 721 translated ORFs in mouse lncRNAs, identified with ribosome profiling experiments. Interestingly, we show that the distribution of their fold potential differs from that of the nontranslated lncRNAs and, more generally, from that of the other noncoding ORFs of the mouse.
Affiliation(s)
- Chris Papadopoulos
  - Institute for Integrative Biology of the Cell (I2BC), Université Paris-Saclay, Gif-sur-Yvette, cedex, France
- Nicolas Chevrollier
  - Institute for Integrative Biology of the Cell (I2BC), Université Paris-Saclay, Gif-sur-Yvette, cedex, France
- Anne Lopes
  - Institute for Integrative Biology of the Cell (I2BC), Université Paris-Saclay, Gif-sur-Yvette, cedex, France
16
Abstract
Modern genome-scale methods that identify new genes, such as proteogenomics and ribosome profiling, have revealed, to the surprise of many, that overlap in genes, open reading frames and even coding sequences is widespread and functionally integrated into prokaryotic, eukaryotic and viral genomes. In parallel, the constraints that overlapping regions place on genome sequences and their evolution can be harnessed in bioengineering to build more robust synthetic strains and constructs. With a focus on overlapping protein-coding and RNA-coding genes, this Review examines their discovery, topology and biogenesis in the context of their genome biology. We highlight exciting new uses for sequence overlap to control translation, compress synthetic genetic constructs, and protect against mutation.
17
Hagemeijer YP, Guryev V, Horvatovich P. Accurate Prediction of Protein Sequences for Proteogenomics Data Integration. Methods Mol Biol 2021; 2420:233-260. [PMID: 34905178 DOI: 10.1007/978-1-0716-1936-0_18] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
This book chapter discusses proteogenomics data integration and provides an overview of the different omics layers involved in defining the proteome of a living organism. It discusses various aspects of genome variability that affect either the sequence or the abundance of proteins, such as the effect of single-nucleotide variants or larger genomic structural variants on the proteome. Next, various sequencing technologies, such as short- and long-read sequencing, are introduced and discussed from a proteogenomics data integration perspective, with their respective advantages and shortcomings for accurate protein variant prediction from genomic/transcriptomic sequencing data. Finally, the various bioinformatics tools used to process and analyze DNA/RNA sequencing data are discussed, with the ultimate goal of obtaining accurately predicted sample-specific protein sequences that can be used as a drop-in replacement in existing approaches for peptide and protein identification using popular database search engines such as MSFragger and SearchGUI/PeptideShaker.
Affiliation(s)
- Yanick Paco Hagemeijer
  - Department of Analytical Biochemistry, University of Groningen, Groningen Research Institute of Pharmacy, Groningen, The Netherlands
  - European Research Institute for the Biology of Ageing, University Medical Center Groningen, Groningen, The Netherlands
- Victor Guryev
  - European Research Institute for the Biology of Ageing, University Medical Center Groningen, Groningen, The Netherlands
- Peter Horvatovich
  - Department of Analytical Biochemistry, University of Groningen, Groningen Research Institute of Pharmacy, Groningen, The Netherlands
18
Wichmann S, Scherer S, Ardern Z. Biological factors in the synthetic construction of overlapping genes. BMC Genomics 2021; 22:888. [PMID: 34895142 PMCID: PMC8665328 DOI: 10.1186/s12864-021-08181-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2020] [Accepted: 11/17/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Overlapping genes (OLGs) with long protein-coding overlapping sequences are disallowed by standard genome annotation programs, outside of viruses. Recently, however, they have been discovered in Archaea, diverse Bacteria, and Mammals. The biological factors underlying life's ability to create overlapping genes require more study, and may have important applications in understanding evolution and in biotechnology. A previous study claimed that protein domains from viruses were much better suited to forming overlaps than those from cellular organisms. In this study we assessed this claim, in order to discover what might underlie taxonomic differences in the creation of gene overlaps.
RESULTS: After overlapping arbitrary Pfam domain pairs and evaluating them with Hidden Markov Models, we find OLG construction to be much less constrained than expected. For instance, close to 10% of the constructed sequences cannot be distinguished from typical sequences in their protein family. Most are also indistinguishable from natural protein sequences regarding identity and secondary structure. Surprisingly, contrary to a previous study, virus domains were much less suitable for designing OLGs than bacterial or eukaryotic domains were. In general, the amount of amino acid change required to force a domain to overlap is approximately equal to the variation observed within a typical domain family. The resulting high similarity between natural sequences and those altered so as to overlap is mostly due to the combination of high redundancy in the genetic code and the evolutionary exchangeability of many amino acids.
CONCLUSIONS: Synthetic overlapping genes which closely resemble natural gene sequences, as measured by HMM profiles, are remarkably easy to construct, and most arbitrary domain pairs can be altered so as to overlap while retaining high similarity to the original sequences. Future work, however, will need to assess important factors not considered here, such as intragenic interactions which affect protein folding. While the analysis here is not sufficient to guarantee functional folding proteins, further analysis of constructed OLGs will improve our understanding of the origin of these remarkable genetic elements across life and open up exciting possibilities for synthetic biology.
Affiliation(s)
- Stefan Wichmann
  - Chair of Microbial Ecology, Department of Molecular Life Sciences, Technical University of Munich, Freising, Germany
- Siegfried Scherer
  - Chair of Microbial Ecology, Department of Molecular Life Sciences, Technical University of Munich, Freising, Germany
- Zachary Ardern
  - Chair of Microbial Ecology, Department of Molecular Life Sciences, Technical University of Munich, Freising, Germany
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
19
Papadopoulos C, Callebaut I, Gelly JC, Hatin I, Namy O, Renard M, Lespinet O, Lopes A. Intergenic ORFs as elementary structural modules of de novo gene birth and protein evolution. Genome Res 2021; 31:2303-2315. [PMID: 34810219 PMCID: PMC8647833 DOI: 10.1101/gr.275638.121] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 04/13/2021] [Accepted: 09/23/2021] [Indexed: 01/08/2023]
Abstract
The noncoding genome plays an important role in de novo gene birth and in the emergence of genetic novelty. Nevertheless, how noncoding sequences' properties could promote the birth of novel genes and shape the evolution and the structural diversity of proteins remains unclear. Therefore, by combining different bioinformatic approaches, we characterized the fold potential diversity of the amino acid sequences encoded by all intergenic open reading frames (ORFs) of S. cerevisiae with the aim of (1) exploring whether the structural states' diversity of proteomes is already present in noncoding sequences, and (2) estimating the potential of the noncoding genome to produce novel protein bricks that could either give rise to novel genes or be integrated into pre-existing proteins, thus participating in protein structure diversity and evolution. We showed that amino acid sequences encoded by most yeast intergenic ORFs contain the elementary building blocks of protein structures. Moreover, they encompass the large structural state diversity of canonical proteins, with the majority predicted as foldable. Then, we investigated the early stages of de novo gene birth by reconstructing the ancestral sequences of 70 yeast de novo genes and characterized the sequence and structural properties of intergenic ORFs with a strong translation signal. This enabled us to highlight sequence and structural factors determining de novo gene emergence. Finally, we showed a strong correlation between the fold potential of de novo proteins and one of their ancestral amino acid sequences, reflecting the relationship between the noncoding genome and the protein structure universe.
Affiliation(s)
- Chris Papadopoulos
  - Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Isabelle Callebaut
  - Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, 75005 Paris, France
- Jean-Christophe Gelly
  - Université de Paris, Biologie Intégrée du Globule Rouge, UMR_S1134, BIGR, INSERM, F-75015 Paris, France
  - Laboratoire d'Excellence GR-Ex, 75015 Paris, France
  - Institut National de la Transfusion Sanguine, F-75015 Paris, France
- Isabelle Hatin
  - Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Olivier Namy
  - Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Maxime Renard
  - Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Olivier Lespinet
  - Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Anne Lopes
  - Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
20
Abstract
Selection for resource conservation can shape the coding sequences of organisms living in nutrient-limited environments. Recently, it was proposed that selection for resource conservation, specifically for nitrogen and carbon content, has also shaped the structure of the standard genetic code, such that the missense mutations the code allows tend to cause small increases in the number of nitrogen and carbon atoms in amino acids. Moreover, it was proposed that this optimization is not confounded by known optimizations of the standard genetic code, such as for polar requirement or hydropathy. We challenge these claims. We show the proposed optimization for nitrogen conservation is highly sensitive to choice of null model and the proposed optimization for carbon conservation is confounded by the known conservative nature of the standard genetic code with respect to the molecular volume of amino acids. There is therefore little evidence the standard genetic code is optimized for resource conservation. We discuss our findings in the context of null models of the standard genetic code.
Affiliation(s)
- Hana Rozhoňová
  - Institute of Integrative Biology, ETH Zürich, Zürich, Switzerland
  - Swiss Institute of Bioinformatics, Quartier UNIL-Sorge, Lausanne, Switzerland
- Joshua L Payne
  - Institute of Integrative Biology, ETH Zürich, Zürich, Switzerland
  - Swiss Institute of Bioinformatics, Quartier UNIL-Sorge, Lausanne, Switzerland
21
Abstract
The standard genetic code (SGC) has been extensively analyzed for the biological ramifications of its nonrandom structure. For instance, mismatch errors due to point mutation or mistranslation have an overall smaller effect on the amino acid polar requirement under the SGC than under random genetic codes (RGCs). A similar observation was recently made for frameshift errors, prompting the assertion that the SGC has been shaped by natural selection for frameshift-robustness, that is, conservation of certain amino acid properties upon a frameshift mutation or translational frameshift. However, frameshift-robustness confers no benefit because frameshifts usually create premature stop codons that cause nonsense-mediated mRNA decay or production of nonfunctional truncated proteins. We here propose that the frameshift-robustness of the SGC is a byproduct of its mismatch-robustness. Of 564 amino acid properties considered, the SGC exhibits mismatch-robustness in 93-133 properties and frameshift-robustness in 55 properties, and the latter set is largely a subset of the former. For each of the 564 real and 564 randomly constructed fake properties of amino acids, there is a positive correlation between mismatch-robustness and frameshift-robustness across one million RGCs; this correlation arises because most amino acid changes resulting from a frameshift are also achievable by a mismatch error. Importantly, the SGC does not show significantly higher frameshift-robustness in any of the 55 properties than RGCs of comparable mismatch-robustness. These findings support the view that the frameshift-robustness of the SGC need not originate through direct selection and can instead be a side effect of its mismatch-robustness.
Affiliation(s)
- Haiqing Xu
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, USA
- Jianzhi Zhang
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, USA
22
Štambuk N, Konjevoda P, Pavan J. Antisense Peptide Technology for Diagnostic Tests and Bioengineering Research. Int J Mol Sci 2021; 22:9106. [PMID: 34502016 PMCID: PMC8431130 DOI: 10.3390/ijms22179106] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/22/2021] [Revised: 08/10/2021] [Accepted: 08/13/2021] [Indexed: 01/01/2023]
Abstract
Antisense peptide technology (APT) is based on a useful heuristic algorithm for rational peptide design. It was deduced from empirical observations that peptides consisting of complementary (sense and antisense) amino acids interact with higher probability and affinity than the randomly selected ones. This phenomenon is closely related to the structure of the standard genetic code table, and at the same time, is unrelated to the direction of its codon sequence translation. The concept of complementary peptide interaction is discussed, and its possible applications to diagnostic tests and bioengineering research are summarized. Problems and difficulties that may arise using APT are discussed, and possible solutions are proposed. The methodology was tested on the example of SARS-CoV-2. It is shown that the CABS-dock server accurately predicts the binding of antisense peptides to the SARS-CoV-2 receptor binding domain without requiring predefinition of the binding site. It is concluded that the benefits of APT outweigh the costs of random peptide screening and could lead to considerable savings in time and resources, especially if combined with other computational and immunochemical methods.
Affiliation(s)
- Nikola Štambuk
  - Center for Nuclear Magnetic Resonance, Ruđer Bošković Institute, Bijenička cesta 54, HR-10000 Zagreb, Croatia
- Paško Konjevoda
  - Laboratory for Epigenomics, Division of Molecular Medicine, Ruđer Bošković Institute, Bijenička cesta 54, HR-10000 Zagreb, Croatia
- Josip Pavan
  - Department of Ophthalmology, University Hospital Dubrava, Avenija Gojka Šuška 6, HR-10000 Zagreb, Croatia
23
Structure and function of naturally evolved de novo proteins. Curr Opin Struct Biol 2021; 68:175-183. [PMID: 33567396 DOI: 10.1016/j.sbi.2020.11.010] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 10/01/2020] [Revised: 11/16/2020] [Accepted: 11/27/2020] [Indexed: 01/05/2023]
Abstract
Comparative evolutionary genomics has revealed that novel protein coding genes can emerge randomly from non-coding DNA. While most of the myriad of transcripts which continuously emerge vanish rapidly, some attain regulatory regions, become translated and survive. More surprisingly, sequence properties of de novo proteins are almost indistinguishable from randomly obtained sequences, yet de novo proteins may gain functions and integrate into eukaryotic cellular networks quite easily. We here discuss current knowledge on de novo proteins, their structures, functions and evolution. Since the existence of de novo proteins seems at odds with decade-long attempts to construct proteins with novel structures and functions from scratch, we suggest that a better understanding of de novo protein evolution may fuel new strategies for protein design.
24
Thompson JD, Ripp R, Mayer C, Poch O, Michel CJ. Potential role of the X circular code in the regulation of gene expression. Biosystems 2021; 203:104368. [PMID: 33567309 DOI: 10.1016/j.biosystems.2021.104368] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/09/2020] [Revised: 01/18/2021] [Accepted: 01/20/2021] [Indexed: 02/06/2023]
Abstract
The X circular code is a set of 20 trinucleotides (codons) that has been identified in the protein-coding genes of most organisms (bacteria, archaea, eukaryotes, plasmids, viruses). It has been shown previously that the X circular code has the important mathematical property of being an error-correcting code. Thus, motifs of the X circular code, i.e. a series of codons belonging to X and called X motifs, allow identification and maintenance of the reading frame in genes. X motifs are significantly enriched in protein-coding genes, but have also been identified in many transfer RNA (tRNA) genes and in important functional regions of the ribosomal RNA (rRNA), notably in the peptidyl transferase center and the decoding center. Here, we investigate the potential role of X motifs as functional elements of protein-coding genes. First, we identify the codons of the X circular code which are frequent or rare in each domain of life (archaea, bacteria, eukaryota) and show that, for the amino acids with the highest codon bias, the preferred codon is often an X codon. We also observe a correlation between the 20 X codons and the optimal codons/dicodons that have been shown to influence translation efficiency. Then, we examined recently published experimental results concerning gene expression levels in diverse organisms. The approach used is the analysis of X motifs according to their density ds(X), i.e. the number of X motifs per kilobase in a gene sequence s. Surprisingly, this simple parameter identifies several unexpected relations between the X circular code and gene expression. For example, the X motifs are significantly enriched in the minimal gene set belonging to the three domains of life, and in codon-optimized genes. Furthermore, the density of X motifs generally correlates with experimental measures of translation efficiency and mRNA stability. 
Taken together, these results lead us to propose that the X motifs may represent a genetic signal contributing to the maintenance of the correct reading frame and the optimization and regulation of gene expression.
Affiliation(s)
- Julie D Thompson
  - Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France
- Raymond Ripp
  - Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France
- Claudine Mayer
  - Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France
  - Unité de Microbiologie Structurale, Institut Pasteur, CNRS, 75724, Paris Cedex 15, France
  - Université Paris Diderot, Sorbonne Paris Cité, 75724, Paris Cedex 15, France
- Olivier Poch
  - Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France
- Christian J Michel
  - Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France
25
Abstract
What were the physico-chemical forces that drove the origins of life? We discuss four major prebiotic ‘discoveries’: persistent sampling of chemical reaction space; sequence-encodable foldable catalysts; assembly of functional pathways; and encapsulation and heritability. We describe how a ‘proteins-first’ world gives plausible mechanisms. We note the importance of hydrophobic and polar compositions of matter in these advances.
Affiliation(s)
- K A Dill
  - Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, NY, USA
  - Department of Chemistry, Stony Brook University, Stony Brook, NY, USA
  - Department Physics and Astronomy, Stony Brook University, Stony Brook, NY, USA
- L Agozzino
  - Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, NY, USA
26
Nesterov-Mueller A, Popov R, Seligmann H. Combinatorial Fusion Rules to Describe Codon Assignment in the Standard Genetic Code. Life (Basel) 2020; 11:life11010004. [PMID: 33374866 PMCID: PMC7824455 DOI: 10.3390/life11010004] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/01/2020] [Revised: 12/15/2020] [Accepted: 12/21/2020] [Indexed: 11/16/2022]
Abstract
We propose combinatorial fusion rules that describe the codon assignment in the standard genetic code simply and uniformly for all canonical amino acids. These rules become obvious if the origin of the standard genetic code is considered as the result of a fusion of four protocodes: two dominant AU and GC protocodes and two recessive AU and GC protocodes. The biochemical meaning of the fusion rules lies in retaining the complementarity between cognate codons of the small hydrophobic amino acids and large charged or polar amino acids within the protocodes. The proto-tRNAs were assembled in the form of two kissing hairpins with 9-base and 10-base loops in the case of dominant protocodes and two 9-base loops in the case of recessive protocodes. The fusion rules reveal the connection between the stop codons, the non-canonical amino acids pyrrolysine and selenocysteine, and deviations in the translation of mitochondria. Using fusion rules, we predicted the existence of additional amino acids that are essential for the development of the standard genetic code. The validity of the proposed partition of the genetic code into dominant and recessive protocodes is discussed with reference to state-of-the-art hypotheses. The formation of two aminoacyl-tRNA synthetase classes is compatible with the four-protocode partition.
Affiliation(s)
- Alexander Nesterov-Mueller
  - Institute of Microstructure Technology, Karlsruhe Institute of Technology (KIT), 76344 Eggenstein-Leopoldshafen, Germany
- Roman Popov
  - Institute of Microstructure Technology, Karlsruhe Institute of Technology (KIT), 76344 Eggenstein-Leopoldshafen, Germany
- Hervé Seligmann
  - Institute of Microstructure Technology, Karlsruhe Institute of Technology (KIT), 76344 Eggenstein-Leopoldshafen, Germany
  - The National Natural History Collections, The Hebrew University of Jerusalem, Jerusalem 91904, Israel
  - Laboratory AGEIS EA 7407, Team Tools for e-GnosisMedical & LabcomCNRS/UGA/OrangeLabs Telecoms4Health, Faculty of Medicine, Université Grenoble Alpes, F-38700 La Tronche, France
27
Demongeot J, Moreira A, Seligmann H. Negative CG dinucleotide bias: An explanation based on feedback loops between Arginine codon assignments and theoretical minimal RNA rings. Bioessays 2020; 43:e2000071. [PMID: 33319381 DOI: 10.1002/bies.202000071] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/02/2020] [Revised: 11/23/2020] [Accepted: 11/26/2020] [Indexed: 01/05/2023]
Abstract
Theoretical minimal RNA rings are candidate primordial genes evolved for non-redundant coding of the genetic code's 22 coding signals (one codon per biogenic amino acid, a start and a stop codon) over the shortest possible length: 29,520 RNA rings, each 22 nucleotides long, solve this min-max constraint. Numerous RNA ring properties are reminiscent of natural genes. Here we present analyses showing that all RNA rings lack the dinucleotide CG (a mutable, chemically unstable dinucleotide coding for Arginine), bearing a resemblance to known CG-depleted genomes. CG content in "incomplete" RNA rings (not coding for all coding signals, with only 3-12 nucleotides) gradually decreases towards CG absence in complete, 22-nucleotide-long RNA rings. Presumably, feedback loops during RNA ring growth in the course of evolution (when amino acid assignment fixed the genetic code) assigned Arg to codons lacking CG (AGR) to avoid CG. Hence, as a chemical property of base pairs, CG mutability restructured the genetic code, thereby establishing itself as genetically encoded biological information.
Affiliation(s)
- Jacques Demongeot
- Laboratory AGEIS EA 7407, Team Tools for e-Gnosis Medical & Labcom CNRS/UGA/OrangeLabs Telecom4Health, Faculty of Medicine, Université Grenoble Alpes, La Tronche, France
| | - Andrés Moreira
- Departamento de Informática, Universidad Técnica Federico Santa María, Santiago, Chile
| | - Hervé Seligmann
- Laboratory AGEIS EA 7407, Team Tools for e-Gnosis Medical & Labcom CNRS/UGA/OrangeLabs Telecom4Health, Faculty of Medicine, Université Grenoble Alpes, La Tronche, France
- The National Natural History Collections, The Hebrew University of Jerusalem, Jerusalem, Israel
- Institute of Microstructure Technology, Karlsruhe Institute of Technology (KIT), Eggenstein-Leopoldshafen, Germany
| |
|
28
|
Xu YC, Guo YL. Less Is More, Natural Loss-of-Function Mutation Is a Strategy for Adaptation. Plant Communications 2020; 1:100103. [PMID: 33367264 PMCID: PMC7743898 DOI: 10.1016/j.xplc.2020.100103] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 05/05/2020] [Revised: 07/08/2020] [Accepted: 08/12/2020] [Indexed: 05/12/2023]
Abstract
Gene gain and loss are crucial factors that shape the evolutionary success of diverse organisms. In the past two decades, more attention has been paid to the significance of gene gain through gene duplication or de novo genes. However, gene loss through natural loss-of-function (LoF) mutations, which is prevalent in the genomes of diverse organisms, has been largely ignored. With the development of sequencing techniques, many genomes have been sequenced across diverse species and can be used to study the evolutionary patterns of gene loss. In this review, we summarize recent advances in research on various aspects of LoF mutations, including their identification, evolutionary dynamics in natural populations, and functional effects. In particular, we discuss how LoF mutations can provide insights into the minimum gene set (or the essential gene set) of an organism. Furthermore, we emphasize their potential impact on adaptation. At the genome level, although most LoF mutations are neutral or deleterious, at least some of them are under positive selection and may contribute to biodiversity and adaptation. Overall, we highlight the importance of natural LoF mutations as a robust framework for understanding biological questions in general.
Affiliation(s)
- Yong-Chao Xu
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Ya-Long Guo
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| |
|
29
|
Nelson CW, Ardern Z, Goldberg TL, Meng C, Kuo CH, Ludwig C, Kolokotronis SO, Wei X. Dynamically evolving novel overlapping gene as a factor in the SARS-CoV-2 pandemic. eLife 2020; 9:e59633. [PMID: 33001029 PMCID: PMC7655111 DOI: 10.7554/elife.59633] [Citation(s) in RCA: 55] [Impact Index Per Article: 13.8] [Received: 06/03/2020] [Accepted: 09/30/2020] [Indexed: 12/11/2022]
Abstract
Understanding the emergence of novel viruses requires an accurate and comprehensive annotation of their genomes. Overlapping genes (OLGs) are common in viruses and have been associated with pandemics but are still widely overlooked. We identify and characterize ORF3d, a novel OLG in SARS-CoV-2 that is also present in Guangxi pangolin-CoVs but not other closely related pangolin-CoVs or bat-CoVs. We then document evidence of ORF3d translation, characterize its protein sequence, and conduct an evolutionary analysis at three levels: between taxa (21 members of Severe acute respiratory syndrome-related coronavirus), between human hosts (3978 SARS-CoV-2 consensus sequences), and within human hosts (401 deeply sequenced SARS-CoV-2 samples). ORF3d has been independently identified and shown to elicit a strong antibody response in COVID-19 patients. However, it has been misclassified as the unrelated gene ORF3b, leading to confusion. Our results liken ORF3d to other accessory genes in emerging viruses and highlight the importance of OLGs.
MESH Headings
- Amino Acid Sequence
- Animals
- Antibodies, Viral/immunology
- Antibody Specificity
- Antigens, Viral/biosynthesis
- Antigens, Viral/genetics
- Antigens, Viral/immunology
- Betacoronavirus/genetics
- Betacoronavirus/pathogenicity
- Betacoronavirus/physiology
- COVID-19
- China/epidemiology
- Chiroptera/virology
- Coronavirus/genetics
- Coronavirus Infections/epidemiology
- Coronavirus Infections/virology
- Epitopes/genetics
- Epitopes/immunology
- Europe/epidemiology
- Eutheria/virology
- Evolution, Molecular
- Gene Expression Regulation, Viral
- Genes, Overlapping
- Genes, Viral
- Genetic Variation
- Haplotypes/genetics
- Host Specificity/genetics
- Humans
- Models, Molecular
- Mutation
- Open Reading Frames/genetics
- Pandemics
- Phylogeny
- Pneumonia, Viral/epidemiology
- Pneumonia, Viral/virology
- Protein Biosynthesis
- Protein Conformation
- RNA, Viral/genetics
- SARS-CoV-2
- Sequence Alignment
- Sequence Homology, Nucleic Acid
- Viral Proteins/genetics
- Viral Proteins/immunology
Affiliation(s)
- Chase W Nelson
- Biodiversity Research Center, Academia Sinica, Taipei, Taiwan
- Institute for Comparative Genomics, American Museum of Natural History, New York, United States
| | - Zachary Ardern
- Chair for Microbial Ecology, Technical University of Munich, Freising, Germany
| | - Tony L Goldberg
- Department of Pathobiological Sciences, University of Wisconsin-Madison, Madison, United States
- Global Health Institute, University of Wisconsin-Madison, Madison, United States
| | - Chen Meng
- Bavarian Center for Biomolecular Mass Spectrometry (BayBioMS), Technical University of Munich, Freising, Germany
| | - Chen-Hao Kuo
- Biodiversity Research Center, Academia Sinica, Taipei, Taiwan
| | - Christina Ludwig
- Bavarian Center for Biomolecular Mass Spectrometry (BayBioMS), Technical University of Munich, Freising, Germany
| | - Sergios-Orestis Kolokotronis
- Institute for Comparative Genomics, American Museum of Natural History, New York, United States
- Department of Epidemiology and Biostatistics, School of Public Health, SUNY Downstate Health Sciences University, Brooklyn, United States
- Institute for Genomic Health, SUNY Downstate Health Sciences University, Brooklyn, United States
- Division of Infectious Diseases, Department of Medicine, SUNY Downstate Health Sciences University, Brooklyn, United States
| | - Xinzhu Wei
- Departments of Integrative Biology and Statistics, University of California, Berkeley, Berkeley, United States
- Departments of Computer Science, Human Genetics, and Computational Medicine, University of California, Los Angeles, Los Angeles, United States
| |
|