1
Pfennig A, Lomsadze A, Borodovsky M. MgCod: Gene Prediction in Phage Genomes with Multiple Genetic Codes. J Mol Biol 2023; 435:168159. [PMID: 37244571 DOI: 10.1016/j.jmb.2023.168159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2022] [Revised: 05/19/2023] [Accepted: 05/21/2023] [Indexed: 05/29/2023]
Abstract
Massive sequencing of microbiomes has led to the discovery of a large number of phage genomes with intermittent stop codon recoding. We have developed a computational tool, MgCod, that identifies genomic regions (blocks) with distinct stop codon recoding simultaneously with the prediction of protein-coding regions. When MgCod was used to scan a large volume of human metagenomic contigs, hundreds of viral contigs with intermittent stop codon recoding were revealed. Many of these contigs originated from genomes of known crAssphages. Further analyses showed that intermittent recoding was associated with subtle patterns in the organization of protein-coding genes, such as 'single-coding' and 'dual-coding'. The dual-coding genes, clustered into blocks, could be translated by two alternative codes, producing nearly identical proteins. The dual-coded blocks were enriched with early-stage phage genes, while the late-stage genes resided in the single-coded blocks. MgCod can identify types of stop codon recoding in novel genomic sequences in parallel with gene prediction. It is available for download from https://github.com/gatech-genemark/MgCod.
Affiliation(s)
- Aaron Pfennig
  - School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332, USA.
- Alexandre Lomsadze
  - Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA.
- Mark Borodovsky
  - Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA; School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA.

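The MgCod abstract describes 'dual-coding' genes that translate to nearly identical proteins under two alternative genetic codes. The following is a minimal sketch of that idea, not MgCod's method: it builds the standard code (NCBI translation table 1) by hand, derives an assumed recoded table in which TAG is read as glutamine, and tests for exact (rather than near) identity of the two translations. The helper names `recode`, `translate` and `is_dual_coding` are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def recode(table, codon, amino_acid):
    """Return a copy of a codon table with one codon reassigned."""
    new_table = dict(table)
    new_table[codon] = amino_acid
    return new_table


def translate(seq, table):
    """Translate in frame 0 and stop at the first stop codon ('*')."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = table[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def is_dual_coding(seq, table_a, table_b):
    """Simplification: require identical proteins under both codes."""
    return translate(seq, table_a) == translate(seq, table_b)


# Assumption for illustration: the alternative code reads TAG as glutamine (Q).
TAG_AS_GLN = recode(STANDARD, "TAG", "Q")

gene_without_tag = "ATGAAACAATGA"      # no in-frame TAG: same protein under both codes
gene_with_tag = "ATGAAATAGCAATGA"      # in-frame TAG: truncated under the standard code

print(is_dual_coding(gene_without_tag, STANDARD, TAG_AS_GLN))
print(is_dual_coding(gene_with_tag, STANDARD, TAG_AS_GLN))
```

A gene with no in-frame occurrence of the recoded stop codon is trivially dual-coding in this sketch; a real analysis would also need gene boundaries and a similarity threshold for "nearly identical".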
2
Lemire BD, Uppuluri P. Coding Sequence Insertions in Fungal Genomes are Intrinsically Disordered and can Impart Functionally-Important Properties on the Host Protein. bioRxiv 2023:2023.04.06.535715. [PMID: 37066283 PMCID: PMC10104129 DOI: 10.1101/2023.04.06.535715] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/18/2023]
Abstract
Insertion and deletion mutations (indels) are important mechanisms of generating protein diversity. Indels in coding sequences are under considerable selective pressure to maintain reading frames and to preserve protein function, but once generated, indels provide raw material for the acquisition of new protein properties and functions. We reported recently that coding sequence insertions in the Candida albicans NDU1 protein, a mitochondrial protein involved in the assembly of the NADH:ubiquinone oxidoreductase, are imperative for respiration, biofilm formation and pathogenesis. NDU1 inserts are specific to CTG-clade fungi, absent in the human ortholog and successfully harnessed as drug targets. Here, we present the first comprehensive report investigating indels and clade-defining insertions (CDIs) in fungal proteomes. We investigated 80 ascomycete proteomes encompassing CTG clade species, the Saccharomycetaceae family, the Aspergillaceae family and the Herpotrichiellaceae (black yeasts) family. We identified over 30,000 insertions, 4,000 CDIs and 2,500 clade-defining deletions (CDDs). Insert sizes range from 1 to over 1,000 residues in length, while maximum deletion length is 19 residues. Inserts are strikingly over-represented in protein kinases, and excluded from structural domains and transmembrane segments. Inserts are predicted to be highly disordered. The amino acid compositions of the inserts are highly depleted in hydrophobic residues and enriched in polar residues. An indel in the Saccharomyces cerevisiae Sth1 protein, the catalytic subunit of the RSC (Remodel the Structure of Chromatin) complex, is predicted to be disordered until it forms a β-strand upon interaction. This interaction performs a vital role in RSC-mediated transcriptional regulation, thereby expanding protein function.
Affiliation(s)
- Bernard D. Lemire
  - Department of Biochemistry, University of Alberta, Edmonton, Canada (retired)
- Priya Uppuluri
  - Institute for Infection and Immunity, Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, USA
  - David Geffen School of Medicine at UCLA, Los Angeles, California, USA

3
Shulgina Y, Eddy SR. Codetta: predicting the genetic code from nucleotide sequence. Bioinformatics 2023; 39:6895099. [PMID: 36511586 PMCID: PMC9825746 DOI: 10.1093/bioinformatics/btac802] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 08/01/2022] [Revised: 11/10/2022] [Indexed: 12/15/2022] Open
Abstract
Summary: Codetta is a Python program for predicting the genetic code table of an organism from nucleotide sequences. Codetta can analyze an arbitrary nucleotide sequence and needs no sequence annotation or taxonomic placement. The most likely amino acid decoding for each of the 64 codons is inferred from alignments of profile hidden Markov models of conserved proteins to the input sequence.
Availability and implementation: Codetta 2.0 is implemented as a Python 3 program for MacOS and Linux and is available from http://eddylab.org/software/codetta/codetta2.tar.gz and at http://github.com/kshulgina/codetta.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yekaterina Shulgina
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, USA

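The Codetta summary above says that the most likely amino acid for each codon is inferred from alignments of conserved-protein profiles to the input sequence. As a rough illustration of that inference step only, the sketch below replaces Codetta's profile HMM alignments and probabilistic model with a plain majority vote over already-aligned (codon, consensus amino acid) pairs. The function name `infer_decoding`, the `min_support` threshold and the toy observations are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def infer_decoding(aligned_pairs, min_support=3):
    """Majority-vote decoding per codon.

    aligned_pairs: iterable of (codon, consensus_amino_acid) tuples, assumed
    to come from some upstream alignment step that is not shown here.
    Codons whose top amino acid has fewer than min_support observations
    are reported as "?" (no call).
    """
    tallies = defaultdict(Counter)
    for codon, amino_acid in aligned_pairs:
        tallies[codon][amino_acid] += 1

    decoding = {}
    for codon, counts in tallies.items():
        amino_acid, support = counts.most_common(1)[0]
        decoding[codon] = amino_acid if support >= min_support else "?"
    return decoding


# Toy observations: AGG mostly aligns to methionine columns, CTG is under-sampled.
observations = [("AGG", "M")] * 4 + [("AGG", "R")] + [("CTG", "L")] * 2
print(infer_decoding(observations))
```

The "?" outcome mirrors the general need to withhold a call when evidence is thin; the threshold of three observations is arbitrary.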
4
Shulgina Y, Eddy SR. A computational screen for alternative genetic codes in over 250,000 genomes. eLife 2021; 10:71402. [PMID: 34751130 PMCID: PMC8629427 DOI: 10.7554/elife.71402] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 06/18/2021] [Accepted: 10/26/2021] [Indexed: 11/25/2022] Open
Abstract
The genetic code has been proposed to be a ‘frozen accident,’ but the discovery of alternative genetic codes over the past four decades has shown that it can evolve to some degree. Since most examples were found anecdotally, it is difficult to draw general conclusions about the evolutionary trajectories of codon reassignment and why some codons are affected more frequently. To fill in the diversity of genetic codes, we developed Codetta, a computational method to predict the amino acid decoding of each codon from nucleotide sequence data. We surveyed the genetic code usage of over 250,000 bacterial and archaeal genome sequences in GenBank and discovered five new reassignments of arginine codons (AGG, CGA, and CGG), representing the first sense codon changes in bacteria. In a clade of uncultivated Bacilli, the reassignment of AGG to become the dominant methionine codon likely evolved by a change in the amino acid charging of an arginine tRNA. The reassignments of CGA and/or CGG were found in genomes with low GC content, an evolutionary force that likely helped drive these codons to low frequency and enable their reassignment.

All life forms rely on a ‘code’ to translate their genetic information into proteins. This code relies on limited permutations of three nucleotides – the building blocks that form DNA and other types of genetic information. Each ‘triplet’ of nucleotides – or codon – encodes a specific amino acid, the basic component of proteins. Reading the sequence of codons in the right order will let the cell know which amino acid to assemble next on a growing protein. For instance, the codon CGG – formed of the nucleotides guanine (G) and cytosine (C) – codes for the amino acid arginine. From bacteria to humans, most life forms rely on the same genetic code. Yet certain organisms have evolved to use slightly different codes, where one or several codons have an altered meaning.
To better understand how alternative genetic codes have evolved, Shulgina and Eddy set out to find more organisms featuring these altered codons, creating a new software called Codetta that can analyze the genome of a microorganism and predict the genetic code it uses. Codetta was then used to sift through the genetic information of 250,000 microorganisms. This was made possible by the sequencing, in recent years, of the genomes of hundreds of thousands of bacteria and other microorganisms – including many never studied before. These analyses revealed five groups of bacteria with alternative genetic codes, all of which had changes in the codons that code for arginine. Amongst these, four had genomes with a low proportion of guanine and cytosine nucleotides. This may have made some guanine and cytosine-rich arginine codons very rare in these organisms and, therefore, easier to be reassigned to encode another amino acid. The work by Shulgina and Eddy demonstrates that Codetta is a new, useful tool that scientists can use to understand how genetic codes evolve. In addition, it can also help to ensure the accuracy of widely used protein databases, which assume which genetic code organisms use to predict protein sequences from their genomes.
Affiliation(s)
- Sean R Eddy
  - Molecular & Cellular Biology, Harvard University, Cambridge, United States

5
Geijer C, Faria-Oliveira F, Moreno AD, Stenberg S, Mazurkewich S, Olsson L. Genomic and transcriptomic analysis of Candida intermedia reveals the genetic determinants for its xylose-converting capacity. Biotechnology for Biofuels 2020; 13:48. [PMID: 32190113 PMCID: PMC7068945 DOI: 10.1186/s13068-020-1663-9] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 10/16/2019] [Accepted: 01/21/2020] [Indexed: 05/05/2023]
Abstract
Background: An economically viable production of biofuels and biochemicals from lignocellulose requires microorganisms that can readily convert both the cellulosic and hemicellulosic fractions into product. The yeast Candida intermedia displays a high capacity for uptake and conversion of several lignocellulosic sugars including the abundant pentose D-xylose, an underutilized carbon source since most industrially relevant microorganisms cannot naturally ferment it. Thus, C. intermedia constitutes an important source of knowledge and genetic information that could be transferred to industrial microorganisms such as Saccharomyces cerevisiae to improve their capacity to ferment lignocellulose-derived xylose.
Results: To understand the genetic determinants that underlie the metabolic properties of C. intermedia, we sequenced the genomes of both the in-house-isolated strain CBS 141442 and the reference strain PYCC 4715. De novo genome assembly and subsequent analysis revealed C. intermedia to be a haploid species belonging to the CTG clade of ascomycetous yeasts. The two strains have highly similar genome sizes and number of protein-encoding genes, but they differ on the chromosomal level due to numerous translocations of large and small genomic segments. The transcriptional profiles for CBS 141442 grown in medium with either high or low concentrations of glucose and xylose were determined through RNA-sequencing analysis, revealing distinct clusters of co-regulated genes in response to different specific growth rates, carbon sources and osmotic stress. Analysis of the genomic and transcriptomic data also identified multiple xylose reductases, one of which displayed dual NADH/NADPH co-factor specificity that likely plays an important role for co-factor recycling during xylose fermentation.
Conclusions: In the present study, we performed the first genomic and transcriptomic analysis of C. intermedia and identified several novel genes for conversion of xylose. Together the results provide insights into the mechanisms underlying saccharide utilization in C. intermedia and reveal potential target genes to aid in xylose fermentation in S. cerevisiae.
Affiliation(s)
- Cecilia Geijer
  - Division of Industrial Biotechnology, Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Fábio Faria-Oliveira
  - Division of Industrial Biotechnology, Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Antonio D. Moreno
  - Division of Industrial Biotechnology, Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
  - Present Address: Biofuels Unit, Department of Energy, CIEMAT, Madrid, Spain
- Simon Stenberg
  - Department of Chemistry and Molecular Biology, University of Gothenburg, Gothenburg, Sweden
- Scott Mazurkewich
  - Division of Industrial Biotechnology, Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Lisbeth Olsson
  - Division of Industrial Biotechnology, Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden

6
Trichez D, Steindorff AS, Soares CEVF, Formighieri EF, Almeida JRM. Physiological and comparative genomic analysis of new isolated yeasts Spathaspora sp. JA1 and Meyerozyma caribbica JA9 reveal insights into xylitol production. FEMS Yeast Res 2019; 19:5480466. [DOI: 10.1093/femsyr/foz034] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 08/17/2018] [Accepted: 04/25/2019] [Indexed: 12/30/2022] Open
Abstract
Xylitol is a five-carbon polyol of economic interest that can be produced by microbial xylose reduction from renewable resources. The current study sought to investigate the potential of two yeast strains, isolated from Brazilian Cerrado biome, in the production of xylitol as well as the genomic characteristics that may impact this process. Xylose conversion capacity by the new isolates Spathaspora sp. JA1 and Meyerozyma caribbica JA9 was evaluated and compared with control strains on xylose and sugarcane biomass hydrolysate. Among the evaluated strains, Spathaspora sp. JA1 was the strongest xylitol producer, reaching product yield and productivity as high as 0.74 g/g and 0.20 g/(L.h) on xylose, and 0.58 g/g and 0.44 g/(L.h) on non-detoxified hydrolysate. Genome sequences of Spathaspora sp. JA1 and M. caribbica JA9 were obtained and annotated. Comparative genomic analysis revealed that the predicted xylose metabolic pathway is conserved among the xylitol-producing yeasts Spathaspora sp. JA1, M. caribbica JA9 and Meyerozyma guilliermondii, but not in Spathaspora passalidarum, an efficient ethanol-producing yeast. Xylitol-producing yeasts showed strictly NADPH-dependent xylose reductase and NAD+-dependent xylitol-dehydrogenase activities. This imbalance of cofactors favors the high xylitol yield shown by Spathaspora sp. JA1, which is similar to the most efficient xylitol producers described so far.
Affiliation(s)
- Débora Trichez
  - Embrapa Agroenergia. Parque Estação Biológica, PqEB – W3 Norte Final, Postal code 70.770–901, Brasília-DF, Brazil
- Andrei S Steindorff
  - Embrapa Agroenergia. Parque Estação Biológica, PqEB – W3 Norte Final, Postal code 70.770–901, Brasília-DF, Brazil
- Carlos E V F Soares
  - Embrapa Agroenergia. Parque Estação Biológica, PqEB – W3 Norte Final, Postal code 70.770–901, Brasília-DF, Brazil
  - Graduate Program in Chemical and Biological Technologies, Institute of Chemistry, University of Brasília, Campus Darcy Ribeiro, Postal code 70.910-900, Brasília-DF, Brazil
- Eduardo F Formighieri
  - Embrapa Agroenergia. Parque Estação Biológica, PqEB – W3 Norte Final, Postal code 70.770–901, Brasília-DF, Brazil
- João R M Almeida
  - Embrapa Agroenergia. Parque Estação Biológica, PqEB – W3 Norte Final, Postal code 70.770–901, Brasília-DF, Brazil
  - Graduate Program in Chemical and Biological Technologies, Institute of Chemistry, University of Brasília, Campus Darcy Ribeiro, Postal code 70.910-900, Brasília-DF, Brazil

7
Noutahi E, Calderon V, Blanchette M, El-Mabrouk N, Lang BF. Rapid Genetic Code Evolution in Green Algal Mitochondrial Genomes. Mol Biol Evol 2019; 36:766-783. [PMID: 30698742 PMCID: PMC6551751 DOI: 10.1093/molbev/msz016] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Indexed: 01/03/2023] Open
Abstract
Genetic code deviations involving stop codons have been previously reported in mitochondrial genomes of several green plants (Viridiplantae), most notably chlorophyte algae (Chlorophyta). However, as changes in codon recognition from one amino acid to another are more difficult to infer, such changes might have gone unnoticed in particular lineages with high evolutionary rates that are otherwise prone to codon reassignments. To gain further insight into the evolution of the mitochondrial genetic code in green plants, we have conducted an in-depth study across mtDNAs from 51 green plants (32 chlorophytes and 19 streptophytes). Besides confirming known stop-to-sense reassignments, our study documents the first cases of sense-to-sense codon reassignments in Chlorophyta mtDNAs. In several Sphaeropleales, we report the decoding of AGG codons (normally arginine) as alanine, by tRNA(CCU) of various origins that carry the recognition signature for alanine tRNA synthetase. In Chromochloris, we identify tRNA variants decoding AGG as methionine and the synonymous codon CGG as leucine. Finally, we find strong evidence supporting the decoding of AUA codons (normally isoleucine) as methionine in Pycnococcus. Our results rely on a recently developed conceptual framework (CoreTracker) that predicts codon reassignments based on the disparity between DNA sequence (codons) and the derived protein sequence. These predictions are then validated by an evaluation of tRNA phylogeny, to identify the evolution of new tRNAs via gene duplication and loss, and structural modifications that lead to the assignment of new tRNA identities and a change in the genetic code.
Affiliation(s)
- Emmanuel Noutahi
  - Département d'Informatique et de Recherche opérationnelle (DIRO), Université de Montréal, CP 6128 succursale Centre-Ville, Montreal, QC, Canada
- Virginie Calderon
  - Institut de Recherches Cliniques de Montréal, Montreal, Quebec, Canada
- Mathieu Blanchette
  - School of Computer Science, McGill University, McConnell Engineering Bldg., Montréal, QC H3A 0E9, Canada
  - McGill Centre for Bioinformatics, McGill University, Montréal, QC, Canada
- Nadia El-Mabrouk
  - Département d'Informatique et de Recherche opérationnelle (DIRO), Université de Montréal, CP 6128 succursale Centre-Ville, Montreal, QC, Canada
- Bernd Franz Lang
  - Département de Biochimie, Centre Robert Cedergren, Université de Montréal, CP 6128 succursale Centre-Ville, Montreal, QC, Canada

8
Lee H, Han C, Lee HW, Park G, Jeon W, Ahn J, Lee H. Development of a promising microbial platform for the production of dicarboxylic acids from biorenewable resources. Biotechnology for Biofuels 2018; 11:310. [PMID: 30455739 PMCID: PMC6225622 DOI: 10.1186/s13068-018-1310-x] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 08/22/2018] [Accepted: 10/30/2018] [Indexed: 06/09/2023]
Abstract
Background: As a sustainable industrial process, the production of dicarboxylic acids (DCAs), used as precursors of polyamides, polyesters, perfumes, plasticizers, lubricants, and adhesives, from vegetable oil has continuously garnered interest. Although the yeast Candida tropicalis has been used as a host for DCA production, additional strains are continually investigated to meet productivity thresholds and industrial needs. In this regard, the yeast Wickerhamiella sorbophila, a potential candidate strain, has been screened. However, the lack of genetic and physiological information for this uncommon strain is an obstacle that merits further research. To overcome this limitation, we attempted to develop a method to facilitate genetic recombination in this strain and produce high amounts of DCAs from methyl laurate using engineered W. sorbophila.
Results: In the current study, we first developed efficient genetic engineering tools for the industrial application of W. sorbophila. To increase homologous recombination (HR) efficiency during transformation, the cell cycle of the yeast was synchronized to the S/G2 phase using hydroxyurea. The HR efficiency at POX1 and POX2 loci increased from 56.3% and 41.7%, respectively, to 97.9% in both cases. The original HR efficiency at URA3 and ADE2 loci was nearly 0% during the early stationary and logarithmic phases of growth, and increased to 4.8% and 25.6%, respectively. We used the developed tools to construct W. sorbophila UHP4, in which β-oxidation was completely blocked. The strain produced 92.5 g/l of dodecanedioic acid (DDDA) from methyl laurate over 126 h in 5-l fed-batch fermentation, with a productivity of 0.83 g/l/h.
Conclusions: Wickerhamiella sorbophila UHP4 produced more DDDA from methyl laurate than C. tropicalis. Hence, we demonstrated that W. sorbophila is a powerful microbial platform for vegetable oil-based DCA production. In addition, by using the developed genetic engineering tools, this emerging yeast could be used for the production of a variety of fatty acid derivatives, such as fatty alcohols, fatty aldehydes, and ω-hydroxy fatty acids.
Affiliation(s)
- Heeseok Lee
  - Biotechnology Process Engineering Center, Korean Research Institute of Bioscience and Biotechnology (KRIBB), 30 Yeongudanji-ro, Cheongwon-gu, Cheongju-si, Chungcheongbuk-do 28116 Republic of Korea
  - Department of Bioprocess Engineering, KRIBB School of Biotechnology, Korea University of Science and Technology (UST), 217 Gajeong-ro, Yuseong-gu, Daejeon, 34113 Republic of Korea
- Changpyo Han
  - Biotechnology Process Engineering Center, Korean Research Institute of Bioscience and Biotechnology (KRIBB), 30 Yeongudanji-ro, Cheongwon-gu, Cheongju-si, Chungcheongbuk-do 28116 Republic of Korea
- Hyeok-Won Lee
  - Biotechnology Process Engineering Center, Korean Research Institute of Bioscience and Biotechnology (KRIBB), 30 Yeongudanji-ro, Cheongwon-gu, Cheongju-si, Chungcheongbuk-do 28116 Republic of Korea
- Gyuyeon Park
  - Biotechnology Process Engineering Center, Korean Research Institute of Bioscience and Biotechnology (KRIBB), 30 Yeongudanji-ro, Cheongwon-gu, Cheongju-si, Chungcheongbuk-do 28116 Republic of Korea
  - Department of Bioprocess Engineering, KRIBB School of Biotechnology, Korea University of Science and Technology (UST), 217 Gajeong-ro, Yuseong-gu, Daejeon, 34113 Republic of Korea
- Wooyoung Jeon
  - Biotechnology Process Engineering Center, Korean Research Institute of Bioscience and Biotechnology (KRIBB), 30 Yeongudanji-ro, Cheongwon-gu, Cheongju-si, Chungcheongbuk-do 28116 Republic of Korea
- Jungoh Ahn
  - Biotechnology Process Engineering Center, Korean Research Institute of Bioscience and Biotechnology (KRIBB), 30 Yeongudanji-ro, Cheongwon-gu, Cheongju-si, Chungcheongbuk-do 28116 Republic of Korea
- Hongweon Lee
  - Biotechnology Process Engineering Center, Korean Research Institute of Bioscience and Biotechnology (KRIBB), 30 Yeongudanji-ro, Cheongwon-gu, Cheongju-si, Chungcheongbuk-do 28116 Republic of Korea

9
Noutahi E, Calderon V, Blanchette M, Lang FB, El-Mabrouk N. CoreTracker: accurate codon reassignment prediction, applied to mitochondrial genomes. Bioinformatics 2018; 33:3331-3339. [PMID: 28655158 DOI: 10.1093/bioinformatics/btx421] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 03/31/2017] [Accepted: 06/23/2017] [Indexed: 11/13/2022] Open
Abstract
Motivation: Codon reassignments have been reported across all domains of life. With the increasing number of sequenced genomes, the development of systematic approaches for genetic code detection is essential for accurate downstream analyses. Three automated prediction tools exist so far: FACIL, GenDecoder and Bagheera; the last two are restricted to metazoan mitochondrial genomes and to CUG reassignments in yeast nuclear genomes, respectively. These tools can only analyze a single genome at a time and are often not followed by a validation procedure, resulting in a high rate of false positives.
Results: We present CoreTracker, a new algorithm for the inference of sense-to-sense codon reassignments. CoreTracker identifies potential codon reassignments in a set of related genomes, then uses statistical evaluations and a random forest classifier to predict those that are the most likely to be correct. Predicted reassignments are then validated through a phylogeny-aware step that evaluates the impact of the new genetic code on the protein alignment. Handling a set of genomes simultaneously in a phylogenetic framework allows tracing back the evolution of each reassignment, which provides information on its underlying mechanism. Applied to metazoan and yeast genomes, CoreTracker significantly outperforms existing methods on both precision and sensitivity.
Availability and implementation: CoreTracker is written in Python and available at https://github.com/UdeM-LBIT/CoreTracker.
Contact: mabrouk@iro.umontreal.ca.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Emmanuel Noutahi
  - Département d'Informatique et de Recherche Opérationnelle (DIRO), Université de Montréal, Montréal, QC CP 6128, Canada
- Virginie Calderon
  - Département d'Informatique et de Recherche Opérationnelle (DIRO), Université de Montréal, Montréal, QC CP 6128, Canada
- Mathieu Blanchette
  - School of Computer Science, McGill University, McConnell Engineering Bldg., Montréal, QC H3A 0E9, Canada
- Franz B Lang
  - Département de Biochimie, Centre Robert Cedergren, Université de Montréal, Montréal, QC CP 6128, Canada
- Nadia El-Mabrouk
  - Département d'Informatique et de Recherche Opérationnelle (DIRO), Université de Montréal, Montréal, QC CP 6128, Canada

10
Mühlhausen S, Schmitt HD, Pan KT, Plessmann U, Urlaub H, Hurst LD, Kollmar M. Endogenous Stochastic Decoding of the CUG Codon by Competing Ser- and Leu-tRNAs in Ascoidea asiatica. Curr Biol 2018; 28:2046-2057.e5. [PMID: 29910077 PMCID: PMC6041473 DOI: 10.1016/j.cub.2018.04.085] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 03/04/2018] [Revised: 04/22/2018] [Accepted: 04/24/2018] [Indexed: 12/24/2022]
Abstract
Although the “universal” genetic code is now known not to be universal, and stop codons can have multiple meanings, one regularity remains, namely that for a given sense codon there is a unique translation. Examining CUG usage in yeasts that have transferred CUG away from leucine, we here report the first example of dual coding: Ascoidea asiatica stochastically encodes CUG as both serine and leucine in approximately equal proportions. This is deleterious, as evidenced by CUG codons being rare, never at conserved serine or leucine residues, and predominantly in lowly expressed genes. Related yeasts solve the problem by loss of function of one of the two tRNAs. This dual coding is consistent with the tRNA-loss-driven codon reassignment hypothesis, and provides a unique example of a proteome that cannot be deterministically predicted.

- Ascoidea asiatica stochastically encodes CUG as leucine and serine
- It is the only known example of a proteome with non-deterministic features
- Stochastic encoding is caused by competing tRNALeu(CAG) and tRNASer(CAG)
- A. asiatica copes with stochastic encoding by avoiding CUG at key positions
Affiliation(s)
- Stefanie Mühlhausen
  - The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Claverton Down, Bath, BA2 7AY, UK
- Hans Dieter Schmitt
  - Department of Neurobiology, Max Planck Institute for Biophysical Chemistry, Am Fassberg 11, 37077 Göttingen, Germany
- Kuan-Ting Pan
  - Bioanalytical Mass Spectrometry, Max Planck Institute for Biophysical Chemistry, Am Fassberg 11, 37077 Göttingen, Germany
- Uwe Plessmann
  - Bioanalytical Mass Spectrometry, Max Planck Institute for Biophysical Chemistry, Am Fassberg 11, 37077 Göttingen, Germany
- Henning Urlaub
  - Bioanalytical Mass Spectrometry, Max Planck Institute for Biophysical Chemistry, Am Fassberg 11, 37077 Göttingen, Germany; Bioanalytics Group, Department of Clinical Chemistry, University Medical Center Göttingen, Robert Koch Strasse 40, 37075 Göttingen, Germany
- Laurence D Hurst
  - The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Claverton Down, Bath, BA2 7AY, UK
- Martin Kollmar
  - Group Systems Biology of Motor Proteins, Department of NMR-Based Structural Biology, Max Planck Institute for Biophysical Chemistry, Am Fassberg 11, 37077 Göttingen, Germany.

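The abstract above reports that Ascoidea asiatica decodes CUG as serine or leucine in approximately equal proportions, so its proteome cannot be predicted deterministically. A small simulation makes the consequence concrete: a transcript with n in-frame CUG codons can yield up to 2^n distinct proteins. This is a toy sketch only; the 0.5 serine probability is an assumption based on "approximately equal", all other codons follow the standard code, and the function names are hypothetical.

```python
import random
from itertools import product

RNA_BASES = "UCAG"
# Standard genetic code (NCBI translation table 1), codons ordered UUU, UUC, UUA, UUG, UCU, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(RNA_BASES, repeat=3), AMINO_ACIDS)}


def count_cug(mrna):
    """Count in-frame CUG codons (frame 0)."""
    return sum(mrna[i:i + 3] == "CUG" for i in range(0, len(mrna) - 2, 3))


def stochastic_translate(mrna, rng, p_ser=0.5):
    """Translate frame 0, decoding each CUG as Ser with probability p_ser, else Leu."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon == "CUG":
            aa = "S" if rng.random() < p_ser else "L"
        else:
            aa = STANDARD[codon]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


mrna = "AUGCUGGCUCUGUAA"   # AUG CUG GCU CUG UAA: two in-frame CUG codons
rng = random.Random(0)     # seeded so repeated runs give the same draws
variants = {stochastic_translate(mrna, rng) for _ in range(200)}

print(count_cug(mrna), 2 ** count_cug(mrna))
print(sorted(variants))
```

Because the number of possible proteins doubles with every CUG codon, avoiding CUG at conserved positions, as the abstract describes, limits how much this variability can matter.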
11
Dhami MK, Hartwig T, Fukami T. Genetic basis of priority effects: insights from nectar yeast. Proc Biol Sci 2017; 283:rspb.2016.1455. [PMID: 27708148 DOI: 10.1098/rspb.2016.1455] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 07/01/2016] [Accepted: 09/07/2016] [Indexed: 01/15/2023] Open
Abstract
Priority effects, in which the order of species arrival dictates community assembly, can have a major influence on species diversity, but the genetic basis of priority effects remains unknown. Here, we suggest that nitrogen scavenging genes previously considered responsible for starvation avoidance may drive priority effects by causing rapid resource depletion. Using single-molecule sequencing, we de novo assembled the genome of the nectar-colonizing yeast, Metschnikowia reukaufii, across eight scaffolds and complete mitochondrion, with gap-free coverage over gene spaces. We found a high rate of tandem gene duplication in this genome, enriched for nitrogen metabolism and transport. Both high-capacity amino acid importers, GAP1 and PUT4, present as tandem gene arrays, were highly expressed in synthetic nectar and regulated by the availability and quality of amino acids. In experiments with competitive nectar yeast, Candida rancensis, amino acid addition alleviated suppression of C. rancensis by early arrival of M. reukaufii, corroborating that amino acid scavenging may contribute to priority effects. Because niche pre-emption via rapid resource depletion may underlie priority effects in a broad range of microbial, plant and animal communities, nutrient scavenging genes like the ones we considered here may be broadly relevant to understanding priority effects.
Affiliation(s)
- Manpreet K Dhami
  - Department of Biology, Stanford University, 371 Serra Mall, Stanford, CA 94305, USA
- Thomas Hartwig
  - Department of Plant Biology, Carnegie Institution for Science, 260 Panama Street, Stanford, CA 94305, USA
- Tadashi Fukami
  - Department of Biology, Stanford University, 371 Serra Mall, Stanford, CA 94305, USA
|
12
|
Reinert K, Dadi TH, Ehrhardt M, Hauswedell H, Mehringer S, Rahn R, Kim J, Pockrandt C, Winkler J, Siragusa E, Urgese G, Weese D. The SeqAn C++ template library for efficient sequence analysis: A resource for programmers. J Biotechnol 2017; 261:157-168. [PMID: 28888961 DOI: 10.1016/j.jbiotec.2017.07.017] [Citation(s) in RCA: 67] [Impact Index Per Article: 9.6] [Received: 02/26/2017] [Revised: 07/17/2017] [Accepted: 07/19/2017] [Indexed: 11/27/2022]
Abstract
BACKGROUND: The use of novel algorithmic techniques is pivotal to many important problems in life science. For example, the sequencing of the human genome (Venter et al., 2001) would not have been possible without advanced assembly algorithms, and the development of practical BWT-based read mappers has been instrumental for NGS analysis. However, owing to the high speed of technological progress and the urgent need for bioinformatics tools, there was a widening gap between state-of-the-art algorithmic techniques and the actual algorithmic components of tools that are in widespread use. We previously addressed this by introducing the SeqAn library of efficient data types and algorithms in 2008 (Döring et al., 2008). RESULTS: The SeqAn library has matured considerably since its first publication 9 years ago. In this article, we review its status as an established resource for programmers in the field of sequence analysis and its contributions to many analysis tools. CONCLUSIONS: We anticipate that SeqAn will continue to be a valuable resource, especially since it has started to actively support various hardware acceleration techniques in a systematic manner.
Affiliation(s)
- Knut Reinert
  - Algorithmic Bioinformatics, Institute for Bioinformatics, FU Berlin, Takustrasse 9, 14195 Berlin, Germany
- Temesgen Hailemariam Dadi
  - Algorithmic Bioinformatics, Institute for Bioinformatics, FU Berlin, Takustrasse 9, 14195 Berlin, Germany
- Marcel Ehrhardt
  - Algorithmic Bioinformatics, Institute for Bioinformatics, FU Berlin, Takustrasse 9, 14195 Berlin, Germany
- Hannes Hauswedell
  - Algorithmic Bioinformatics, Institute for Bioinformatics, FU Berlin, Takustrasse 9, 14195 Berlin, Germany
- Svenja Mehringer
  - Algorithmic Bioinformatics, Institute for Bioinformatics, FU Berlin, Takustrasse 9, 14195 Berlin, Germany
- René Rahn
  - Algorithmic Bioinformatics, Institute for Bioinformatics, FU Berlin, Takustrasse 9, 14195 Berlin, Germany
- Jongkyu Kim
  - Efficient Algorithms for -Omics Data, Max Planck Institute for Molecular Genetics, Ihnestrasse 62-73, 14195 Berlin, Germany
- Christopher Pockrandt
  - Efficient Algorithms for -Omics Data, Max Planck Institute for Molecular Genetics, Ihnestrasse 62-73, 14195 Berlin, Germany
- Jörg Winkler
  - Efficient Algorithms for -Omics Data, Max Planck Institute for Molecular Genetics, Ihnestrasse 62-73, 14195 Berlin, Germany
- Gianvito Urgese
  - Department of Control and Computer Engineering, Politecnico di Torino, Italy
|
13
|
Abstract
We report here the draft genome sequence of the lipolytic yeast Candida aaseri SH-14, isolated from the compost of oil palm empty fruit bunches, and the identification of eight putative lipase genes. This genome information will provide the opportunity to produce potential lipases for a variety of industrial applications.
|
14
|
Sharma C, Kumar N, Pandey R, Meis JF, Chowdhary A. Whole genome sequencing of emerging multidrug resistant Candida auris isolates in India demonstrates low genetic variation. New Microbes New Infect 2016; 13:77-82. [PMID: 27617098 PMCID: PMC5006800 DOI: 10.1016/j.nmni.2016.07.003] [Citation(s) in RCA: 132] [Impact Index Per Article: 16.5] [Received: 04/11/2016] [Revised: 06/27/2016] [Accepted: 07/01/2016] [Indexed: 11/26/2022] Open
Abstract
Candida auris is an emerging multidrug-resistant yeast that causes nosocomial fungaemia and deep-seated infections. Notably, the emergence of this yeast is alarming as it exhibits resistance to azoles, amphotericin B and caspofungin, which may lead to clinical failure in patients. Multigene phylogeny and amplified fragment length polymorphism typing methods indicate that the C. auris population is clonal. Here, using whole genome sequencing analysis, we show for the first time that C. auris strains from four Indian hospitals were highly related, suggesting clonal transmission. Further, all C. auris isolates originated from cases of fungaemia and were resistant to fluconazole (MIC >64 mg/L).
Affiliation(s)
- C Sharma
  - Department of Medical Mycology, Vallabhbhai Patel Chest Institute, University of Delhi, Delhi, India
- N Kumar
  - Wellcome Trust Sanger Institute, Hinxton, UK
- R Pandey
  - CSIR Ayurgenomics Unit-TRISUTRA, Council of Scientific & Industrial Research-Institute of Genomics and Integrative Biology (CSIR-IGIB), New Delhi, India
- J F Meis
  - Department of Medical Microbiology and Infectious Diseases, Canisius-Wilhelmina Hospital, Nijmegen, The Netherlands; Department of Medical Microbiology, Radboud UMC, Nijmegen, The Netherlands
- A Chowdhary
  - Department of Medical Mycology, Vallabhbhai Patel Chest Institute, University of Delhi, Delhi, India
|