1. Abdullah-Zawawi MR, Govender N, Harun S, Muhammad NAN, Zainal Z, Mohamed-Hussein ZA. Multi-Omics Approaches and Resources for Systems-Level Gene Function Prediction in the Plant Kingdom. Plants (Basel, Switzerland) 2022; 11:2614. [PMID: 36235479 PMCID: PMC9573505 DOI: 10.3390/plants11192614] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/29/2022] [Revised: 09/05/2022] [Accepted: 09/13/2022] [Indexed: 06/16/2023]
Abstract
In higher plants, the complexity of a system and the components within and among species are rapidly dissected by omics technologies. Multi-omics datasets are integrated to infer and enable a comprehensive understanding of the life processes of organisms of interest. Further, growing open-source datasets coupled with the emergence of high-performance computing and development of computational tools for biological sciences have assisted in silico functional prediction of unknown genes, proteins and metabolites, otherwise known as uncharacterized. The systems biology approach includes data collection and filtration, system modelling, experimentation and the establishment of new hypotheses for experimental validation. Informatics technologies add meaningful sense to the output generated by complex bioinformatics algorithms, which are now freely available in a user-friendly graphical user interface. These resources accentuate gene function prediction at a relatively minimal cost and effort. Herein, we present a comprehensive view of relevant approaches available for system-level gene function prediction in the plant kingdom. Together, the most recent applications and sought-after principles for gene mining are discussed to benefit the plant research community. A realistic tabulation of plant genomic resources is included for a less laborious and accurate candidate gene discovery in basic plant research and improvement strategies.
Affiliation(s)
- Muhammad-Redha Abdullah-Zawawi
  - UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, Kuala Lumpur 56000, Malaysia
  - Institute of System Biology (INBIOSIS), Universiti Kebangsaan Malaysia (UKM), Bangi 43600, Malaysia
- Nisha Govender
  - Institute of System Biology (INBIOSIS), Universiti Kebangsaan Malaysia (UKM), Bangi 43600, Malaysia
- Sarahani Harun
  - Institute of System Biology (INBIOSIS), Universiti Kebangsaan Malaysia (UKM), Bangi 43600, Malaysia
- Nor Azlan Nor Muhammad
  - Institute of System Biology (INBIOSIS), Universiti Kebangsaan Malaysia (UKM), Bangi 43600, Malaysia
- Zamri Zainal
  - Institute of System Biology (INBIOSIS), Universiti Kebangsaan Malaysia (UKM), Bangi 43600, Malaysia
  - Faculty of Science and Technology, Universiti Kebangsaan Malaysia (UKM), Bangi 43600, Malaysia
- Zeti-Azura Mohamed-Hussein
  - Institute of System Biology (INBIOSIS), Universiti Kebangsaan Malaysia (UKM), Bangi 43600, Malaysia
  - Faculty of Science and Technology, Universiti Kebangsaan Malaysia (UKM), Bangi 43600, Malaysia
2. Wiltschi B, Cernava T, Dennig A, Galindo Casas M, Geier M, Gruber S, Haberbauer M, Heidinger P, Herrero Acero E, Kratzer R, Luley-Goedl C, Müller CA, Pitzer J, Ribitsch D, Sauer M, Schmölzer K, Schnitzhofer W, Sensen CW, Soh J, Steiner K, Winkler CK, Winkler M, Wriessnegger T. Enzymes revolutionize the bioproduction of value-added compounds: From enzyme discovery to special applications. Biotechnol Adv 2020; 40:107520. [DOI: 10.1016/j.biotechadv.2020.107520] [Citation(s) in RCA: 55] [Impact Index Per Article: 13.8] [Received: 02/27/2019] [Revised: 10/18/2019] [Accepted: 01/13/2020] [Indexed: 12/11/2022]
3. Scalzitti N, Jeannin-Girardon A, Collet P, Poch O, Thompson JD. A benchmark study of ab initio gene prediction methods in diverse eukaryotic organisms. BMC Genomics 2020; 21:293. [PMID: 32272892 PMCID: PMC7147072 DOI: 10.1186/s12864-020-6707-9] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Received: 11/29/2019] [Accepted: 03/30/2020] [Indexed: 02/02/2023] [Open Access]
Abstract
Background: The draft genome assemblies produced by new sequencing technologies present important challenges for automatic gene prediction pipelines, leading to less accurate gene models. New benchmark methods are needed to evaluate the accuracy of gene prediction methods in the face of incomplete genome assemblies, low genome coverage and quality, complex gene structures, or a lack of suitable sequences for evidence-based annotations. Results: We describe the construction of a new benchmark, called G3PO (benchmark for Gene and Protein Prediction PrOgrams), designed to represent many of the typical challenges faced by current genome annotation projects. The benchmark is based on a carefully validated and curated set of real eukaryotic genes from 147 phylogenetically disperse organisms, and a number of test sets are defined to evaluate the effects of different features, including genome sequence quality, gene structure complexity, protein length, etc. We used the benchmark to perform an independent comparative analysis of the most widely used ab initio gene prediction programs and identified the main strengths and weaknesses of the programs. More importantly, we highlight a number of features that could be exploited in order to improve the accuracy of current prediction tools. Conclusions: The experiments showed that ab initio gene structure prediction is a very challenging task, which should be further investigated. We believe that the baseline results associated with the complex gene test sets in G3PO provide useful guidelines for future studies.
Affiliation(s)
- Nicolas Scalzitti
  - Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France
- Anne Jeannin-Girardon
  - Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France
- Pierre Collet
  - Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France
- Olivier Poch
  - Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France
- Julie D Thompson
  - Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France
4. Cook DE, Valle-Inclan JE, Pajoro A, Rovenich H, Thomma BP, Faino L. Long-Read Annotation: Automated Eukaryotic Genome Annotation Based on Long-Read cDNA Sequencing. Plant Physiology 2019; 179:38-54. [PMID: 30401722 PMCID: PMC6324239 DOI: 10.1104/pp.18.00848] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 07/16/2018] [Accepted: 10/19/2018] [Indexed: 05/16/2023]
Abstract
Single-molecule full-length complementary DNA (cDNA) sequencing can aid genome annotation by revealing transcript structure and alternative splice forms, yet current annotation pipelines do not incorporate such information. Here we present long-read annotation (LoReAn) software, an automated annotation pipeline utilizing short- and long-read cDNA sequencing, protein evidence, and ab initio prediction to generate accurate genome annotations. Based on annotations of two fungal genomes (Verticillium dahliae and Plicaturopsis crispa) and two plant genomes (Arabidopsis [Arabidopsis thaliana] and Oryza sativa), we show that LoReAn outperforms popular annotation pipelines by integrating single-molecule cDNA-sequencing data generated from either the Pacific Biosciences or MinION sequencing platforms, correctly predicting gene structure, and capturing genes missed by other annotation pipelines.
Affiliation(s)
- David E. Cook
  - Laboratory of Phytopathology, Wageningen University and Research, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands
- Jose Espejo Valle-Inclan
  - Laboratory of Phytopathology, Wageningen University and Research, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands
- Alice Pajoro
  - Laboratory of Molecular Biology, Wageningen University and Research, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands
- Hanna Rovenich
  - Laboratory of Phytopathology, Wageningen University and Research, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands
- Bart P.H.J. Thomma (author for contact)
  - Laboratory of Phytopathology, Wageningen University and Research, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands
- Luigi Faino
  - Laboratory of Phytopathology, Wageningen University and Research, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands
5. Park SG, Ryu D, Lee H, Ryu H, Ahn YJ, Yoo SI, Ko J, Hong CP. TaF: a web platform for taxonomic profile-based fungal gene prediction. Genes Genomics 2018; 41:337-342. [PMID: 30456524 DOI: 10.1007/s13258-018-0766-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2018] [Accepted: 11/13/2018] [Indexed: 10/27/2022]
Abstract
INTRODUCTION: The accurate prediction and annotation of gene structures from the genome sequence of an organism enable genome-wide functional analyses to obtain insight into the biological properties of an organism. OBJECTIVES: We recently developed a highly accurate filamentous fungal gene prediction pipeline and web platform called TaF. TaF is a homology-based gene predictor employing large-scale taxonomic profiling to search for close relatives in genome queries. METHODS: The TaF pipeline consists of four processing steps: (1) taxonomic profiling to search for close relatives of the query, (2) generation of hints for determining exon-intron boundaries from orthologous protein sequence data of the profiled species, (3) gene prediction by a combination of ab initio and evidence-based prediction methods, and (4) homology search for gene models. RESULTS: TaF generates extrinsic evidence that suggests possible exon-intron boundaries based on orthologous protein sequence data, thus reducing false-positive predictions of gene structure based on data from distantly related orthologs. In particular, the gene prediction method using taxonomic profiling shows very high accuracy, including high sensitivity and specificity for gene models, suggesting a new approach for homology-based gene prediction from newly sequenced or uncharacterized fungal genomes, with the potential to improve the quality of gene prediction. CONCLUSION: TaF will be a useful tool for fungal genome-wide analyses, including the identification of targeted genes associated with a trait, transcriptome profiling, comparative genomics, and evolutionary analysis.
Affiliation(s)
- Sin-Gi Park
  - TheragenEtex Bio Institute, Suwon, 16229, Republic of Korea
- DongSung Ryu
  - TheragenEtex Bio Institute, Suwon, 16229, Republic of Korea
- Hyunsung Lee
  - TheragenEtex Bio Institute, Suwon, 16229, Republic of Korea
- Hojin Ryu
  - Department of Biology, Chungbuk National University, Cheongju, 28644, Republic of Korea
- Yong Ju Ahn
  - TheragenEtex Bio Institute, Suwon, 16229, Republic of Korea
- Seung Il Yoo
  - TheragenEtex Bio Institute, Suwon, 16229, Republic of Korea
- Junsu Ko
  - TheragenEtex Bio Institute, Suwon, 16229, Republic of Korea
- Chang Pyo Hong
  - TheragenEtex Bio Institute, Suwon, 16229, Republic of Korea
6. Laothanachareon T, Tamayo-Ramos JA, Nijsse B, Schaap PJ. Forward Genetics by Genome Sequencing Uncovers the Central Role of the Aspergillus niger goxB Locus in Hydrogen Peroxide Induced Glucose Oxidase Expression. Front Microbiol 2018; 9:2269. [PMID: 30319579 PMCID: PMC6165874 DOI: 10.3389/fmicb.2018.02269] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 02/07/2018] [Accepted: 09/05/2018] [Indexed: 01/09/2023] [Open Access]
Abstract
Aspergillus niger is an industrially important source for gluconic acid and glucose oxidase (GOx), a secreted commercially important flavoprotein which catalyses the oxidation of β-D-glucose by molecular oxygen to D-glucolactone and hydrogen peroxide. Expression of goxC, the GOx encoding gene and the concomitant two step conversion of glucose to gluconic acid requires oxygen and the presence of significant amounts of glucose in the medium and is optimally induced at pH 5.5. The molecular mechanisms underlying regulation of goxC expression are, however, still enigmatic. Genetic studies aimed at understanding GOx induction have indicated the involvement of at least seven complementation groups, for none of which the molecular basis has been resolved. In this study, a mapping-by-sequencing forward genetics approach was used to uncover the molecular role of the goxB locus in goxC expression. Using the Illumina and PacBio sequencing platforms a hybrid high quality draft genome assembly of laboratory strain N402 was obtained and used as a reference for mapping of genomic reads obtained from the derivative NW103:goxB mutant strain. The goxB locus encodes a thioredoxin reductase. A deletion of the encoding gene in the N402 parent strain led to a high constitutive expression level of the GOx and the lactonase encoding genes required for the two-step conversion of glucose to gluconic acid and of the catR gene encoding catalase R. This high constitutive level of expression was observed to be irrespective of the carbon source and oxidative stress applied. A model clarifying the role of GoxB in the regulation of the expression of goxC involving hydrogen peroxide as second messenger is presented.
Affiliation(s)
- Thanaporn Laothanachareon
  - Laboratory of Systems and Synthetic Biology, Wageningen University and Research, Wageningen, Netherlands
  - Enzyme Technology Laboratory, National Center for Genetic Engineering and Biotechnology, Pathumthani, Thailand
- Bart Nijsse
  - Laboratory of Systems and Synthetic Biology, Wageningen University and Research, Wageningen, Netherlands
- Peter J Schaap
  - Laboratory of Systems and Synthetic Biology, Wageningen University and Research, Wageningen, Netherlands
7. Reid I. Evaluating Programs for Predicting Genes and Transcripts with RNA-Seq Support in Fungal Genomes. Methods Mol Biol 2018; 1775:209-227. [PMID: 29876820 DOI: 10.1007/978-1-4939-7804-5_17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
The steps needed to computationally predict genes and transcripts in fungal genomes with support from RNA-Seq data are described in detail for three prediction programs: CodingQuarry, BRAKER1, and Harfang. These programs predicted from 86% to 92% (Harfang) of the genes in a manually curated reference set for Aspergillus niger strain NRRL3. Genes with little or no RNA-Seq read coverage were predicted less successfully than genes with adequate coverage.
Affiliation(s)
- Ian Reid
  - Centre for Structural and Functional Genomics, Concordia University, Montreal, QC, Canada
8.
Abstract
No genome sequencing project is complete without structural and functional annotation. Gene models and functional predictions for these models can be obtained relatively easily using computational methods, but they are prone to errors. We describe herein the steps we use to manually curate gene models and functionally annotate them. Our approach is to examine each gene model carefully, and improve its structure if necessary, using a comprehensive set of experimental and computational data as evidence. Then, functional predictions are assigned to the gene models based on conserved protein domains and sequence similarities. We use stringent sequence similarity cutoffs and reviewed sequence-database records as external sources for our annotations. By methodically choosing which evidence to use for each annotation, we minimize the risk of adopting and assigning false predictions to the gene models.
Affiliation(s)
- Erin McDonnell
  - Centre for Structural and Functional Genomics, Concordia University, Montreal, QC, Canada
- Kimchi Strasser
  - Centre for Structural and Functional Genomics, Concordia University, Montreal, QC, Canada
- Adrian Tsang
  - Centre for Structural and Functional Genomics, Concordia University, Montreal, QC, Canada
9. Swart V, Crampton BG, Ridenour JB, Bluhm BH, Olivier NA, Meyer JJM, Berger DK. Complementation of CTB7 in the Maize Pathogen Cercospora zeina Overcomes the Lack of In Vitro Cercosporin Production. Molecular Plant-Microbe Interactions: MPMI 2017; 30:710-724. [PMID: 28535078 DOI: 10.1094/mpmi-03-17-0054-r] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 05/20/2023]
Abstract
Gray leaf spot (GLS), caused by the sibling species Cercospora zeina or Cercospora zeae-maydis, is cited as one of the most important diseases threatening global maize production. C. zeina fails to produce cercosporin in vitro and, in most cases, causes large coalescing lesions during maize infection, a symptom generally absent from cercosporin-deficient mutants in other Cercospora spp. Here, we describe the C. zeina cercosporin toxin biosynthetic (CTB) gene cluster. The oxidoreductase gene CTB7 contained several insertions and deletions as compared with the C. zeae-maydis ortholog. We set out to determine whether complementing the defective CTB7 gene with the full-length gene from C. zeae-maydis could confer in vitro cercosporin production. C. zeina transformants containing C. zeae-maydis CTB7 were generated by Agrobacterium tumefaciens-mediated transformation and were evaluated for in vitro cercosporin production. When grown on nitrogen-limited medium in the light (conditions conducive to cercosporin production in other Cercospora spp.), one transformant accumulated a red pigment that was confirmed to be cercosporin by the KOH assay, thin-layer chromatography, and ultra performance liquid chromatography-quadrupole-time-of-flight mass spectrometry. Our results indicated that C. zeina has a defective CTB7, but all other necessary machinery required for synthesizing cercosporin-like molecules and, thus, C. zeina may produce a structural variant of cercosporin during maize infection.
Affiliation(s)
- Velushka Swart
  - Department of Plant and Soil Sciences, Forestry and Agricultural Biotechnology Institute (FABI), Genomics Research Institute, University of Pretoria, Private Bag x20, Hatfield 0028, South Africa
- Bridget G Crampton
  - Department of Plant and Soil Sciences, Forestry and Agricultural Biotechnology Institute (FABI), Genomics Research Institute, University of Pretoria, Private Bag x20, Hatfield 0028, South Africa
- John B Ridenour
  - Department of Plant Pathology, University of Arkansas, Fayetteville, AR 72701, U.S.A.
- Burt H Bluhm
  - Department of Plant Pathology, University of Arkansas, Fayetteville, AR 72701, U.S.A.
- Nicholas A Olivier
  - Department of Plant and Soil Sciences, Forestry and Agricultural Biotechnology Institute (FABI), Genomics Research Institute, University of Pretoria, Private Bag x20, Hatfield 0028, South Africa
- Dave K Berger
  - Department of Plant and Soil Sciences, Forestry and Agricultural Biotechnology Institute (FABI), Genomics Research Institute, University of Pretoria, Private Bag x20, Hatfield 0028, South Africa
10. Chan KL, Rosli R, Tatarinova TV, Hogan M, Firdaus-Raih M, Low ETL. Seqping: gene prediction pipeline for plant genomes using self-training gene models and transcriptomic data. BMC Bioinformatics 2017; 18:1426. [PMID: 28466793 PMCID: PMC5333190 DOI: 10.1186/s12859-016-1426-6] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Indexed: 11/10/2022] [Open Access]
Abstract
BACKGROUND: Gene prediction is one of the most important steps in the genome annotation process. A large number of software tools and pipelines developed by various computing techniques are available for gene prediction. However, these systems have yet to accurately predict all or even most of the protein-coding regions. Furthermore, none of the currently available gene-finders has a universal Hidden Markov Model (HMM) that can perform gene prediction for all organisms equally well in an automatic fashion. RESULTS: We present an automated gene prediction pipeline, Seqping, that uses self-training HMM models and transcriptomic data. The pipeline processes the genome and transcriptome sequences of the target species using the GlimmerHMM, SNAP, and AUGUSTUS pipelines, followed by the MAKER2 program to combine predictions from the three tools in association with the transcriptomic evidence. Seqping generates species-specific HMMs that are able to offer unbiased gene predictions. The pipeline was evaluated using the Oryza sativa and Arabidopsis thaliana genomes. Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis showed that the pipeline was able to identify at least 95% of BUSCO's plantae dataset. Our evaluation shows that Seqping was able to generate better gene predictions compared to three HMM-based programs (MAKER2, GlimmerHMM and AUGUSTUS) using their respective available HMMs. Seqping had the highest accuracy in rice (0.5648 for CDS, 0.4468 for exon, and 0.6695 for nucleotide structure) and A. thaliana (0.5808 for CDS, 0.5955 for exon, and 0.8839 for nucleotide structure). CONCLUSIONS: Seqping provides researchers a seamless pipeline to train species-specific HMMs and predict genes in newly sequenced or less-studied genomes. We conclude that the Seqping pipeline predictions are more accurate than gene predictions using the other three approaches with the default or available HMMs.
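The abstract above reports accuracy at the CDS, exon, and nucleotide levels without defining the score. As a rough illustration only, the sketch below computes two measures that are conventional in gene prediction evaluation: sensitivity (fraction of reference features recovered) and specificity (fraction of predictions that are correct, which in this field corresponds to precision). It is not Seqping's scoring code. The half-open `(start, end)` interval convention and all function names are assumptions made for this example.

```python
from typing import Iterable, Set, Tuple

# Hypothetical convention for this sketch: half-open intervals [start, end).
Interval = Tuple[int, int]


def covered_positions(intervals: Iterable[Interval]) -> Set[int]:
    """Return the set of nucleotide positions covered by the intervals."""
    positions: Set[int] = set()
    for start, end in intervals:
        positions.update(range(start, end))
    return positions


def nucleotide_sn_sp(reference: Iterable[Interval],
                     predicted: Iterable[Interval]) -> Tuple[float, float]:
    """Nucleotide-level sensitivity and specificity (precision)."""
    ref = covered_positions(reference)
    pred = covered_positions(predicted)
    true_pos = len(ref & pred)  # positions annotated in both sets
    sensitivity = true_pos / len(ref) if ref else 0.0
    specificity = true_pos / len(pred) if pred else 0.0
    return sensitivity, specificity


def exon_sn_sp(reference: Iterable[Interval],
               predicted: Iterable[Interval]) -> Tuple[float, float]:
    """Exon-level scores: an exon counts only if both boundaries match."""
    ref, pred = set(reference), set(predicted)
    exact = len(ref & pred)  # exons with identical start and end
    sensitivity = exact / len(ref) if ref else 0.0
    specificity = exact / len(pred) if pred else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    reference = [(0, 100), (200, 300)]
    predicted = [(0, 100), (250, 350)]
    print(nucleotide_sn_sp(reference, predicted))
    print(exon_sn_sp(reference, predicted))
```

Exon-level scores are stricter than nucleotide-level scores because a prediction that overlaps a reference exon but misses either boundary earns no credit, which is one plausible reason exon figures in evaluations like this tend to be lower than nucleotide figures.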
Affiliation(s)
- Kuang-Lim Chan
  - Advanced Biotechnology and Breeding Center, Malaysian Palm Oil Board, 6 Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor Malaysia
  - Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600 Bangi, Selangor Malaysia
- Rozana Rosli
  - Advanced Biotechnology and Breeding Center, Malaysian Palm Oil Board, 6 Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor Malaysia
- Tatiana V. Tatarinova
  - Center for Personalized Medicine and Spatial Sciences Institute, University of Southern California, Los Angeles, CA USA
- Michael Hogan
  - Orion Genomics, 4041 Forest Park Avenue, St. Louis, MO 63108 USA
- Mohd Firdaus-Raih
  - Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600 Bangi, Selangor Malaysia
- Eng-Ti Leslie Low
  - Advanced Biotechnology and Breeding Center, Malaysian Palm Oil Board, 6 Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor Malaysia
11. Magnan C, Yu J, Chang I, Jahn E, Kanomata Y, Wu J, Zeller M, Oakes M, Baldi P, Sandmeyer S. Sequence Assembly of Yarrowia lipolytica Strain W29/CLIB89 Shows Transposable Element Diversity. PLoS One 2016; 11:e0162363. [PMID: 27603307 PMCID: PMC5014426 DOI: 10.1371/journal.pone.0162363] [Citation(s) in RCA: 53] [Impact Index Per Article: 6.6] [Received: 05/16/2016] [Accepted: 08/22/2016] [Indexed: 12/27/2022] [Open Access]
Abstract
Yarrowia lipolytica, an oleaginous yeast, is capable of accumulating significant cellular mass in lipid making it an important source of biosustainable hydrocarbon-based chemicals. In spite of a similar number of protein-coding genes to that in other Hemiascomycetes, the Y. lipolytica genome is almost double that of model yeasts. Despite its economic importance and several distinct strains in common use, an independent genome assembly exists for only one strain. We report here a de novo annotated assembly of the chromosomal genome of an industrially-relevant strain, W29/CLIB89, determined by hybrid next-generation sequencing. For the first time, each Y. lipolytica chromosome is represented by a single contig. The telomeric rDNA repeats were localized by Irys long-range genome mapping and one complete copy of the rDNA sequence is reported. Two large structural variants and retroelement differences with reference strain CLIB122 including a full-length, novel Ty3/Gypsy long terminal repeat (LTR) retrotransposon and multiple LTR-like sequences are described. Strikingly, several of these are adjacent to RNA polymerase III-transcribed genes, which are almost double in number in Y. lipolytica compared to other Hemiascomycetes. In addition to previously-reported dimeric RNA polymerase III-transcribed genes, tRNA pseudogenes were identified. Multiple full-length and truncated LINE elements are also present. Therefore, although identified transposons do not constitute a significant fraction of the Y. lipolytica genome, they could have played an active role in its evolution. Differences between the sequence of this strain and of the existing reference strain underscore the utility of an additional independent genome assembly for this economically important organism.
Affiliation(s)
- Christophe Magnan
  - Department of Computer Science, School of Computer Sciences, University of California Irvine, Irvine, California, United States of America
  - Institute for Genomics and Bioinformatics, University of California Irvine, Irvine, California, United States of America
- James Yu
  - Department of Biological Chemistry, School of Medicine, University of California Irvine, Irvine, California, United States of America
- Ivan Chang
  - Department of Biological Chemistry, School of Medicine, University of California Irvine, Irvine, California, United States of America
- Ethan Jahn
  - Department of Biological Chemistry, School of Medicine, University of California Irvine, Irvine, California, United States of America
- Yuzo Kanomata
  - Department of Computer Science, School of Computer Sciences, University of California Irvine, Irvine, California, United States of America
  - Institute for Genomics and Bioinformatics, University of California Irvine, Irvine, California, United States of America
- Jenny Wu
  - Department of Biological Chemistry, School of Medicine, University of California Irvine, Irvine, California, United States of America
- Michael Zeller
  - Department of Computer Science, School of Computer Sciences, University of California Irvine, Irvine, California, United States of America
- Melanie Oakes
  - Department of Biological Chemistry, School of Medicine, University of California Irvine, Irvine, California, United States of America
- Pierre Baldi
  - Department of Computer Science, School of Computer Sciences, University of California Irvine, Irvine, California, United States of America
  - Institute for Genomics and Bioinformatics, University of California Irvine, Irvine, California, United States of America
  - Department of Biological Chemistry, School of Medicine, University of California Irvine, Irvine, California, United States of America
- Suzanne Sandmeyer
  - Institute for Genomics and Bioinformatics, University of California Irvine, Irvine, California, United States of America
  - Department of Biological Chemistry, School of Medicine, University of California Irvine, Irvine, California, United States of America
12. Testa AC, Oliver RP, Hane JK. OcculterCut: A Comprehensive Survey of AT-Rich Regions in Fungal Genomes. Genome Biol Evol 2016; 8:2044-64. [PMID: 27289099 PMCID: PMC4943192 DOI: 10.1093/gbe/evw121] [Citation(s) in RCA: 83] [Impact Index Per Article: 10.4] [Accepted: 05/14/2016] [Indexed: 12/03/2022] [Open Access]
Abstract
We present a novel method to measure the local GC-content bias in genomes and a survey of published fungal species. The method, enacted as "OcculterCut" (https://sourceforge.net/projects/occultercut, last accessed April 30, 2016), identified species containing distinct AT-rich regions. In most fungal taxa, AT-rich regions are a signature of repeat-induced point mutation (RIP), which targets repetitive DNA and decreases GC-content through the conversion of cytosine to thymine bases. RIP has in turn been identified as a driver of fungal genome evolution, as RIP mutations can also occur in single-copy genes neighboring repeat-rich regions. Over time RIP perpetuates "two speeds" of gene evolution in the GC-equilibrated and AT-rich regions of fungal genomes. In this study, genomes showing evidence of this process are found to be common, particularly among the Pezizomycotina. Further analysis highlighted differences in amino acid composition and putative functions of genes from these regions, supporting the hypothesis that these regions play an important role in fungal evolution. OcculterCut can also be used to identify genes undergoing RIP-assisted diversifying selection, such as small, secreted effector proteins that mediate host-microbe disease interactions.
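To make the idea of local GC-content concrete, the following minimal sketch scans a sequence in fixed-size windows and flags windows whose GC fraction falls below a cutoff. It does not reproduce OcculterCut's actual segmentation method, which the abstract does not describe. The window size, step, the 0.35 threshold, and all function names are hypothetical choices made for illustration.

```python
from typing import List, Tuple


def gc_fraction(seq: str) -> float:
    """GC fraction over unambiguous A/C/G/T bases; 0.0 if there are none."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at  # ambiguous bases such as N are ignored
    return gc / total if total else 0.0


def windowed_gc(seq: str, window: int = 1000,
                step: int = 1000) -> List[Tuple[int, int, float]]:
    """Return (start, end, gc_fraction) for each full window along seq.

    A sequence shorter than one window is reported as a single window.
    """
    results = []
    last_start = max(len(seq) - window + 1, 1)
    for start in range(0, last_start, step):
        chunk = seq[start:start + window]
        results.append((start, start + len(chunk), gc_fraction(chunk)))
    return results


def at_rich_windows(seq: str, window: int = 1000, step: int = 1000,
                    threshold: float = 0.35) -> List[Tuple[int, int, float]]:
    """Keep only windows whose GC fraction is below the (assumed) threshold."""
    return [w for w in windowed_gc(seq, window, step) if w[2] < threshold]


if __name__ == "__main__":
    toy = "GC" * 10 + "AT" * 10  # GC-rich half followed by an AT-rich half
    for start, end, gc in windowed_gc(toy, window=20, step=20):
        print(start, end, gc)
    print(at_rich_windows(toy, window=20, step=20))
```

A fixed window is the simplest possible choice; region boundaries found this way are only as precise as the step size, so a real analysis would likely need a finer step or a proper segmentation approach.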
Affiliation(s)
- Alison C Testa
  - Department of Environment & Agriculture, Centre for Crop and Disease Management, Curtin University, Perth, Australia
- Richard P Oliver
  - Department of Environment & Agriculture, Centre for Crop and Disease Management, Curtin University, Perth, Australia
- James K Hane
  - Department of Environment & Agriculture, Centre for Crop and Disease Management, Curtin University, Perth, Australia
  - Curtin Institute for Computation, Curtin University, Perth, Australia
13. Hoff KJ, Lange S, Lomsadze A, Borodovsky M, Stanke M. BRAKER1: Unsupervised RNA-Seq-Based Genome Annotation with GeneMark-ET and AUGUSTUS. Bioinformatics 2016; 32:767-9. [PMID: 26559507 PMCID: PMC6078167 DOI: 10.1093/bioinformatics/btv661] [Citation(s) in RCA: 636] [Impact Index Per Article: 79.5] [Received: 05/08/2015] [Revised: 10/02/2015] [Accepted: 10/26/2015] [Indexed: 11/12/2022] [Open Access]
Abstract
MOTIVATION: Gene finding in eukaryotic genomes is notoriously difficult to automate. The task is to design a work flow with a minimal set of tools that would reach state-of-the-art performance across a wide range of species. GeneMark-ET is a gene prediction tool that incorporates RNA-Seq data into unsupervised training and subsequently generates ab initio gene predictions. AUGUSTUS is a gene finder that usually requires supervised training and uses information from RNA-Seq reads in the prediction step. Complementary strengths of GeneMark-ET and AUGUSTUS provided motivation for designing a new combined tool for automatic gene prediction. RESULTS: We present BRAKER1, a pipeline for unsupervised RNA-Seq-based genome annotation that combines the advantages of GeneMark-ET and AUGUSTUS. As input, BRAKER1 requires a genome assembly file and a file in bam-format with spliced alignments of RNA-Seq reads to the genome. First, GeneMark-ET performs iterative training and generates initial gene structures. Second, AUGUSTUS uses predicted genes for training and then integrates RNA-Seq read information into final gene predictions. In our experiments, we observed that BRAKER1 was more accurate than MAKER2 when it is using RNA-Seq as sole source for training and prediction. BRAKER1 does not require pre-trained parameters or a separate expert-prepared training step. AVAILABILITY AND IMPLEMENTATION: BRAKER1 is available for download at http://bioinf.uni-greifswald.de/bioinf/braker/ and http://exon.gatech.edu/GeneMark/ CONTACT: katharina.hoff@uni-greifswald.de or borodovsky@gatech.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Katharina J Hoff
  - Ernst Moritz Arndt Universität Greifswald, Institute for Mathematics and Computer Science, 17487 Greifswald, Germany
- Simone Lange
  - Ernst Moritz Arndt Universität Greifswald, Institute for Mathematics and Computer Science, 17487 Greifswald, Germany
- Alexandre Lomsadze
  - Joint Georgia Tech and Emory University Wallace H Coulter Department of Biomedical Engineering, Atlanta, GA 30332, USA
- Mark Borodovsky
  - School of Computational Science and Engineering, Atlanta, GA 30332, USA
  - Joint Georgia Tech and Emory University Wallace H Coulter Department of Biomedical Engineering, Atlanta, GA 30332, USA
  - Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region, Russia
- Mario Stanke
  - Ernst Moritz Arndt Universität Greifswald, Institute for Mathematics and Computer Science, 17487 Greifswald, Germany
|
14
|
Wibberg D, Rupp O, Blom J, Jelonek L, Kröber M, Verwaaijen B, Goesmann A, Albaum S, Grosch R, Pühler A, Schlüter A. Development of a Rhizoctonia solani AG1-IB Specific Gene Model Enables Comparative Genome Analyses between Phytopathogenic R. solani AG1-IA, AG1-IB, AG3 and AG8 Isolates. PLoS One 2015; 10:e0144769. [PMID: 26690577 PMCID: PMC4686921 DOI: 10.1371/journal.pone.0144769] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Received: 09/03/2015] [Accepted: 11/23/2015] [Indexed: 12/22/2022]
Abstract
Rhizoctonia solani, a soil-borne plant pathogenic basidiomycetous fungus, affects various economically important agricultural and horticultural crops. The draft genome sequence for the R. solani AG1-IB isolate 7/3/14 as well as a corresponding transcriptome dataset (Expressed Sequence Tags, ESTs) were established previously. Development of a specific R. solani AG1-IB gene model based on GMAP transcript mapping within the eukaryotic gene prediction platform AUGUSTUS allowed detection of new genes and provided insights into the gene structure of this fungus. In total, 12,616 genes were recognized in the genome of the AG1-IB isolate. Analysis of predicted genes by means of different bioinformatics tools revealed new genes whose products are potentially involved in degradation of plant cell wall components, melanin formation and synthesis of secondary metabolites. Comparative genome analyses between members of different R. solani anastomosis groups, namely AG1-IA, AG3 and AG8, and the newly annotated R. solani AG1-IB genome were performed within the comparative genomics platform EDGAR. It appeared that only 21 to 28% of all genes encoded in the draft genomes of the different strains were identified as core genes. Based on Average Nucleotide Identity (ANI) and Average Amino-acid Identity (AAI) analyses, considerable sequence differences between isolates representing different anastomosis groups were identified. However, R. solani isolates form a distinct cluster in relation to other fungi of the phylum Basidiomycota. The isolate representing AG1-IB encodes significantly more genes featuring predictable functions in secondary metabolite production compared to other completely sequenced R. solani strains. The newly established R. solani AG1-IB 7/3/14 gene layout now provides a reliable basis for post-genomics studies.
Affiliation(s)
- Daniel Wibberg
  - Institute for Genome Research and Systems Biology, CeBiTec, Bielefeld University, Bielefeld, Germany
- Oliver Rupp
  - Bioinformatics and Systems Biology, Gießen University, Gießen, Germany
- Jochen Blom
  - Bioinformatics and Systems Biology, Gießen University, Gießen, Germany
- Lukas Jelonek
  - Institute for Genome Research and Systems Biology, CeBiTec, Bielefeld University, Bielefeld, Germany
  - Bioinformatics and Systems Biology, Gießen University, Gießen, Germany
- Magdalena Kröber
  - Institute for Genome Research and Systems Biology, CeBiTec, Bielefeld University, Bielefeld, Germany
- Bart Verwaaijen
  - Institute for Genome Research and Systems Biology, CeBiTec, Bielefeld University, Bielefeld, Germany
- Stefan Albaum
  - Institute for Genome Research and Systems Biology, CeBiTec, Bielefeld University, Bielefeld, Germany
- Rita Grosch
  - Leibniz-Institute of Vegetables and Ornamental Crops, Großbeeren, Germany
- Alfred Pühler
  - Institute for Genome Research and Systems Biology, CeBiTec, Bielefeld University, Bielefeld, Germany
- Andreas Schlüter
  - Institute for Genome Research and Systems Biology, CeBiTec, Bielefeld University, Bielefeld, Germany
|
15
|
Cairns TC, Studholme DJ, Talbot NJ, Haynes K. New and Improved Techniques for the Study of Pathogenic Fungi. Trends Microbiol 2015; 24:35-50. [PMID: 26549580 DOI: 10.1016/j.tim.2015.09.008] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 08/22/2015] [Revised: 09/29/2015] [Accepted: 09/30/2015] [Indexed: 02/05/2023]
Abstract
Fungal pathogens pose serious threats to human, plant, and ecosystem health. Improved diagnostics and antifungal strategies are therefore urgently required. Here, we review recent developments in online bioinformatic tools and associated interactive data archives, which enable sophisticated comparative genomics and functional analysis of fungal pathogens in silico. Additionally, we highlight cutting-edge experimental techniques, including conditional expression systems, recyclable markers, RNA interference, genome editing, compound screens, infection models, and robotic automation, which promise to revolutionize the study of both human and plant pathogenic fungi. These novel techniques will allow vital knowledge gaps to be addressed with regard to the evolution of virulence, host-pathogen interactions and antifungal drug therapies in both the clinic and agriculture. This, in turn, will enable delivery of improved diagnosis and durable disease-control strategies.
Affiliation(s)
- Timothy C Cairns
  - Institut für Biotechnologie, Technische Universität Berlin, Gustav-Meyer Allee 22, Berlin, Germany
- Ken Haynes
  - Biosciences, University of Exeter, Stocker Road, Exeter EX4 4QD, UK
|
16
|
Testa AC, Hane JK, Ellwood SR, Oliver RP. CodingQuarry: highly accurate hidden Markov model gene prediction in fungal genomes using RNA-seq transcripts. BMC Genomics 2015; 16:170. [PMID: 25887563 PMCID: PMC4363200 DOI: 10.1186/s12864-015-1344-4] [Citation(s) in RCA: 116] [Impact Index Per Article: 12.9] [Received: 09/15/2014] [Accepted: 02/13/2015] [Indexed: 11/10/2022]
Abstract
BACKGROUND The impact of gene annotation quality on functional and comparative genomics makes gene prediction an important process, particularly in non-model species, including many fungi. Sets of homologous protein sequences are rarely complete with respect to the fungal species of interest and are often small or unreliable, especially when closely related species have not been sequenced or annotated in detail. In these cases, protein homology-based evidence fails to correctly annotate many genes, or significantly improve ab initio predictions. Generalised hidden Markov models (GHMM) have proven to be invaluable tools in gene annotation and, recently, RNA-seq has emerged as a cost-effective means to significantly improve the quality of automated gene annotation. As these methods do not require sets of homologous proteins, improving gene prediction from these resources is of benefit to fungal researchers. While many pipelines now incorporate RNA-seq data in training GHMMs, there has been relatively little investigation into additionally combining RNA-seq data at the point of prediction, and room for improvement in this area motivates this study. RESULTS CodingQuarry is a highly accurate, self-training GHMM fungal gene predictor designed to work with assembled, aligned RNA-seq transcripts. RNA-seq data informs annotations both during gene-model training and in prediction. Our approach capitalises on the high quality of fungal transcript assemblies by incorporating predictions made directly from transcript sequences. Correct predictions are made despite transcript assembly problems, including those caused by overlap between the transcripts of adjacent gene loci. Stringent benchmarking against high-confidence annotation subsets showed CodingQuarry predicted 91.3% of Schizosaccharomyces pombe genes and 90.4% of Saccharomyces cerevisiae genes perfectly. These results are 4-5% better than those of AUGUSTUS, the next best performing RNA-seq driven gene predictor tested. 
Comparisons against whole genome Sc. pombe and S. cerevisiae annotations further substantiate a 4-5% improvement in the number of correctly predicted genes. CONCLUSIONS We demonstrate the success of a novel method of incorporating RNA-seq data into GHMM fungal gene prediction. This shows that a high quality annotation can be achieved without relying on protein homology or a training set of genes. CodingQuarry is freely available (https://sourceforge.net/projects/codingquarry/), and suitable for incorporation into genome annotation pipelines.
Affiliation(s)
- Alison C Testa
  - Centre for Crop and Disease Management, Department of Environment and Agriculture, School of Science, Curtin University, Bentley, WA, 6102, Australia
  - Postal address: Department of Environment and Agriculture Centre for Crop and Disease Management, GPO Box U1987, Perth, 6845, Western Australia
- James K Hane
  - Centre for Crop and Disease Management, Department of Environment and Agriculture, School of Science, Curtin University, Bentley, WA, 6102, Australia
- Simon R Ellwood
  - Centre for Crop and Disease Management, Department of Environment and Agriculture, School of Science, Curtin University, Bentley, WA, 6102, Australia
- Richard P Oliver
  - Centre for Crop and Disease Management, Department of Environment and Agriculture, School of Science, Curtin University, Bentley, WA, 6102, Australia
|
17
|
Hoff KJ, Stanke M. Current methods for automated annotation of protein-coding genes. Current Opinion in Insect Science 2015; 7:8-14. [PMID: 32846689 DOI: 10.1016/j.cois.2015.02.008] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 11/01/2014] [Revised: 12/08/2014] [Accepted: 02/18/2015] [Indexed: 06/11/2023]
Abstract
We review software tools for gene prediction - the identification of protein-coding genes and their structure in genome sequences. The discussed approaches include methods based on RNA-Seq and current methods based on homology - comparative gene prediction and protein spliced alignments. Many methods require that their parameters are adjusted to the target species or its broader clade. These include ab initio gene finders, integrated approaches with ab initio components and some aligners. We also review current automatic methods for training for the common case that a bona fide training set of gene structures is not available before annotation.
Affiliation(s)
- K J Hoff
  - Institut für Mathematik und Informatik, Universität Greifswald, Walther-Rathenau-Str. 47, 17487 Greifswald, Germany
- M Stanke
  - Institut für Mathematik und Informatik, Universität Greifswald, Walther-Rathenau-Str. 47, 17487 Greifswald, Germany
|
18
|
Sperschneider J, Williams AH, Hane JK, Singh KB, Taylor JM. Evaluation of Secretion Prediction Highlights Differing Approaches Needed for Oomycete and Fungal Effectors. Frontiers in Plant Science 2015; 6:1168. [PMID: 26779196 PMCID: PMC4688413 DOI: 10.3389/fpls.2015.01168] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Received: 06/05/2015] [Accepted: 12/07/2015] [Indexed: 05/03/2023]
Abstract
The steadily increasing number of sequenced fungal and oomycete genomes has enabled detailed studies of how these eukaryotic microbes infect plants and cause devastating losses in food crops. During infection, fungal and oomycete pathogens secrete effector molecules which manipulate host plant cell processes to the pathogen's advantage. Proteinaceous effectors are synthesized intracellularly and must be externalized to interact with host cells. Computational prediction of secreted proteins from genomic sequences is an important technique to narrow down the candidate effector repertoire for subsequent experimental validation. In this study, we benchmark secretion prediction tools on experimentally validated fungal and oomycete effectors. We observe that for a set of fungal SwissProt protein sequences, SignalP 4 and the neural network predictors of SignalP 3 (D-score) and SignalP 2 perform best. For effector prediction in particular, the use of a sensitive method can be desirable to obtain the most complete candidate effector set. We show that the neural network predictors of SignalP 2 and 3, as well as TargetP were the most sensitive tools for fungal effector secretion prediction, whereas the hidden Markov model predictors of SignalP 2 and 3 were the most sensitive tools for oomycete effectors. Thus, previous versions of SignalP retain value for oomycete effector prediction, as the current version, SignalP 4, was unable to reliably predict the signal peptide of the oomycete Crinkler effectors in the test set. Our assessment of subcellular localization predictors shows that cytoplasmic effectors are often predicted as not extracellular. This limits the reliability of secretion predictions that depend on these tools. We present our assessment with a view to informing future pathogenomics studies and suggest revised pipelines for secretion prediction to obtain optimal effector predictions in fungi and oomycetes.
Affiliation(s)
- Jana Sperschneider
  - CSIRO Agriculture Flagship, Centre for Environment and Life Sciences, Perth, WA, Australia
  - *Correspondence: Jana Sperschneider
- Angela H. Williams
  - CSIRO Agriculture Flagship, Centre for Environment and Life Sciences, Perth, WA, Australia
  - The Institute of Agriculture, The University of Western Australia, Crawley, WA, Australia
- James K. Hane
  - Department of Environment and Agriculture, CCDM Bioinformatics, Centre for Crop and Disease Management, Curtin University, Perth, WA, Australia
  - Curtin Institute for Computation, Curtin University, Perth, WA, Australia
- Karam B. Singh
  - CSIRO Agriculture Flagship, Centre for Environment and Life Sciences, Perth, WA, Australia
  - The Institute of Agriculture, The University of Western Australia, Crawley, WA, Australia
|
19
|
Affiliation(s)
- Adrian Tsang
  - Centre for Structural and Functional Genomics, Concordia University, 7141 Sherbrooke Street West, Montreal, Quebec H4B1R6, Canada
|