1
Zagorščak M, Zrimec J, Bleker C, Nolte N, Juteršek M, Ramšak Ž, Gruden K, Petek M. Evidence-based unification of potato gene models with the UniTato collaborative genome browser. Frontiers in Plant Science 2024; 15:1352253. [PMID: 38919818 PMCID: PMC11196761 DOI: 10.3389/fpls.2024.1352253] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2023] [Accepted: 05/20/2024] [Indexed: 06/27/2024]
Abstract
Potato (Solanum tuberosum) is the most popular tuber crop and a model organism. A variety of gene models for potato exist, and despite frequent updates, they are not unified. This hinders the comparison of gene models across versions, limits the ability to reuse experimental data without significant re-analysis, and leads to missing or wrongly annotated genes. Here, we unify the recent potato double monoploid v4 and v6 gene models by developing an automated merging protocol, resulting in a Unified poTato genome model (UniTato). We subsequently established an Apollo genome browser (unitato.nib.si) that enables public access to UniTato and further community-based curation. We demonstrate how the UniTato resource can help resolve problems with missing or misplaced genes and can be used to update or consolidate a wider set of gene models or genome information. The automated protocol, genome annotation files, and a comprehensive translation table are provided at github.com/NIB-SI/unitato.
Affiliation(s)
- Marko Petek
  - Department of Biotechnology and Systems Biology, National Institute of Biology, Ljubljana, Slovenia
2
Mota APZ, Dossa K, Lechaudel M, Cornet D, Mournet P, Santoni S, Lopez D, Chaïr H. Whole-genome sequencing and comparative genomics reveal candidate genes associated with quality traits in Dioscorea alata. BMC Genomics 2024; 25:248. [PMID: 38443859 PMCID: PMC10916269 DOI: 10.1186/s12864-024-10135-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2023] [Accepted: 02/16/2024] [Indexed: 03/07/2024]
Abstract
BACKGROUND Quality traits are essential determinants of consumer preferences. Dioscorea alata (greater yam) is a starchy tuber crop grown in tropical regions. However, a comprehensive understanding of the genetic basis underlying yam tuber quality remains elusive. To address this knowledge gap, we employed population genomics and candidate gene association approaches to unravel the genetic factors influencing the quality attributes of boiled yam.
METHODS AND RESULTS Comparative genomics analysis of 45 plant species revealed numerous novel genes absent from the existing D. alata gene annotation. This approach, adding 48% more genes, significantly enhanced the functional annotation of three crucial metabolic pathways associated with boiled yam quality traits: pentose and glucuronate interconversions, starch and sucrose metabolism, and flavonoid biosynthesis. In addition, whole-genome sequencing of 127 genotypes identified 27 genes under selection, and a candidate gene association analysis identified 22 genes linked to texture, starch content, and color. Notably, five genes involved in starch content and cell wall composition, including 1,3-beta glucan synthase, β-amylase, and pectin methyl esterase, were common to both approaches, and their expression levels were assessed using transcriptomic data.
CONCLUSIONS Whole-genome analysis of 127 D. alata genotypes and the study of three specific pathways allowed the identification of important genes for tuber quality. Our findings provide insights into the genetic basis of yam quality traits and will support the improvement of yam tuber quality through breeding programs.
Affiliation(s)
- Ana Paula Zotta Mota
  - UMR AGAP, CIRAD, 34398, Montpellier, France
  - AGAP, Univ Montpellier, CIRAD, INRAe, Montpellier SupAgro, Montpellier, France
  - Université Côte d'Azur, Institut Sophia Agrobiotech, INRAE, CNRS, Sophia Antipolis, PACA, 06903, France
- Komivi Dossa
  - UMR AGAP, CIRAD, 34398, Montpellier, France
  - CIRAD, UMR AGAP Institut, 97170, Petit Bourg, Guadeloupe, France
- Mathieu Lechaudel
  - UMR Qualisud, CIRAD, F97130, Capesterre-Belle-Eau, Guadeloupe, France
  - QualiSud, Université Montpellier, Institut Agro, CIRAD, Avignon Université, Université de La Réunion, 34398, Montpellier, France
- Denis Cornet
  - UMR AGAP, CIRAD, 34398, Montpellier, France
  - AGAP, Univ Montpellier, CIRAD, INRAe, Montpellier SupAgro, Montpellier, France
- Pierre Mournet
  - UMR AGAP, CIRAD, 34398, Montpellier, France
  - AGAP, Univ Montpellier, CIRAD, INRAe, Montpellier SupAgro, Montpellier, France
- Sylvain Santoni
  - AGAP, Univ Montpellier, CIRAD, INRAe, Montpellier SupAgro, Montpellier, France
- David Lopez
  - UMR AGAP, CIRAD, 34398, Montpellier, France
  - AGAP, Univ Montpellier, CIRAD, INRAe, Montpellier SupAgro, Montpellier, France
- Hana Chaïr
  - UMR AGAP, CIRAD, 34398, Montpellier, France
  - AGAP, Univ Montpellier, CIRAD, INRAe, Montpellier SupAgro, Montpellier, France
3
Thoben C, Pucker B. Automatic annotation of the bHLH gene family in plants. BMC Genomics 2023; 24:780. [PMID: 38102570 PMCID: PMC10722790 DOI: 10.1186/s12864-023-09877-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2023] [Accepted: 12/06/2023] [Indexed: 12/17/2023]
Abstract
BACKGROUND The bHLH transcription factor family is named after the basic helix-loop-helix (bHLH) domain that is a characteristic element of its members. Understanding the function and characteristics of this family is important for the examination of a wide range of functions. As the availability of genome sequences and transcriptome assemblies has increased significantly, the need for automated solutions that provide reliable functional annotations has grown.
RESULTS A phylogenetic approach was adapted for the automatic identification and functional annotation of the bHLH transcription factor family. The bHLH_annotator, designed for the automated functional annotation of bHLHs, was implemented in Python3. Sequences of bHLHs described in the literature were collected to represent the full diversity of bHLH sequences. Previously described orthologs form the basis for assigning functional annotations to candidates, which are also screened for bHLH-specific motifs. The pipeline was successfully deployed on the two Arabidopsis thaliana accessions Col-0 and Nd-1, the monocot species Dioscorea dumetorum, and a transcriptome assembly of Croton tiglium. Depending on the search parameters applied to the initial candidates in the pipeline, species-specific candidates or members of the bHLH family that have experienced domain loss can be identified.
CONCLUSIONS The bHLH_annotator allows a detailed and systematic investigation of the bHLH family in land plant species and classifies candidates based on bHLH-specific characteristics, which distinguishes the pipeline from other established functional annotation tools. This provides the basis for the functional annotation of the bHLH family in land plants and the systematic examination of a wide range of functions regulated by this transcription factor family.
Affiliation(s)
- Corinna Thoben
  - Plant Biotechnology and Bioinformatics, Institute of Plant Biology & Braunschweig Integrated Centre of Systems Biology (BRICS), TU Braunschweig, Braunschweig, Germany
- Boas Pucker
  - Plant Biotechnology and Bioinformatics, Institute of Plant Biology & Braunschweig Integrated Centre of Systems Biology (BRICS), TU Braunschweig, Braunschweig, Germany
4
Knoshaug EP, Sun P, Nag A, Nguyen H, Mattoon EM, Zhang N, Liu J, Chen C, Cheng J, Zhang R, St. John P, Umen J. Identification and preliminary characterization of conserved uncharacterized proteins from Chlamydomonas reinhardtii, Arabidopsis thaliana, and Setaria viridis. Plant Direct 2023; 7:e527. [PMID: 38044962 PMCID: PMC10690477 DOI: 10.1002/pld3.527] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Revised: 08/03/2023] [Accepted: 08/11/2023] [Indexed: 12/05/2023]
Abstract
The rapid accumulation of sequenced plant genomes in the past decade has outpaced the still difficult problem of genome-wide protein-coding gene annotation. A substantial fraction of protein-coding genes in all plant genomes are poorly annotated or unannotated and remain functionally uncharacterized. We identified unannotated proteins in three model organisms representing distinct branches of the green lineage (Viridiplantae): Arabidopsis thaliana (eudicot), Setaria viridis (monocot), and Chlamydomonas reinhardtii (Chlorophyte alga). Using similarity searching, we identified a subset of unannotated proteins that were conserved between these species and defined them as Deep Green proteins. Bioinformatic, genomic, and structural predictions were performed to begin classifying Deep Green genes and proteins. Compared to whole proteomes for each species, the Deep Green set was enriched for proteins with predicted chloroplast targeting signals predictive of photosynthetic or plastid functions, a result that was consistent with enrichment for daylight phase diurnal expression patterning. Structural predictions using AlphaFold and comparisons to known structures showed that a significant proportion of Deep Green proteins may possess novel folds. Though only available for three organisms, the Deep Green genes and proteins provide a starting resource of high-value targets for further investigation of potentially new protein structures and functions conserved across the green lineage.
Affiliation(s)
- Eric P. Knoshaug
  - Biosciences Center, National Renewable Energy Laboratory, Golden, Colorado, USA
- Peipei Sun
  - Donald Danforth Plant Science Center, St. Louis, MO, USA
- Ambarish Nag
  - Computational Sciences Center, National Renewable Energy Laboratory, Golden, Colorado, USA
- Huong Nguyen
  - Donald Danforth Plant Science Center, St. Louis, MO, USA
  - Institute of Genomics for Crop Abiotic Stress Tolerance, Department of Plant and Soil Science, Texas Tech University, Lubbock, Texas, USA
- Erin M. Mattoon
  - Donald Danforth Plant Science Center, St. Louis, MO, USA
  - Plant and Microbial Biosciences Program, Division of Biology and Biomedical Sciences, Washington University in Saint Louis, St. Louis, Missouri, USA
- Jian Liu
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Chen Chen
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Jianlin Cheng
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Ru Zhang
  - Donald Danforth Plant Science Center, St. Louis, MO, USA
- Peter St. John
  - Biosciences Center, National Renewable Energy Laboratory, Golden, Colorado, USA
- James Umen
  - Donald Danforth Plant Science Center, St. Louis, MO, USA
5
Großkinsky DK, Faure JD, Gibon Y, Haslam RP, Usadel B, Zanetti F, Jonak C. The potential of integrative phenomics to harness underutilized crops for improving stress resilience. Frontiers in Plant Science 2023; 14:1216337. [PMID: 37409292 PMCID: PMC10318926 DOI: 10.3389/fpls.2023.1216337] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2023] [Accepted: 06/08/2023] [Indexed: 07/07/2023]
Affiliation(s)
- Dominik K. Großkinsky
  - AIT Austrian Institute of Technology, Center for Health and Bioresources, Bioresources Unit, Tulln a. d. Donau, Austria
- Jean-Denis Faure
  - Université Paris-Saclay, INRAE, AgroParisTech, Institut Jean-Pierre Bourgin, Versailles, France
- Yves Gibon
  - INRAE, Univ. Bordeaux, UMR BFP, Villenave d’Ornon, France
  - Bordeaux Metabolome, INRAE, Univ. Bordeaux, Villenave d’Ornon, France
- Björn Usadel
  - IBG-4 Bioinformatics, CEPLAS, Forschungszentrum, Jülich, Germany
  - Biological Data Science, Heinrich Heine University, Universitätsstrasse 1, Düsseldorf, Germany
- Federica Zanetti
  - Department of Agricultural and Food Sciences (DISTAL), Alma Mater Studiorum - Università di Bologna, Bologna, Italy
- Claudia Jonak
  - AIT Austrian Institute of Technology, Center for Health and Bioresources, Bioresources Unit, Tulln a. d. Donau, Austria
6
Transcriptome analysis of mulberry (Morus alba L.) leaves to identify differentially expressed genes associated with post-harvest shelf-life elongation. Sci Rep 2022; 12:18195. [PMID: 36307466 PMCID: PMC9616847 DOI: 10.1038/s41598-022-21828-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2022] [Accepted: 10/04/2022] [Indexed: 12/31/2022]
Abstract
The present study examines the molecular expression patterns responsible for post-harvest shelf-life extension of mulberry leaves. Quantitative profiling showed retention of primary metabolites in NS7 and accumulation of stress markers in CO7. Leaf mRNA profiles were sequenced on the Illumina platform to identify differentially expressed genes (DEGs). A total of 3413 DEGs were identified between the treatments. Annotation against the Arabidopsis database identified 1022 DEG unigenes. A STRING-generated protein-protein interaction network identified 1013 DEG nodes with p < 1.0e-16. The KEGG classifier identified genes and the biological processes in which they participate. MCODE and BiNGO detected sub-networks and ontological enrichment, respectively, at p ≤ 0.05. Genes associated with chloroplast architecture, photosynthesis, detoxification of ROS and RCS, and the innate immune response were significantly up-regulated, which was responsible for the extended shelf-life in NS7. Loss of storage sucrose, enhanced activity of senescence-related hormones, accumulation of xenobiotics, and development of osmotic stress inside the tissue system were the probable reasons for tissue deterioration in CO7. qPCR validation of DEGs was in good agreement with the RNA sequencing results, indicating the reliability of the sequencing platform. These results provide molecular insight into the involvement of genes in shelf-life extension, which might help the sericulture industry overcome existing problems related to landless farmers and larval feeding during the monsoon.
7
Alves S, Braga Â, Parreira D, Alhinho AT, Silva H, Ramos MJN, Costa MMR, Morais‐Cecílio L. Genome-wide identification, phylogeny, and gene duplication of the epigenetic regulators in Fagaceae. Physiologia Plantarum 2022; 174:e13788. [PMID: 36169620 PMCID: PMC9828519 DOI: 10.1111/ppl.13788] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/06/2022] [Revised: 09/16/2022] [Accepted: 09/21/2022] [Indexed: 05/04/2023]
Abstract
Epigenetic regulators are proteins involved in controlling gene expression. Information about the epigenetic regulators within the Fagaceae, a relevant family of trees and shrubs in northern hemisphere ecosystems, is scarce. To characterize these proteins in Fagaceae, we searched for orthologs of DNA methyltransferases (DNMTs) and demethylases (DDMEs) and of histone modifiers involved in acetylation (HATs), deacetylation (HDACs), methylation (HMTs), and demethylation (HDMTs) in the genera Fagus, Quercus, and Castanea. BLAST searches were performed in the available genomes, and freely available RNA-seq data were used to assemble transcriptomes de novo. We identified homologs of seven DNMT, three DDME, six HAT, 11 HDAC, 32 HMT, and 21 HDMT proteins. Protein analysis showed that most of them have the putative characteristic domains found in these protein families, which suggests that their function is conserved. Additionally, to elucidate the evolutionary history of these genes within Fagaceae, paralogs were identified, and phylogenetic analyses were performed with DNA and histone modifiers. We detected duplication events in all species analyzed, with higher frequency in Quercus and Castanea, and discuss the evidence for transposable elements adjacent to paralogs and their involvement in gene duplication. The knowledge gathered from this work is a stepping stone to upcoming studies of epigenetic regulation in this economically important family.
Affiliation(s)
- Sofia Alves
  - LEAF—Linking Landscape, Environment, Agriculture and Food, Instituto Superior de Agronomia, University of Lisbon, Lisboa, Portugal
- Ângelo Braga
  - Instituto Superior de Agronomia, University of Lisbon, Lisboa, Portugal
- Denise Parreira
  - Instituto Superior de Agronomia, University of Lisbon, Lisboa, Portugal
- Ana Teresa Alhinho
  - Centre of Molecular and Environmental Biology (CBMA), University of Minho, Braga, Portugal
- Helena Silva
  - Centre of Molecular and Environmental Biology (CBMA), University of Minho, Braga, Portugal
- Miguel Jesus Nunes Ramos
  - LEAF—Linking Landscape, Environment, Agriculture and Food, Instituto Superior de Agronomia, University of Lisbon, Lisboa, Portugal
  - Present address: GenoMed, Diagnósticos de Medicina Molecular, Lisboa, Portugal
- Leonor Morais‐Cecílio
  - LEAF—Linking Landscape, Environment, Agriculture and Food, Instituto Superior de Agronomia, University of Lisbon, Lisboa, Portugal
8
de Crécy-Lagard V, Amorin de Hegedus R, Arighi C, Babor J, Bateman A, Blaby I, Blaby-Haas C, Bridge AJ, Burley SK, Cleveland S, Colwell LJ, Conesa A, Dallago C, Danchin A, de Waard A, Deutschbauer A, Dias R, Ding Y, Fang G, Friedberg I, Gerlt J, Goldford J, Gorelik M, Gyori BM, Henry C, Hutinet G, Jaroch M, Karp PD, Kondratova L, Lu Z, Marchler-Bauer A, Martin MJ, McWhite C, Moghe GD, Monaghan P, Morgat A, Mungall CJ, Natale DA, Nelson WC, O’Donoghue S, Orengo C, O’Toole KH, Radivojac P, Reed C, Roberts RJ, Rodionov D, Rodionova IA, Rudolf JD, Saleh L, Sheynkman G, Thibaud-Nissen F, Thomas PD, Uetz P, Vallenet D, Carter EW, Weigele PR, Wood V, Wood-Charlson EM, Xu J. A roadmap for the functional annotation of protein families: a community perspective. Database (Oxford) 2022; 2022:6663924. [PMID: 35961013 PMCID: PMC9374478 DOI: 10.1093/database/baac062] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 06/07/2022] [Revised: 06/28/2022] [Accepted: 08/03/2022] [Indexed: 12/23/2022]
Abstract
Over the last 25 years, biology has entered the genomic era and is becoming a science of ‘big data’. Most interpretations of genomic analyses rely on accurate functional annotations of the proteins encoded by more than 500 000 genomes sequenced to date. By different estimates, only half the predicted sequenced proteins carry an accurate functional annotation, and this percentage varies drastically between different organismal lineages. Such a large gap in knowledge hampers all aspects of biological enterprise and, thereby, is standing in the way of genomic biology reaching its full potential. A brainstorming meeting to address this issue funded by the National Science Foundation was held during 3–4 February 2022. Bringing together data scientists, biocurators, computational biologists and experimentalists within the same venue allowed for a comprehensive assessment of the current state of functional annotations of protein families. Further, major issues that were obstructing the field were identified and discussed, which ultimately allowed for the proposal of solutions on how to move forward.
Affiliation(s)
- Valérie de Crécy-Lagard
  - Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
- Cecilia Arighi
  - Department of Computer and Information Sciences, University of Delaware, Newark, DE 19713, USA
- Jill Babor
  - Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
- Alex Bateman
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Ian Blaby
  - US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Crysten Blaby-Haas
  - Biology Department, Brookhaven National Laboratory, Upton, NY 11973, USA
- Alan J Bridge
  - Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva 4 CH-1211, Switzerland
- Stephen K Burley
  - RCSB Protein Data Bank, Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Stacey Cleveland
  - Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
- Lucy J Colwell
  - Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK
- Ana Conesa
  - Spanish National Research Council, Institute for Integrative Systems Biology, Paterna, Valencia 46980, Spain
- Christian Dallago
  - TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology, i12, Boltzmannstr. 3, Garching/Munich 85748, Germany
- Antoine Danchin
  - School of Biomedical Sciences, Li KaShing Faculty of Medicine, The University of Hong Kong, 21 Sassoon Road, Pokfulam, SAR Hong Kong 999077, China
- Anita de Waard
  - Research Collaboration Unit, Elsevier, Jericho, VT 05465, USA
- Adam Deutschbauer
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Raquel Dias
  - Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
- Yousong Ding
  - Department of Medicinal Chemistry, Center for Natural Products, Drug Discovery and Development, University of Florida, Gainesville, FL 32610, USA
- Gang Fang
  - NYU-Shanghai, Shanghai 200120, China
- Iddo Friedberg
  - Department of Veterinary Microbiology and Preventive Medicine, Iowa State University, Ames, IA 50011, USA
- John Gerlt
  - Institute for Genomic Biology and Departments of Biochemistry and Chemistry, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Joshua Goldford
  - Physics of Living Systems, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Mark Gorelik
  - Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
- Benjamin M Gyori
  - Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA 02115, USA
- Christopher Henry
  - Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, USA
- Geoffrey Hutinet
  - Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
- Marshall Jaroch
  - Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
- Peter D Karp
  - Bioinformatics Research Group, SRI International, Menlo Park, CA 94025, USA
- Zhiyong Lu
  - National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), 8600 Rockville Pike, Bethesda, MD 20817, USA
- Aron Marchler-Bauer
  - National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), 8600 Rockville Pike, Bethesda, MD 20817, USA
- Maria-Jesus Martin
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Claire McWhite
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08540, USA
- Gaurav D Moghe
  - Plant Biology Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
- Paul Monaghan
  - Department of Agricultural Education and Communication, University of Florida, Gainesville, FL 32611, USA
- Anne Morgat
  - Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva 4 CH-1211, Switzerland
- Christopher J Mungall
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Darren A Natale
  - Georgetown University Medical Center, Washington, DC 20007, USA
- William C Nelson
  - Biological Sciences Division, Pacific Northwest National Laboratories, Richland, WA 99354, USA
- Seán O’Donoghue
  - School of Biotechnology and Biomolecular Sciences, University of NSW, Sydney, NSW 2052, Australia
- Christine Orengo
  - Department of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Predrag Radivojac
  - Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, USA
- Colbie Reed
  - Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
- Dmitri Rodionov
  - Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA 92037, USA
- Irina A Rodionova
  - Department of Bioengineering, Division of Engineering, University of California at San Diego, La Jolla, CA 92093-0412, USA
- Jeffrey D Rudolf
  - Department of Chemistry, University of Florida, Gainesville, FL 32611, USA
- Lana Saleh
  - New England Biolabs, Ipswich, MA 01938, USA
- Gloria Sheynkman
  - Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA
- Francoise Thibaud-Nissen
  - National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), 8600 Rockville Pike, Bethesda, MD 20817, USA
- Paul D Thomas
  - Department of Population and Public Health Sciences, University of Southern California, Los Angeles, CA 90033, USA
- Peter Uetz
  - Center for Biological Data Science, Virginia Commonwealth University, Richmond, VA 23284, USA
- David Vallenet
  - LABGeM, Génomique Métabolique, CEA, Genoscope, Institut François Jacob, Université d’Évry, Université Paris-Saclay, CNRS, Evry 91057, France
- Erica Watson Carter
  - Department of Plant Pathology, University of Florida Citrus Research and Education Center, 700 Experiment Station Rd., Lake Alfred, FL 33850, USA
- Valerie Wood
  - Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
- Elisha M Wood-Charlson
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Jin Xu
  - Department of Plant Pathology, University of Florida Citrus Research and Education Center, 700 Experiment Station Rd., Lake Alfred, FL 33850, USA
9
Schenck CA, Busta L. Using interdisciplinary, phylogeny-guided approaches to understand the evolution of plant metabolism. Plant Molecular Biology 2022; 109:355-367. [PMID: 34816350 DOI: 10.1007/s11103-021-01220-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/09/2021] [Accepted: 11/05/2021] [Indexed: 06/13/2023]
Abstract
To cope with relentless environmental pressures, plants produce an arsenal of structurally diverse chemicals, often called specialized metabolites. These lineage-specific compounds are derived from the simple building blocks made by ubiquitous core metabolic pathways. Although the structures of many specialized metabolites are known, the underlying metabolic pathways and the evolutionary events that have shaped the plant chemical diversity landscape are only beginning to be understood. However, with the advent of multi-omics data sets and the relative ease of studying pathways in previously intractable non-model species, plant specialized metabolic pathways are now being systematically identified. These large datasets also provide a foundation for comparative, phylogeny-guided studies of plant metabolism. Comparisons of metabolic traits and features like chemical abundances, enzyme activities, or gene sequences from phylogenetically diverse plants provide insights into how metabolic pathways evolved. This review highlights the power of studying evolution through the lens of comparative biochemistry, particularly how placing metabolism into a phylogenetic context can help a researcher identify the metabolic innovations enabling the evolution of structurally diverse plant metabolites.
Affiliation(s)
- Craig A Schenck
  - Department of Biochemistry, University of Missouri, Columbia, MO, USA
- Lucas Busta
  - Department of Chemistry and Biochemistry, University of Minnesota Duluth, Duluth, MN, USA
10
Fritsche S, Rippel Salgado L, Boron AK, Hanning KR, Donaldson LA, Thorlby G. Transcriptional Regulation of Pine Male and Female Cone Initiation and Development: Key Players Identified Through Comparative Transcriptomics. Front Genet 2022; 13:815093. [PMID: 35368695 PMCID: PMC8971679 DOI: 10.3389/fgene.2022.815093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2021] [Accepted: 02/24/2022] [Indexed: 11/24/2022]
Abstract
With long reproductive timescales, large complex genomes, and a lack of reliable reference genomes, understanding gene function in conifers is extremely challenging. Consequently, our understanding of which genetic factors influence the development of reproductive structures (cones) in monoecious conifers remains limited. Genes with inferred roles in conifer reproduction have mostly been identified through homology and phylogenetic reconstruction with their angiosperm counterparts. We used RNA-sequencing to generate transcriptomes of the early morphological stages of cone development in the conifer species Pinus densiflora and used these to gain a deeper insight into the transcriptional changes during male and female cone development. Paired-end Illumina sequencing was used to generate transcriptomes from non-reproductive tissue and male and female cones at four time points with a total of 382.82 Gbp of data generated. After assembly and stringent filtering, a total of 37,164 transcripts were retrieved, of which a third were functionally annotated using the Mercator plant pipeline. Differentially expressed gene (DEG) analysis resulted in the identification of 172,092 DEGs in the nine tissue types. This, alongside GO gene enrichment analyses, pinpointed transcripts putatively involved in conifer reproductive structure development, including co-orthologs of several angiosperm flowering genes and several that have not been previously reported in conifers. This study provides a comprehensive transcriptome resource for male and early female cone development in the gymnosperm species Pinus densiflora. Characterisation of this resource has allowed the identification of potential key players and thus provides valuable insights into the molecular regulation of reproductive structure development in monoecious conifers.
Affiliation(s)
- Steffi Fritsche
  - Forest Genetics and Biotechnology, Scion, Rotorua, New Zealand
- Leonardo Rippel Salgado
  - Forest Genetics and Biotechnology, Scion, Rotorua, New Zealand
  - Molecular and Digital Breeding, The New Zealand Institute for Plant and Food Research, Te Puke, New Zealand
- Glenn Thorlby
  - Forest Genetics and Biotechnology, Scion, Rotorua, New Zealand
  - *Correspondence: Glenn Thorlby,
11
|
Epigenome guided crop improvement: current progress and future opportunities. Emerg Top Life Sci 2022; 6:141-151. [PMID: 35072210 PMCID: PMC9023013 DOI: 10.1042/etls20210258] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/28/2021] [Revised: 12/14/2021] [Accepted: 01/04/2022] [Indexed: 12/22/2022]
Abstract
Epigenomics encompasses a broad field of study, including the investigation of chromatin states, chromatin modifications and their impact on gene regulation; as well as the phenomena of epigenetic inheritance. The epigenome is a multi-modal layer of information superimposed on DNA sequences, instructing their usage in gene expression. As such, it is an emerging focus of efforts to improve crop performance. Broadly, this might be divided into avenues that leverage chromatin information to better annotate and decode plant genomes, and into complementary strategies that aim to identify and select for heritable epialleles that control crop traits independent of underlying genotype. In this review, we focus on the first approach, which we term ‘epigenome guided’ improvement. This encompasses the use of chromatin profiles to enhance our understanding of the composition and structure of complex crop genomes. We discuss the current progress and future prospects towards integrating this epigenomic information into crop improvement strategies; in particular for CRISPR/Cas9 gene editing and precision genome engineering. We also highlight some specific opportunities and challenges for grain and horticultural crops.
12
Hurgobin B. Annotation of Protein-Coding Genes in Plant Genomes. Methods Mol Biol 2022; 2443:309-326. [PMID: 35037214 DOI: 10.1007/978-1-0716-2067-0_17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Advances in next-generation sequencing technologies and lower sequencing costs are paving the way to more plant genome sequencing, assembly, and annotation projects. While genome assembly is the first step toward elucidating the genome structure of a species, it is the annotation of the protein-coding genes that provides meaningful information to biologists. However, genome annotation is not a trivial task. Therefore, the aim of this chapter is to provide a detailed view of this important process, including tools and commands that can be used to carry it out.
Affiliation(s)
- Bhavna Hurgobin
- La Trobe Institute for Agriculture and Food, Department of Animal, Plant and Soil Sciences, School of Life Sciences, AgriBio Building, La Trobe University, Bundoora, VIC, Australia
- Australian Research Council Research Hub for Medicinal Agriculture, AgriBio Building, La Trobe University, Bundoora, VIC, Australia
13
van den Bent I, Makrodimitris S, Reinders M. The Power of Universal Contextualized Protein Embeddings in Cross-species Protein Function Prediction. Evol Bioinform Online 2021; 17:11769343211062608. [PMID: 34880594 PMCID: PMC8647222 DOI: 10.1177/11769343211062608] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2021] [Accepted: 11/03/2021] [Indexed: 11/16/2022] Open
Abstract
Computationally annotating proteins with a molecular function is a difficult problem that is made even harder due to the limited amount of available labeled protein training data. Unsupervised protein embeddings partly circumvent this limitation by learning a universal protein representation from many unlabeled sequences. Such embeddings incorporate contextual information of amino acids, thereby modeling the underlying principles of protein sequences insensitive to the context of species. We used an existing pre-trained protein embedding method and subjected its molecular function prediction performance to detailed characterization, first to advance the understanding of protein language models, and second to determine areas of improvement. Then, we applied the model in a transfer learning task by training a function predictor based on the embeddings of annotated protein sequences of one training species and making predictions on the proteins of several test species with varying evolutionary distance. We show that this approach successfully generalizes knowledge about protein function from one eukaryotic species to various other species, outperforming both an alignment-based and a supervised-learning-based baseline. This implies that such a method could be effective for molecular function prediction in inadequately annotated species from understudied taxonomic kingdoms.
Affiliation(s)
- Irene van den Bent
- Delft Bioinformatics Lab, Delft University of Technology, Delft, the Netherlands
- Stavros Makrodimitris
- Delft Bioinformatics Lab, Delft University of Technology, Delft, the Netherlands
- Keygene N.V., Wageningen, the Netherlands
- Marcel Reinders
- Delft Bioinformatics Lab, Delft University of Technology, Delft, the Netherlands
14
Transcriptome repository of North-Western Himalayan endangered medicinal herbs: a paramount approach illuminating molecular perspective of phytoactive molecules and secondary metabolism. Mol Genet Genomics 2021; 296:1177-1202. [PMID: 34557965 DOI: 10.1007/s00438-021-01821-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/31/2021] [Accepted: 09/12/2021] [Indexed: 01/23/2023]
Abstract
Medicinal plants of the North-Western Himalayan region are known for their unprecedented biodiversity and valuable secondary metabolites that are unique to this dynamic geo-climatic region. Since ancient times, these medicinal herbs have been used traditionally for their therapeutic potential. Over the last two decades, however, increasing pharmaceutical demand and illegal, unorganized trade in these medicinal plants have accelerated their over-exploitation in a non-scientific manner. In addition, climate change and anthropogenic activities have affected their natural habitat, driving most of these endemic plant species to critically endangered status and raising the peril of mass extinction from this eco-region. Hence, there is an urgent need to develop alternative sustainable approaches and policies for utilizing this natural bioresource while ensuring its conservation. In this context, the advent of sequencing-based transcriptomic studies has contributed significantly to a better understanding of important metabolic pathways and the related genes/enzymes of high-value medicinal herbs in the absence of genomic information. The use of comparative transcriptomics in conjunction with biochemical techniques in North-Western Himalayan medicinal plants has resulted in significant advances in the identification of the molecular players involved in secondary metabolic pathways over the last decade. This information could be used to further engineer metabolic pathways and inform breeding programs, ultimately leading to the development of in vitro systems dedicated to the production of pharmaceutically important secondary metabolites at the industrial level. Collectively, successful adoption of these approaches can ensure the sustainable utilization of Himalayan bioresources by reducing the pressure on the wild populations of these critically endangered medicinal herbs.
This review provides novel insight as a transcriptome-based bioresource repository for understanding the genes/enzymes of important secondary metabolic pathways and the metabolism of endangered high-value North-Western Himalayan medicinal herbs, so that researchers across the globe can use this information to devise effective strategies for producing pharmaceutically important compounds, scale up their production for sustainable usage, and take a step forward in omics-based conservation genetics.
15
Wang Y, Li X, Wang C, Gao L, Wu Y, Ni X, Sun J, Jiang J. Unveiling the transcriptomic complexity of Miscanthus sinensis using a combination of PacBio long read- and Illumina short read sequencing platforms. BMC Genomics 2021; 22:690. [PMID: 34551715 PMCID: PMC8459517 DOI: 10.1186/s12864-021-07971-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/15/2020] [Accepted: 09/03/2021] [Indexed: 11/10/2022] Open
Abstract
Background: Miscanthus sinensis Andersson is a perennial grass that exhibits remarkable lignocellulose characteristics suitable for sustainable bioenergy production. However, knowledge of the genetic resources of this species is relatively limited, which considerably hampers further work on its biology and genetic improvement. Results: In this study, by analyzing the transcriptome of mixed samples of leaves and stems using the latest PacBio Iso-Seq sequencing technology combined with Illumina HiSeq, we report the first full-length transcriptome dataset of M. sinensis, with a total of 58.21 Gb of clean data. An average of 15.75 Gb of clean reads per sample was obtained from the PacBio Iso-Seq system, more than double the data size (6.68 Gb) obtained from the Illumina HiSeq platform. The integrated analyses of PacBio- and Illumina-based transcriptomic data uncovered 408,801 non-redundant transcripts with an average length of 1,685 bp. Of those, 189,406 transcripts were identified by both methods, 169,149 transcripts with an average length of 619 bp were uniquely identified by Illumina HiSeq, and 51,246 transcripts with an average length of 2,535 bp were uniquely identified by PacBio Iso-Seq. Approximately 96% of the final combined transcripts were mapped back to the Miscanthus genome, reflecting the high quality and coverage of our sequencing results. When comparing our data with the genomes of four species of Andropogoneae, M. sinensis showed the closest relationship with sugarcane, with mapping ratios of up to 93%, followed by sorghum, with mapping ratios of up to 80%, indicating a high conservation of orthologs in these three genomes. Furthermore, 306,228 transcripts were successfully annotated against public databases, including cell wall-related genes and transcription factor families, thus providing many new insights into gene functions.
The PacBio Iso-Seq data also helped identify 3,898 alternative splicing events and 2,963 annotated AS isoforms within 10 function categories. Conclusions: Taken together, the present study provides a rich data set of full-length transcripts that greatly enriches our understanding of M. sinensis transcriptomic resources, thus facilitating further genetic improvement and molecular studies of the Miscanthus species. Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-021-07971-x.
Affiliation(s)
- Yongli Wang
- Biofuels Institute, School of the Environment and Safety Engineering, Jiangsu University, 212013, Zhenjiang, Jiangsu, China
- Xia Li
- Biofuels Institute, School of the Environment and Safety Engineering, Jiangsu University, 212013, Zhenjiang, Jiangsu, China
- Congsheng Wang
- Biofuels Institute, School of the Environment and Safety Engineering, Jiangsu University, 212013, Zhenjiang, Jiangsu, China
- Lu Gao
- Biofuels Institute, School of the Environment and Safety Engineering, Jiangsu University, 212013, Zhenjiang, Jiangsu, China
- Yanfang Wu
- Biofuels Institute, School of the Environment and Safety Engineering, Jiangsu University, 212013, Zhenjiang, Jiangsu, China
- Xingnan Ni
- Biofuels Institute, School of the Environment and Safety Engineering, Jiangsu University, 212013, Zhenjiang, Jiangsu, China
- Jianzhong Sun
- Biofuels Institute, School of the Environment and Safety Engineering, Jiangsu University, 212013, Zhenjiang, Jiangsu, China
- Jianxiong Jiang
- Biofuels Institute, School of the Environment and Safety Engineering, Jiangsu University, 212013, Zhenjiang, Jiangsu, China
16
Interrogating Plant-Microbe Interactions with Chemical Tools: Click Chemistry Reagents for Metabolic Labeling and Activity-Based Probes. Molecules 2021; 26:molecules26010243. [PMID: 33466477 PMCID: PMC7796436 DOI: 10.3390/molecules26010243] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/30/2020] [Revised: 12/27/2020] [Accepted: 01/01/2021] [Indexed: 01/22/2023] Open
Abstract
Continued expansion of the chemical biology toolbox presents many new and diverse opportunities to interrogate the fundamental molecular mechanisms driving complex plant-microbe interactions. This review will examine metabolic labeling with click chemistry reagents and activity-based probes for investigating the impacts of plant-associated microbes on plant growth, metabolism, and immune responses. While the majority of the studies reviewed here used chemical biology approaches to examine the effects of pathogens on plants, chemical biology will also be invaluable in future efforts to investigate mutualistic associations between beneficial microbes and their plant hosts.
17
Zhang Y, Restall J, Crisp P, Godwin I, Liu G. Current status and prospects of plant genome editing in Australia. IN VITRO CELLULAR & DEVELOPMENTAL BIOLOGY. PLANT : JOURNAL OF THE TISSUE CULTURE ASSOCIATION 2021; 57:574-583. [PMID: 34054265 PMCID: PMC8143062 DOI: 10.1007/s11627-021-10188-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/04/2021] [Accepted: 04/13/2021] [Indexed: 05/22/2023]
Abstract
Plant genome editing, particularly CRISPR-Cas biotechnologies, has rapidly evolved and drawn enormous attention all around the world in the last decade. The cutting-edge technologies have had substantial impact on precise genome editing for manipulating gene expression, stacking gene mutations, and improving crop agronomic traits. Following the global trends, investigations on CRISPR-Cas have been thriving in Australia, especially in agriculture sciences. Importantly, CRISPR-edited plants, classified as SDN-1 organisms (SDN: site-directed nuclease), have been given a green light in Australia, with regulatory bodies indicating they will not be classified as a genetically modified organism (GMO) if no foreign DNA is present in an edited plant. As a result, genome-edited products would not attract the onerous regulation required for the introduction of a GMO, which could mean more rapid deployment of new varieties and products that could be traded freely in Australia, and potentially to export markets. In the present review, we discuss the current status and prospects of plant genome editing in Australia by highlighting several species of interest. Using these species as case studies, we discuss the priorities and potential of plant genome editing, as well as the remaining challenges.
Affiliation(s)
- Yan Zhang
- Centre for Crop Science, Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Brisbane, QLD 4072 Australia
- School of Agriculture and Food Sciences, The University of Queensland, Brisbane, QLD 4072 Australia
- Jemma Restall
- Centre for Crop Science, Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Brisbane, QLD 4072 Australia
- Peter Crisp
- School of Agriculture and Food Sciences, The University of Queensland, Brisbane, QLD 4072 Australia
- Ian Godwin
- Centre for Crop Science, Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Brisbane, QLD 4072 Australia
- Guoquan Liu
- Centre for Crop Science, Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Brisbane, QLD 4072 Australia
18
Bolger M, Schwacke R, Usadel B. MapMan Visualization of RNA-Seq Data Using Mercator4 Functional Annotations. Methods Mol Biol 2021; 2354:195-212. [PMID: 34448161 DOI: 10.1007/978-1-0716-1609-3_9] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 01/16/2023]
Abstract
Plant omics research has advanced to the stage where it is feasible to generate data from multiple samples and multiple time points to gain insight into biological processes. This impressive array of data can prove challenging to interpret. In this chapter, we describe a solution to this problem, consisting of the MapMan transcript visualization application and the associated MapMan4 ontology and Mercator4 online annotation process.
Affiliation(s)
- Marie Bolger
- Institute of Bio- and Geosciences (IBG-4: Bioinformatics), Forschungszentrum Jülich, Jülich, Germany
- Rainer Schwacke
- Institute of Bio- and Geosciences (IBG-4: Bioinformatics), Forschungszentrum Jülich, Jülich, Germany
- Björn Usadel
- Institute of Bio- and Geosciences (IBG-4: Bioinformatics), Forschungszentrum Jülich, Jülich, Germany
19
Mora-Márquez F, Chano V, Vázquez-Poletti JL, López de Heredia U. TOA: A software package for automated functional annotation in non-model plant species. Mol Ecol Resour 2020; 21:621-636. [PMID: 33070442 DOI: 10.1111/1755-0998.13285] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 02/04/2020] [Revised: 10/01/2020] [Accepted: 10/13/2020] [Indexed: 01/05/2023]
Abstract
The increase in sequencing capacity provided by high-throughput platforms has made it possible to routinely obtain large sets of genomic and transcriptomic sequences from model and non-model organisms. Subsequent genomic analysis and gene discovery in next-generation sequencing experiments are, however, bottlenecked by functional annotation. One common way to perform functional annotation of sets of sequences obtained from next-generation sequencing experiments is to search for homologous sequences and access the related functional information deposited in genomic databases. Functional annotation is especially challenging for non-model organisms, such as many plant species. In such cases, existing free and commercial general-purpose applications may not offer complete and accurate results. We present TOA (Taxonomy-oriented annotation), a Python-based, user-friendly, open source application designed to establish functional annotation pipelines geared towards non-model plant species that can run on Linux/Mac computers, HPCs and cloud servers. TOA performs homology searches against proteins stored in the PLAZA databases, NCBI RefSeq Plant, Nucleotide Database and Non-Redundant Protein Sequence Database, and outputs functional information from several ontology systems: Gene Ontology, InterPro, EC, KEGG, Mapman and MetaCyc. The software performance was validated by comparing the runtimes, total number of annotated sequences and accuracy of the functional information obtained for several plant benchmark data sets with TOA and other functional annotation solutions. TOA outperformed the other software in terms of number of annotated sequences and accuracy of the annotation and constitutes a good alternative to improve functional annotation in plants. TOA is especially recommended for gymnosperms or for low quality sequence data sets of non-model plants.
Affiliation(s)
- Fernando Mora-Márquez
- GI Sistemas Naturales e Historia Forestal, Dpto. Sistemas y Recursos Naturales, ETSI Montes, Forestal y del Medio Natural, Universidad Politécnica de Madrid, Madrid, Spain
- Víctor Chano
- GI Sistemas Naturales e Historia Forestal, Dpto. Sistemas y Recursos Naturales, ETSI Montes, Forestal y del Medio Natural, Universidad Politécnica de Madrid, Madrid, Spain
- José Luis Vázquez-Poletti
- GI Arquitectura de Sistemas Distribuidos, Dpto. Arquitectura de Computadores y Automática, Facultad de Informática, Universidad Complutense de Madrid, Madrid, Spain
- Unai López de Heredia
- GI Sistemas Naturales e Historia Forestal, Dpto. Sistemas y Recursos Naturales, ETSI Montes, Forestal y del Medio Natural, Universidad Politécnica de Madrid, Madrid, Spain
20
Niu J, Ma M, Yin X, Liu X, Dong T, Sun W, Yang F. Transcriptional and physiological analyses of reduced density in apple provide insight into the regulation involved in photosynthesis. PLoS One 2020; 15:e0239737. [PMID: 33044972 PMCID: PMC7549834 DOI: 10.1371/journal.pone.0239737] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 03/17/2020] [Accepted: 09/13/2020] [Indexed: 11/18/2022] Open
Abstract
Planting density has a great influence on the physiological processes and growth of orchard plants. Exploring the molecular basis of orchard density management, and revealing key candidate genes for it, is of great significance for improving production capacity. In this study, transcriptome sequencing of apple trees was carried out at three different sampling heights to determine gene expression patterns under high density (HD) and low density (LD), and physiological indices were measured to determine the effect of density change on the plants. The physiological indices showed that the contents of chlorophyll, ACC, RUBP and PEP in LD were clearly higher than those in the control group (high density, HD), while the contents of PPO and AO in LD were noticeably lower than those in HD. A total of 3808 differentially expressed genes (DEGs) were detected between HD and LD, of which 1935, 2390 and 1108 DEGs were found in the three comparisons (middle-upper, lower-outer and lower-inner), respectively. All three comparisons shared 274 common differentially expressed genes (co-DEGs). Functional enrichment and KEGG pathway analysis found that these genes were involved in Carbon fixation in photosynthetic organisms, Circadian rhythm, Photosynthesis - antenna proteins, Photosynthesis, chlorophyll metabolism, Porphyrin, sugar metabolism and so on. Among these genes, the LHCB family participates in photosynthesis as part of photosystem II. In addition, SPA1, rbcL, SNRK2, MYC2, BSK, SAUR and PP2C are involved in Circadian rhythm, and the expression of genes related to glycometabolism and hormone signaling pathways also changed. The results reveal that decreasing plant density changed the photosynthetic efficiency of leaves and the expression of photosynthesis-related genes, which provides a theoretical basis for regulating apple production in practice.
Affiliation(s)
- Junqiang Niu
- Institute of Fruit and Floriculture Research, Gansu Academy of Agricultural Sciences, Lanzhou, Gansu Province, People’s Republic of China
- * E-mail:
- Ming Ma
- Institute of Fruit and Floriculture Research, Gansu Academy of Agricultural Sciences, Lanzhou, Gansu Province, People’s Republic of China
- Xiaoning Yin
- Institute of Fruit and Floriculture Research, Gansu Academy of Agricultural Sciences, Lanzhou, Gansu Province, People’s Republic of China
- Xinglu Liu
- Institute of Fruit and Floriculture Research, Gansu Academy of Agricultural Sciences, Lanzhou, Gansu Province, People’s Republic of China
- Tie Dong
- Institute of Fruit and Floriculture Research, Gansu Academy of Agricultural Sciences, Lanzhou, Gansu Province, People’s Republic of China
- Wentai Sun
- Institute of Fruit and Floriculture Research, Gansu Academy of Agricultural Sciences, Lanzhou, Gansu Province, People’s Republic of China
- Fuxia Yang
- Institute of Fruit and Floriculture Research, Gansu Academy of Agricultural Sciences, Lanzhou, Gansu Province, People’s Republic of China
21
Stander EA, Williams W, Mgwatyu Y, van Heusden P, Rautenbach F, Marnewick J, Le Roes-Hill M, Hesse U. Transcriptomics of the Rooibos (Aspalathus linearis) Species Complex. BIOTECH 2020; 9:biotech9040019. [PMID: 35822822 PMCID: PMC9258316 DOI: 10.3390/biotech9040019] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 06/24/2020] [Revised: 07/28/2020] [Accepted: 08/04/2020] [Indexed: 12/18/2022] Open
Abstract
Rooibos (Aspalathus linearis), widely known as a herbal tea, is endemic to the Cape Floristic Region of South Africa (SA). It produces a wide range of phenolic compounds that have been associated with diverse health promoting properties of the plant. The species comprises several growth forms that differ in their morphology and biochemical composition, only one of which is cultivated and used commercially. Here, we established methodologies for non-invasive transcriptome research of wild-growing South African plant species, including (1) harvesting and transport of plant material suitable for RNA sequencing; (2) inexpensive, high-throughput biochemical sample screening; (3) extraction of high-quality RNA from recalcitrant, polysaccharide- and polyphenol rich plant material; and (4) biocomputational analysis of Illumina sequencing data, together with the evaluation of programs for transcriptome assembly (Trinity, IDBA-Trans, SOAPdenovo-Trans, CLC), protein prediction, as well as functional and taxonomic transcript annotation. In the process, we established a biochemically characterized sample pool from 44 distinct rooibos ecotypes (1–5 harvests) and generated four in-depth annotated transcriptomes (each comprising on average ≈86,000 transcripts) from rooibos plants that represent distinct growth forms and differ in their biochemical profiles. These resources will serve future rooibos research and plant breeding endeavours.
Affiliation(s)
- Emily Amor Stander
- South African Medical Research Council Bioinformatics Unit, South African National Bioinformatics Institute, University of the Western Cape, Bellville 7535, South Africa; (E.A.S.); (W.W.); (Y.M.); (P.v.H.)
- Wesley Williams
- South African Medical Research Council Bioinformatics Unit, South African National Bioinformatics Institute, University of the Western Cape, Bellville 7535, South Africa; (E.A.S.); (W.W.); (Y.M.); (P.v.H.)
- Institute for Microbial Biotechnology and Metagenomics, University of the Western Cape, Bellville 7535, South Africa
- Yamkela Mgwatyu
- South African Medical Research Council Bioinformatics Unit, South African National Bioinformatics Institute, University of the Western Cape, Bellville 7535, South Africa; (E.A.S.); (W.W.); (Y.M.); (P.v.H.)
- Peter van Heusden
- South African Medical Research Council Bioinformatics Unit, South African National Bioinformatics Institute, University of the Western Cape, Bellville 7535, South Africa; (E.A.S.); (W.W.); (Y.M.); (P.v.H.)
- Fanie Rautenbach
- Applied Microbial and Health Biotechnology Institute, Cape Peninsula University of Technology, Bellville 7535, South Africa; (F.R.); (J.M.); (M.L.R.-H.)
- Jeanine Marnewick
- Applied Microbial and Health Biotechnology Institute, Cape Peninsula University of Technology, Bellville 7535, South Africa; (F.R.); (J.M.); (M.L.R.-H.)
- Marilize Le Roes-Hill
- Applied Microbial and Health Biotechnology Institute, Cape Peninsula University of Technology, Bellville 7535, South Africa; (F.R.); (J.M.); (M.L.R.-H.)
- Uljana Hesse
- South African Medical Research Council Bioinformatics Unit, South African National Bioinformatics Institute, University of the Western Cape, Bellville 7535, South Africa; (E.A.S.); (W.W.); (Y.M.); (P.v.H.)
- Institute for Microbial Biotechnology and Metagenomics, University of the Western Cape, Bellville 7535, South Africa
- Department of Biotechnology, University of the Western Cape, Bellville 7535, South Africa
- Correspondence:
22
Wei Q, Wang J, Wang W, Hu T, Hu H, Bao C. A high-quality chromosome-level genome assembly reveals genetics for important traits in eggplant. HORTICULTURE RESEARCH 2020; 7:153. [PMID: 33024567 PMCID: PMC7506008 DOI: 10.1038/s41438-020-00391-0] [Citation(s) in RCA: 52] [Impact Index Per Article: 13.0] [Received: 06/10/2020] [Revised: 08/19/2020] [Accepted: 08/23/2020] [Indexed: 05/04/2023]
Abstract
Eggplant (Solanum melongena L.) is an economically important vegetable crop in the Solanaceae family, with extensive diversity among landraces and close relatives. Here, we report a high-quality reference genome for the eggplant inbred line HQ-1315 (S. melongena-HQ) using a combination of Illumina, Nanopore and 10X Genomics sequencing technologies and Hi-C technology for genome assembly. The assembled genome has a total size of ~1.17 Gb and 12 chromosomes, with a contig N50 of 5.26 Mb, and contains 36,582 protein-coding genes. Repetitive sequences comprise 70.09% (811.14 Mb) of the eggplant genome, most of which are long terminal repeat (LTR) retrotransposons (65.80%), followed by long interspersed nuclear elements (LINEs, 1.54%) and DNA transposons (0.85%). The S. melongena-HQ eggplant genome carries a total of 563 accession-specific gene families containing 1009 genes. In total, 73 expanded gene families (892 genes) and 34 contracted gene families (114 genes) were functionally annotated. Comparative analysis of different eggplant genomes identified three types of variations, including single-nucleotide polymorphisms (SNPs), insertions/deletions (indels) and structural variants (SVs). Asymmetric SV accumulation was found in potential regulatory regions of protein-coding genes among the different eggplant genomes. Furthermore, we performed QTL-seq for eggplant fruit length using the S. melongena-HQ reference genome and detected a QTL interval of 71.29-78.26 Mb on chromosome E03. The gene Smechr0301963, which belongs to the SUN gene family, is predicted to be a key candidate gene for eggplant fruit length regulation. Moreover, we anchored a total of 210 linkage markers associated with 71 traits to the eggplant chromosomes and finally obtained 26 QTL hotspots. The eggplant HQ-1315 genome assembly can be accessed at http://eggplant-hq.cn.
In conclusion, the eggplant genome presented herein provides a global view of genomic divergence at the whole-genome level and powerful tools for the identification of candidate genes for important traits in eggplant.
Collapse
Affiliation(s)
- Qingzhen Wei
- Institute of Vegetable Research, Zhejiang Academy of Agricultural Sciences, Hangzhou, 30021 China
| | - Jinglei Wang
- Institute of Vegetable Research, Zhejiang Academy of Agricultural Sciences, Hangzhou, 30021 China
| | - Wuhong Wang
- Institute of Vegetable Research, Zhejiang Academy of Agricultural Sciences, Hangzhou, 30021 China
| | - Tianhua Hu
- Institute of Vegetable Research, Zhejiang Academy of Agricultural Sciences, Hangzhou, 30021 China
| | - Haijiao Hu
- Institute of Vegetable Research, Zhejiang Academy of Agricultural Sciences, Hangzhou, 30021 China
| | - Chonglai Bao
- Institute of Vegetable Research, Zhejiang Academy of Agricultural Sciences, Hangzhou, 30021 China
| |
|
23
|
Oña Chuquimarca S, Ayala-Ruano S, Goossens J, Pauwels L, Goossens A, Leon-Reyes A, Ángel Méndez M. The Molecular Basis of JAZ-MYC Coupling, a Protein-Protein Interface Essential for Plant Response to Stressors. FRONTIERS IN PLANT SCIENCE 2020; 11:1139. [PMID: 32973821 PMCID: PMC7468482 DOI: 10.3389/fpls.2020.01139] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 03/12/2020] [Accepted: 07/14/2020] [Indexed: 05/29/2023]
Abstract
The jasmonic acid (JA) signaling pathway is one of the primary mechanisms that allow plants to respond to a variety of biotic and abiotic stressors. Within this pathway, the JAZ repressor proteins and the basic helix-loop-helix (bHLH) transcription factor MYC3 play a critical role. JA is a volatile organic compound with an essential role in plant immunity. The increase in the concentration of JA leads to the decoupling of the JAZ repressor proteins and the bHLH transcription factor MYC3 causing the induction of genes of interest. The primary goal of this study was to identify the molecular basis of JAZ-MYC coupling. For this purpose, we modeled and validated 12 JAZ-MYC3 3D in silico structures and developed a molecular dynamics/machine learning pipeline to obtain two outcomes. First, we calculated the average free binding energy of JAZ-MYC3 complexes, which was predicted to be -10.94 +/-2.67 kJ/mol. Second, we predicted which ones should be the interface residues that make the predominant contribution to the free energy of binding (molecular hotspots). The predicted protein hotspots matched a conserved linear motif SL••FL•••R, which may have a crucial role during MYC3 recognition of JAZ proteins. As a proof of concept, we tested, both in silico and in vitro, the importance of this motif on PEAPOD (PPD) proteins, which also belong to the TIFY protein family, like the JAZ proteins, but cannot bind to MYC3. By mutating these proteins to match the SL••FL•••R motif, we could force PPDs to bind the MYC3 transcription factor. Taken together, modeling protein-protein interactions and using machine learning will help to find essential motifs and molecular mechanisms in the JA pathway.
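The linear motif SL••FL•••R reported in this abstract can be searched for in protein sequences with a regular expression. The sketch below assumes that each • stands for any single residue, which is an interpretation of the notation rather than something the abstract states. The function name and the toy sequences are hypothetical and are not real JAZ or PPD sequences.

```python
import re

# Assumption: each bullet in SL••FL•••R is a one-residue wildcard.
# The lookahead lets overlapping occurrences be reported as well.
MOTIF = re.compile(r"(?=(SL.{2}FL.{3}R))")


def find_motif(sequence):
    """Return (1-based start position, matched residues) for every motif hit."""
    return [(m.start() + 1, m.group(1)) for m in MOTIF.finditer(sequence.upper())]


# Hypothetical toy sequences, for illustration only.
with_motif = "MAASLQRFLEKRRKD"
without_motif = "MAASLQRFAEKRRKD"

print(find_motif(with_motif))
print(find_motif(without_motif))
```

A scan like this only flags candidate sites by sequence. It says nothing about binding energy, which the study addressed with molecular dynamics and in vitro tests.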
Affiliation(s)
- Samara Oña Chuquimarca
- Grupo de Química Computacional y Teórica, Departamento de Ingeniería Química, Universidad San Francisco de Quito USFQ, Campus Cumbayá, Quito, Ecuador
- Instituto de Simulación Computacional (ISC-USFQ), Universidad San Francisco de Quito USFQ, Quito, Ecuador
| | - Sebastián Ayala-Ruano
- Grupo de Química Computacional y Teórica, Departamento de Ingeniería Química, Universidad San Francisco de Quito USFQ, Campus Cumbayá, Quito, Ecuador
- Instituto de Simulación Computacional (ISC-USFQ), Universidad San Francisco de Quito USFQ, Quito, Ecuador
| | - Jonas Goossens
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- VIB Center for Plant Systems Biology, Ghent, Belgium
| | - Laurens Pauwels
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- VIB Center for Plant Systems Biology, Ghent, Belgium
| | - Alain Goossens
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- VIB Center for Plant Systems Biology, Ghent, Belgium
| | - Antonio Leon-Reyes
- Laboratorio de Biotecnología Agrícola y de Alimentos, Ingeniería en Agronomía, Colegio de Ciencias e Ingenierías, Universidad San Francisco de Quito, Campus Cumbayá, Quito, Ecuador
- Colegio de Ciencias Biológicas y Ambientales COCIBA, Instituto de Microbiología, Universidad San Francisco de Quito USFQ, Campus Cumbayá, Quito, Ecuador
- Colegio de Ciencias Biológicas y Ambientales COCIBA, Instituto de Investigaciones Biológicas y Ambientales BIÓSFERA, Universidad San Francisco de Quito USFQ, Campus Cumbayá, Quito, Ecuador
- Department of Biology, The University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
| | - Miguel Ángel Méndez
- Grupo de Química Computacional y Teórica, Departamento de Ingeniería Química, Universidad San Francisco de Quito USFQ, Campus Cumbayá, Quito, Ecuador
- Instituto de Simulación Computacional (ISC-USFQ), Universidad San Francisco de Quito USFQ, Quito, Ecuador
| |
|
24
|
Wong DCJ. Network aggregation improves gene function prediction of grapevine gene co-expression networks. PLANT MOLECULAR BIOLOGY 2020; 103:425-441. [PMID: 32266646 DOI: 10.1007/s11103-020-01001-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 05/31/2019] [Accepted: 03/21/2020] [Indexed: 05/08/2023]
Abstract
Aggregation across multiple networks highlights robust co-expression interactions and improves the functional connectivity of grapevine gene co-expression networks. In recent years, the rapid accumulation of transcriptome datasets from diverse experimental conditions has enabled the widespread use of gene co-expression network (GCN) analysis in plants. In grapevine, GCN analysis has shown great promise for gene function prediction; however, measurable progress is currently lacking. Using accumulated microarray datasets from the grapevine whole-genome array (33 experiments, 1359 samples), we explored how meta-analysis through aggregation influences the functional connectivity (performance) of derived networks using guilt-by-association neighbor voting. Two annotation schemes, i.e. MapMan BIN and Pfam, at two sparsity thresholds, i.e. top 100 (stringent) and 300 (relaxed) ranked genes, were evaluated. We observed that aggregating across multiple networks improves performance dramatically, with the aggregate outperforming the majority of functional terms across individual networks. Network sparsity and size (i.e. the number of samples and aggregates) were key factors influencing performance, while the choice of annotation scheme had little effect. Systematic comparison with various state-of-the-art microarray and RNA-seq networks was also performed; however, none outperformed the aggregate microarray network despite having good predictive performance. Repeating this series of tests using a functional enrichment-based performance metric also showed remarkably consistent findings with guilt-by-association neighbor voting. To demonstrate its functionality, we explore the function and transcriptional regulation of grapevine EXPANSIN genes. We envisage that network aggregation will offer new and unique opportunities for gene function prediction in future grapevine functional genomics studies. To this end, we make the aggregate networks and associated metadata publicly available at VTC-Agg (https://sites.google.com/view/vtc-agg).
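The two ideas named in this abstract, network aggregation and guilt-by-association neighbor voting, can be illustrated on a toy example. The sketch below makes simplifying assumptions that may differ from the published method: aggregation is a plain average of edge weights across networks that share a gene order, and a gene's vote is the fraction of its total edge weight that connects to genes already carrying the annotation. All names and matrices are hypothetical.

```python
def aggregate(networks):
    """Average edge weights across networks that share the same gene order."""
    size = len(networks[0])
    count = len(networks)
    return [
        [sum(net[i][j] for net in networks) / count for j in range(size)]
        for i in range(size)
    ]


def neighbor_vote(network, labelled):
    """Score each gene by the share of its edge weight going to labelled genes."""
    scores = []
    for i, row in enumerate(network):
        total = sum(w for j, w in enumerate(row) if j != i)
        hits = sum(w for j, w in enumerate(row) if j != i and j in labelled)
        scores.append(hits / total if total else 0.0)
    return scores


# Two made-up 4-gene co-expression networks (symmetric 0/1 adjacency).
net_a = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
net_b = [[0, 1, 1, 0], [1, 0, 0, 0], [1, 0, 0, 1], [0, 0, 1, 0]]

agg = aggregate([net_a, net_b])
# Genes 0 and 1 are assumed to carry the functional term of interest.
print(neighbor_vote(agg, labelled={0, 1}))
```

In a real evaluation the known labels would be partially held out (for example by cross-validation) and the votes ranked to see how well the hidden annotations are recovered.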
Affiliation(s)
- Darren C J Wong
- Ecology and Evolution, Research School of Biology, The Australian National University, Acton, ACT, 2601, Australia.
| |
|
25
|
Mahood EH, Kruse LH, Moghe GD. Machine learning: A powerful tool for gene function prediction in plants. APPLICATIONS IN PLANT SCIENCES 2020; 8:e11376. [PMID: 32765975 PMCID: PMC7394712 DOI: 10.1002/aps3.11376] [Citation(s) in RCA: 51] [Impact Index Per Article: 12.8] [Received: 10/01/2019] [Accepted: 03/19/2020] [Indexed: 05/06/2023]
Abstract
Recent advances in sequencing and informatic technologies have led to a deluge of publicly available genomic data. While it is now relatively easy to sequence, assemble, and identify genic regions in diploid plant genomes, functional annotation of these genes is still a challenge. Over the past decade, there has been a steady increase in studies utilizing machine learning algorithms for various aspects of functional prediction, because these algorithms are able to integrate large amounts of heterogeneous data and detect patterns inconspicuous through rule-based approaches. The goal of this review is to introduce experimental plant biologists to machine learning, by describing how it is currently being used in gene function prediction to gain novel biological insights. In this review, we discuss specific applications of machine learning in identifying structural features in sequenced genomes, predicting interactions between different cellular components, and predicting gene function and organismal phenotypes. Finally, we also propose strategies for stimulating functional discovery using machine learning-based approaches in plants.
Affiliation(s)
- Elizabeth H. Mahood
- Plant Biology Section, School of Integrative Plant Sciences, Cornell University, Ithaca, New York 14853, USA
| | - Lars H. Kruse
- Plant Biology Section, School of Integrative Plant Sciences, Cornell University, Ithaca, New York 14853, USA
| | - Gaurav D. Moghe
- Plant Biology Section, School of Integrative Plant Sciences, Cornell University, Ithaca, New York 14853, USA
| |
|
26
|
ORCAE-AOCC: A Centralized Portal for the Annotation of African Orphan Crop Genomes. Genes (Basel) 2019; 10:genes10120950. [PMID: 31757073 PMCID: PMC6969924 DOI: 10.3390/genes10120950] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 10/17/2019] [Revised: 11/15/2019] [Accepted: 11/18/2019] [Indexed: 12/19/2022] Open
Abstract
ORCAE (Online Resource for Community Annotation of Eukaryotes) is a public genome annotation curation resource. ORCAE-AOCC is a branch that is dedicated to the genomes published as part of the African Orphan Crops Consortium (AOCC). The motivation behind the development of the ORCAE platform was to create a knowledge-based website where the research community can make contributions to improve genome annotations. All changes to any given gene model or gene description are stored, and the entire annotation history can be retrieved. Genomes can either be set to “public” or “restricted” mode; anonymous users can browse public genomes but cannot make any changes. Aside from providing a user-friendly interface to view genome annotations, the platform also includes tools and information (such as gene expression evidence) that enable authorized users to edit and validate genome annotations. The ORCAE-AOCC platform will enable various stakeholders from around the world to coordinate their efforts to annotate and study underutilized crops.
|
27
|
Utilization of Tissue Ploidy Level Variation in de Novo Transcriptome Assembly of Pinus sylvestris. G3-GENES GENOMES GENETICS 2019; 9:3409-3421. [PMID: 31427456 PMCID: PMC6778806 DOI: 10.1534/g3.119.400357] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 12/28/2022]
Abstract
Compared to angiosperms, gymnosperms lag behind in the availability of assembled and annotated genomes. Most genomic analyses in gymnosperms, especially conifer tree species, rely on the use of de novo assembled transcriptomes. However, the level of allelic redundancy and transcript fragmentation in these assembled transcriptomes, and their effect on downstream applications have not been fully investigated. Here, we assessed three assembly strategies for short-reads data, including the utility of haploid megagametophyte tissue during de novo assembly as single-allele guides, for six individuals and five different tissues in Pinus sylvestris. We then contrasted haploid and diploid tissue genotype calls obtained from the assembled transcriptomes to evaluate the extent of paralog mapping. The use of the haploid tissue during assembly increased its completeness without reducing the number of assembled transcripts. Our results suggest that current strategies that rely on available genomic resources as guidance to minimize allelic redundancy are less effective than the application of strategies that cluster redundant assembled transcripts. The strategy yielding the lowest levels of allelic redundancy among the assembled transcriptomes assessed here was the generation of SuperTranscripts with Lace followed by CD-HIT clustering. However, we still observed some levels of heterozygosity (multiple gene fragments per transcript reflecting allelic redundancy) in this assembled transcriptome on the haploid tissue, indicating that further filtering is required before using these assemblies for downstream applications. We discuss the influence of allelic redundancy when these reference transcriptomes are used to select regions for probe design of exome capture baits and for estimation of population genetic diversity.
|
28
|
Baum S, Reimer-Michalski EM, Bolger A, Mantai AJ, Benes V, Usadel B, Conrath U. Isolation of Open Chromatin Identifies Regulators of Systemic Acquired Resistance. PLANT PHYSIOLOGY 2019; 181:817-833. [PMID: 31337712 PMCID: PMC6776868 DOI: 10.1104/pp.19.00673] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 06/06/2019] [Accepted: 07/17/2019] [Indexed: 05/11/2023]
Abstract
Upon local infection, plants activate a systemic immune response called systemic acquired resistance (SAR). During SAR, systemic leaves become primed for the superinduction of defense genes upon reinfection. We used formaldehyde-assisted isolation of regulatory DNA elements coupled to next-generation sequencing to identify SAR regulators. Our bioinformatic analysis produced 10,129 priming-associated open chromatin sites in the 5' region of 3,025 genes in the systemic leaves of Arabidopsis (Arabidopsis thaliana) plants locally infected with Pseudomonas syringae pv. maculicola. Whole transcriptome shotgun sequencing analysis of the systemic leaves after challenge enabled the identification of genes with priming-linked open chromatin before (contained in the formaldehyde-assisted isolation of regulatory DNA elements sequencing dataset) and enhanced expression after (included in the whole transcriptome shotgun sequencing dataset) the systemic challenge. Among them, Arabidopsis MILDEW RESISTANCE LOCUS O3 (MLO3) was identified as a previously unidentified positive regulator of SAR. Further in silico analysis disclosed two yet unknown cis-regulatory DNA elements in the 5' region of genes. The P-box was mainly associated with priming-responsive genes, whereas the C-box was mostly linked to challenge. We found that the P- or W-box, the latter recruiting WRKY transcription factors, or combinations of these boxes, characterize the 5' region of most primed genes. Therefore, this study provides a genome-wide record of genes with open and accessible chromatin during SAR and identifies MLO3 and two previously unidentified DNA boxes as likely regulators of this immune response.
Affiliation(s)
- Stephani Baum
- Department of Plant Physiology, Rheinisch-Westfälische Technische Hochschule Aachen University, Aachen 52056, Germany
| | - Eva-Maria Reimer-Michalski
- Department of Plant Physiology, Rheinisch-Westfälische Technische Hochschule Aachen University, Aachen 52056, Germany
| | - Anthony Bolger
- Department of Botany, Rheinisch-Westfälische Technische Hochschule Aachen University, Aachen 52056, Germany
| | - Andrea J Mantai
- Department of Plant Physiology, Rheinisch-Westfälische Technische Hochschule Aachen University, Aachen 52056, Germany
| | - Vladimir Benes
- Genomics Core Facility, European Molecular Biology Laboratory, Heidelberg 69117, Germany
| | - Björn Usadel
- Department of Botany, Rheinisch-Westfälische Technische Hochschule Aachen University, Aachen 52056, Germany
| | - Uwe Conrath
- Department of Plant Physiology, Rheinisch-Westfälische Technische Hochschule Aachen University, Aachen 52056, Germany
| |
|
29
|
Park JS, Park JH, Park YD. Construction of pseudomolecule sequences of Brassica rapa ssp. pekinensis inbred line CT001 and analysis of spontaneous mutations derived via sexual propagation. PLoS One 2019; 14:e0222283. [PMID: 31498838 PMCID: PMC6733507 DOI: 10.1371/journal.pone.0222283] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 03/18/2019] [Accepted: 08/26/2019] [Indexed: 01/27/2023] Open
Abstract
Chinese cabbage (Brassica rapa ssp. pekinensis) is a major crop that is widely cultivated, especially in Korea, Japan, and China. With the advent of next generation sequencing technology, the cost and time required for sequencing have decreased and genome research has accelerated. Genome sequencing of Chinese cabbage was completed in 2011 using the variety Chiifu-401-42, and since then the genome has been continuously updated. In the present study, we conducted whole-genome sequencing of Chinese cabbage inbred line CT001, a line widely used in traditional or molecular breeding, to improve the accuracy of genetic polymorphism analysis. The constructed CT001 pseudomolecule represented 85.4% (219.8 Mb) of the Chiifu reference genome, and a total of 38,567 gene models were annotated using RNA-Seq analysis. In addition, the spontaneous mutation rate of CT001 was estimated by resequencing DNA obtained from individual plants after sexual propagation for six generations to estimate the naturally occurring variations. The CT001 pseudomolecule constructed in this study will provide valuable resources for genomic studies on Chinese cabbage.
Affiliation(s)
- Jee-Soo Park
- Department of Horticultural Biotechnology, Kyung Hee University, Yongin, Korea
| | - Ji-Hyun Park
- Department of Horticultural Biotechnology, Kyung Hee University, Yongin, Korea
| | - Young-Doo Park
- Department of Horticultural Biotechnology, Kyung Hee University, Yongin, Korea
| |
|
30
|
Molecular Traits of Long Non-protein Coding RNAs from Diverse Plant Species Show Little Evidence of Phylogenetic Relationships. G3-GENES GENOMES GENETICS 2019; 9:2511-2520. [PMID: 31235560 PMCID: PMC6686929 DOI: 10.1534/g3.119.400201] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 12/13/2022]
Abstract
Long non-coding RNAs (lncRNAs) represent a diverse class of regulatory loci with roles in development and stress responses throughout all kingdoms of life. LncRNAs, however, remain under-studied in plants compared to animal systems. To address this deficiency, we applied a machine learning prediction tool, Classifying RNA by Ensemble Machine learning Algorithm (CREMA), to analyze RNAseq data from 11 plant species chosen to represent a wide range of evolutionary histories. Transcript sequences of all expressed and/or annotated loci from plants grown in unstressed (control) conditions were assembled and input into CREMA for comparative analyses. On average, 6.4% of the plant transcripts were identified by CREMA as encoding lncRNAs. Gene annotation associated with the transcripts showed that up to 99% of all predicted lncRNAs for Solanum tuberosum and Amborella trichopoda were missing from their reference annotations whereas the reference annotation for the genetic model plant Arabidopsis thaliana contains 96% of all predicted lncRNAs for this species. Thus a reliance on reference annotations for use in lncRNA research in less well-studied plants can be impeded by the near absence of annotations associated with these regulatory transcripts. Moreover, our work using phylogenetic signal analyses suggests that molecular traits of plant lncRNAs display different evolutionary patterns than all other transcripts in plants and have molecular traits that do not follow a classic evolutionary pattern. Specifically, GC content was the only tested trait of lncRNAs with consistently significant and high phylogenetic signal, contrary to high signal in all tested molecular traits for the other transcripts in our tested plant species.
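GC content, the one lncRNA trait this abstract reports as carrying consistent phylogenetic signal, is simply the fraction of G and C bases among the unambiguous bases of a transcript. A minimal Python sketch is given below. The function name and the toy sequences are assumptions for illustration and are unrelated to the CREMA tool or to the study's data.

```python
def gc_content(sequence):
    """Return the fraction of G/C among unambiguous bases (A, C, G, T, U)."""
    seq = sequence.upper()
    gc = sum(seq.count(base) for base in "GC")
    at = sum(seq.count(base) for base in "ATU")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return gc / (gc + at)


# Made-up transcript fragments, for illustration only.
print(gc_content("GGCCAATT"))
print(gc_content("ggcnnaauu"))
```

Ambiguous bases such as N are left out of the denominator here, which is one common convention among several.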
|
31
|
Klepikova AV, Kulakovskiy IV, Kasianov AS, Logacheva MD, Penin AA. An update to database TraVA: organ-specific cold stress response in Arabidopsis thaliana. BMC PLANT BIOLOGY 2019; 19:49. [PMID: 30813912 PMCID: PMC6393959 DOI: 10.1186/s12870-019-1636-y] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 05/26/2023]
Abstract
BACKGROUND A transcriptome map is a powerful tool for a variety of biological studies; transcriptome maps that include different organs, tissues, cells and stages of development are currently available for at least 30 plants. Some of them include samples treated by environmental or biotic stresses. However, most studies explore only a limited set of organs and developmental stages (leaves or seedlings). In order to provide a broader view of organ-specific strategies of cold stress response, we studied expression changes that follow exposure to cold (+4 °C) in different aerial parts of the plant: cotyledons, hypocotyl, leaves, young flowers, mature flowers and seeds, using RNA-seq. RESULTS The results on differential expression in leaves are congruent with current knowledge on stress response pathways, in particular, the role of CBF genes. In other organs, both the nature and the dynamics of gene expression changes are different. We show that genes confined to narrow expression patterns in non-stress conditions are involved in the stress response. In particular, the genes that control cell wall modification in pollen are activated in leaves. In seeds, the predominant pattern is a change in lipid metabolism. CONCLUSIONS Stress response is highly organ-specific; different pathways are involved in this process in each type of organ. The results were integrated with the previously published transcriptome map of Arabidopsis thaliana and used for an update of the public database TraVA: http://travadb.org/browse/Species=AthStress.
Affiliation(s)
- Anna V. Klepikova
- Institute for Information Transmission Problems of the Russian Academy of Sciences, Bolshoy Karetny per. 19, build.1, Moscow, 127051 Russia
| | - Ivan V. Kulakovskiy
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Gubkina 3, Moscow, 119991 Russia
- Institute of Mathematical Problems of Biology RAS - the Branch of Keldysh Institute of Applied Mathematics of Russian Academy of Sciences, Vitkevicha 1, Pushchino, Moscow Region, 142290 Russia
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Vavilova 32, 119991 Moscow, Russia
| | - Artem S. Kasianov
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Gubkina 3, Moscow, 119991 Russia
| | - Maria D. Logacheva
- Institute for Information Transmission Problems of the Russian Academy of Sciences, Bolshoy Karetny per. 19, build.1, Moscow, 127051 Russia
- Moscow State University, Leninskye gory, build 1, Moscow, 119992 Russia
- Skolkovo Institute of Science and Technology, Nobelya Ulitsa 3, Moscow, 121205 Russia
| | - Aleksey A. Penin
- Institute for Information Transmission Problems of the Russian Academy of Sciences, Bolshoy Karetny per. 19, build.1, Moscow, 127051 Russia
- Moscow State University, Leninskye gory, build 1, Moscow, 119992 Russia
| |
|
32
|
Bolger AM, Poorter H, Dumschott K, Bolger ME, Arend D, Osorio S, Gundlach H, Mayer KFX, Lange M, Scholz U, Usadel B. Computational aspects underlying genome to phenome analysis in plants. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2019; 97:182-198. [PMID: 30500991 PMCID: PMC6849790 DOI: 10.1111/tpj.14179] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Received: 07/24/2018] [Revised: 11/06/2018] [Accepted: 11/16/2018] [Indexed: 05/18/2023]
Abstract
Recent advances in genomics technologies have greatly accelerated the progress in both fundamental plant science and applied breeding research. Concurrently, high-throughput plant phenotyping is becoming widely adopted in the plant community, promising to alleviate the phenotypic bottleneck. While these technological breakthroughs are significantly accelerating quantitative trait locus (QTL) and causal gene identification, challenges to enable even more sophisticated analyses remain. In particular, care needs to be taken to standardize, describe and conduct experiments robustly while relying on plant physiology expertise. In this article, we review the state of the art regarding genome assembly and the future potential of pangenomics in plant research. We also describe the necessity of standardizing and describing phenotypic studies using the Minimum Information About a Plant Phenotyping Experiment (MIAPPE) standard to enable the reuse and integration of phenotypic data. In addition, we show how deep phenotypic data might yield novel trait-trait correlations and review how to link phenotypic data to genomic data. Finally, we provide perspectives on the golden future of machine learning and its potential in linking phenotypes to genomic features.
Affiliation(s)
- Anthony M. Bolger
- Institute for Biology I, BioSC, RWTH Aachen University, Worringer Weg 3, 52074 Aachen, Germany
| | - Hendrik Poorter
- Forschungszentrum Jülich (FZJ), Institute of Bio‐ and Geosciences (IBG‐2) Plant Sciences, Wilhelm‐Johnen‐Straße, 52428 Jülich, Germany
- Department of Biological Sciences, Macquarie University, North Ryde, NSW 2109, Australia
| | - Kathryn Dumschott
- Institute for Biology I, BioSC, RWTH Aachen University, Worringer Weg 3, 52074 Aachen, Germany
| | - Marie E. Bolger
- Forschungszentrum Jülich (FZJ), Institute of Bio‐ and Geosciences (IBG‐2) Plant Sciences, Wilhelm‐Johnen‐Straße, 52428 Jülich, Germany
| | - Daniel Arend
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstraße 3, 06466 Seeland, Germany
| | - Sonia Osorio
- Department of Molecular Biology and Biochemistry, Instituto de Hortofruticultura Subtropical y Mediterránea “La Mayora”, Universidad de Málaga‐Consejo Superior de Investigaciones Científicas, Campus de Teatinos, 29071 Málaga, Spain
| | - Heidrun Gundlach
- Plant Genome and Systems Biology (PGSB), Helmholtz Zentrum München (HMGU), Ingolstädter Landstraße 1, 85764 Neuherberg, Germany
| | - Klaus F. X. Mayer
- Plant Genome and Systems Biology (PGSB), Helmholtz Zentrum München (HMGU), Ingolstädter Landstraße 1, 85764 Neuherberg, Germany
| | - Matthias Lange
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstraße 3, 06466 Seeland, Germany
| | - Uwe Scholz
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstraße 3, 06466 Seeland, Germany
| | - Björn Usadel
- Institute for Biology I, BioSC, RWTH Aachen University, Worringer Weg 3, 52074 Aachen, Germany
- Forschungszentrum Jülich (FZJ), Institute of Bio‐ and Geosciences (IBG‐2) Plant Sciences, Wilhelm‐Johnen‐Straße, 52428 Jülich, Germany
| |
|
33
|
Arsova B, Watt M, Usadel B. Monitoring of Plant Protein Post-translational Modifications Using Targeted Proteomics. FRONTIERS IN PLANT SCIENCE 2018; 9:1168. [PMID: 30174677 PMCID: PMC6107839 DOI: 10.3389/fpls.2018.01168] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Received: 03/16/2018] [Accepted: 07/23/2018] [Indexed: 05/19/2023]
Abstract
Protein post-translational modifications (PTMs) are among the fastest and earliest of plant responses to changes in the environment, making the mechanisms and dynamics of PTMs an important area of plant science. One of the most studied PTMs is protein phosphorylation. This review summarizes the use of targeted proteomics for the elucidation of the biological functioning of plant PTMs, and focuses primarily on phosphorylation. Since phosphorylated peptides have a low abundance, complex enrichment protocols are usually required to study them. Initial identification is usually performed with discovery phosphoproteomics, using high sensitivity mass spectrometers, where as many phosphopeptides are measured as possible. Once a PTM site is identified, biological characterization can be addressed with targeted proteomics. In targeted proteomics, Selected/Multiple Reaction Monitoring (S/MRM) is traditionally coupled to simple, standard protein digestion protocols, often omitting the enrichment step, and relying on triple-quadrupole mass spectrometers. The use of synthetic peptides as internal standards allows accurate identification, avoiding the cross-reactivity typical of some antibody-based approaches. Importantly, internal standards allow absolute peptide quantitation, reported down to 0.1 femtomoles, also useful for determination of phospho-site occupancy. S/MRM is advantageous in situations where monitoring and diagnostics of peptide PTM status is needed for many samples, as it has faster sample processing times, higher throughput than other approaches, and excellent quantitation and reproducibility. Furthermore, the number of publicly available databases with plant PTM discovery data is growing, facilitating selection of modified peptides and design of targeted proteomics workflows. Recent instrument developments have resulted in faster scanning times and the inclusion of ion-trap instruments, leading to parallel reaction monitoring, which further facilitates S/MRM experimental design. Finally, the recent combination of data-independent and data-dependent spectra acquisition means that in addition to anticipated targeted data, spectra can now be queried for unanticipated information. The potential for future applications in plant biology is outlined.
Affiliation(s)
- Borjana Arsova
- Institut für Bio- und Geowissenschaften, IBG-2–Plant Sciences, Forschungszentrum Jülich, Jülich, Germany
| | - Michelle Watt
- Institut für Bio- und Geowissenschaften, IBG-2–Plant Sciences, Forschungszentrum Jülich, Jülich, Germany
| | - Björn Usadel
- Institut für Bio- und Geowissenschaften, IBG-2–Plant Sciences, Forschungszentrum Jülich, Jülich, Germany
- IBMG: Institute for Biology I, RWTH Aachen University, Aachen, Germany
| |
|
34
|
Delventhal R, Rajaraman J, Stefanato FL, Rehman S, Aghnoum R, McGrann GRD, Bolger M, Usadel B, Hedley PE, Boyd L, Niks RE, Schweizer P, Schaffrath U. A comparative analysis of nonhost resistance across the two Triticeae crop species wheat and barley. BMC PLANT BIOLOGY 2017; 17:232. [PMID: 29202692 PMCID: PMC5715502 DOI: 10.1186/s12870-017-1178-0] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 06/14/2017] [Accepted: 11/15/2017] [Indexed: 05/23/2023]
Abstract
BACKGROUND Nonhost resistance (NHR) protects plants against a vast number of non-adapted pathogens, which suggests that it could be exploited as a source of novel disease resistance strategies. Aiming at a fundamental understanding of NHR, a global analysis of transcriptome reprogramming in the economically important Triticeae cereals wheat and barley was performed, comparing host and nonhost interactions in three major fungal pathosystems responsible for powdery mildew (Blumeria graminis ff. ssp.), cereal blast (Magnaporthe sp.) and leaf rust (Puccinia sp.) diseases. RESULTS In each pathosystem a significant transcriptome reprogramming by adapted or non-adapted pathogen isolates was observed, with considerable overlap between Blumeria, Magnaporthe and Puccinia. Small subsets of these general pathogen-regulated genes were identified as differentially regulated between host and corresponding nonhost interactions, indicating a fine-tuning of the general pathogen response during the course of co-evolution. Additionally, the host- or nonhost-related responses were rather specific for each pair of adapted and non-adapted isolates, indicating that the nonhost resistance-related responses were to a great extent pathosystem-specific. This pathosystem-specific reprogramming may reflect different resistance mechanisms operating against non-adapted pathogens with different lifestyles, or equally, different co-option of the hosts by the adapted isolates to create an optimal environment for infection. To compare the transcriptional reprogramming between wheat and barley, putative orthologues were identified. Within the wheat and barley general pathogen-regulated genes, temporal expression profiles of orthologues looked similar, indicating conserved general responses in Triticeae against fungal attack. However, the comparison of orthologues differentially expressed between host and nonhost interactions revealed fewer commonalities between wheat and barley, and instead suggested different host or nonhost responses in the two cereal species. CONCLUSIONS Taken together, our results suggest independent co-evolutionary forces acting on host pathosystems mirrored by barley- or wheat-specific nonhost responses. As a result of evolutionary processes, at least for the pathosystems investigated, NHR appears to rely on rather specific plant responses.
Affiliation(s)
- Rhoda Delventhal
  - Department of Plant Physiology, RWTH Aachen University, 52056 Aachen, Germany
- Jeyaraman Rajaraman
  - Leibniz-Institute of Plant Genetics and Crop Plant Research, 06466 Gatersleben, Germany
- Francesca L. Stefanato
  - Department of Disease and Stress Biology, John Innes Centre, Norwich Research Park, Colney Lane, Colney, Norwich, Norfolk, NR4 7UH UK
  - Present address: Molecular microbiology, John Innes Centre, Norwich Research Park, Norwich, NR4 7UH UK
- Sajid Rehman
  - Plant Breeding, Graduate School for Experimental Plant Sciences, Wageningen University & Research, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands
  - Present address: Biodiversity and Integrated Gene Management Program (BIGM), International Center for Agriculture Research in the Dry Areas, Rabat, Morocco
- Reza Aghnoum
  - Plant Breeding, Graduate School for Experimental Plant Sciences, Wageningen University & Research, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands
  - Present address: Seed and Plant Improvement Research Department, Khorasan Razavi Agricultural and Natural Resources Research and Education Center, AREEO, Mashhad, Iran
- Graham R. D. McGrann
  - Department of Disease and Stress Biology, John Innes Centre, Norwich Research Park, Colney Lane, Colney, Norwich, Norfolk, NR4 7UH UK
- Marie Bolger
  - Institute of Botany and Molecular Genetics, BioSC, RWTH Aachen University, 52056 Aachen, Germany
- Björn Usadel
  - Institute of Botany and Molecular Genetics, BioSC, RWTH Aachen University, 52056 Aachen, Germany
- Pete E. Hedley
  - The James Hutton Institute, Invergowrie, Dundee, Scotland DD2 5DA UK
- Lesley Boyd
  - NIAB, Huntingdon Road, Cambridge, CB3 0LE UK
- Rients E. Niks
  - Plant Breeding, Graduate School for Experimental Plant Sciences, Wageningen University & Research, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands
- Patrick Schweizer
  - Leibniz-Institute of Plant Genetics and Crop Plant Research, 06466 Gatersleben, Germany
- Ulrich Schaffrath
  - Department of Plant Physiology, RWTH Aachen University, 52056 Aachen, Germany
|
35
|
Hofmann NR. Nanopore Sequencing Comes to Plant Genomes. THE PLANT CELL 2017; 29:2677-2678. [PMID: 29114013 PMCID: PMC5728122 DOI: 10.1105/tpc.17.00863] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
|
36
|
Schmidt MHW, Vogel A, Denton AK, Istace B, Wormit A, van de Geest H, Bolger ME, Alseekh S, Maß J, Pfaff C, Schurr U, Chetelat R, Maumus F, Aury JM, Koren S, Fernie AR, Zamir D, Bolger AM, Usadel B. De Novo Assembly of a New Solanum pennellii Accession Using Nanopore Sequencing. THE PLANT CELL 2017; 29:2336-2348. [PMID: 29025960 PMCID: PMC5774570 DOI: 10.1105/tpc.17.00521] [Citation(s) in RCA: 96] [Impact Index Per Article: 13.7] [Received: 07/06/2017] [Revised: 09/15/2017] [Accepted: 10/11/2017] [Indexed: 05/19/2023]
Abstract
Updates in nanopore technology have made it possible to obtain gigabases of sequence data. Prior to this, nanopore sequencing technology was mainly used to analyze microbial samples. Here, we describe the generation of a comprehensive nanopore sequencing data set with a median read length of 11,979 bp for a self-compatible accession of the wild tomato species Solanum pennellii. We describe the assembly of its genome to a contig N50 of 2.5 MB. The assembly pipeline comprised initial read correction with Canu and assembly with SMARTdenovo. The resulting raw nanopore-based de novo genome was structurally highly similar to that of the reference S. pennellii LA716 accession but had a high error rate and was rich in homopolymer deletions. After polishing the assembly with Illumina reads, we obtained an error rate of <0.02% when assessed versus the same Illumina data. We obtained a gene completeness of 96.53%, slightly surpassing that of the reference S. pennellii. Taken together, our data indicate that such long read sequencing data can be used to affordably sequence and assemble gigabase-sized plant genomes.
Affiliation(s)
- Maximilian H-W Schmidt
  - Institute for Botany and Molecular Genetics, BioEconomy Science Center, RWTH Aachen University, 52062 Aachen, Germany
- Alexander Vogel
  - Institute for Botany and Molecular Genetics, BioEconomy Science Center, RWTH Aachen University, 52062 Aachen, Germany
- Alisandra K Denton
  - Institute for Botany and Molecular Genetics, BioEconomy Science Center, RWTH Aachen University, 52062 Aachen, Germany
- Benjamin Istace
  - Commissariat à l'Energie Atomique et aux Energies Alternatives, Genoscope, 91057 Evry, France
- Alexandra Wormit
  - Institute for Botany and Molecular Genetics, BioEconomy Science Center, RWTH Aachen University, 52062 Aachen, Germany
- Marie E Bolger
  - Institute for Bio- and Geosciences (IBG-2: Plant Sciences), Forschungszentrum Jülich, 52428 Jülich, Germany
- Saleh Alseekh
  - Department of Molecular Physiology, Max Planck Institute of Molecular Plant Physiology, 14476 Potsdam-Golm, Germany
- Janina Maß
  - Institute for Bio- and Geosciences (IBG-2: Plant Sciences), Forschungszentrum Jülich, 52428 Jülich, Germany
- Christian Pfaff
  - Institute for Bio- and Geosciences (IBG-2: Plant Sciences), Forschungszentrum Jülich, 52428 Jülich, Germany
- Ulrich Schurr
  - Institute for Bio- and Geosciences (IBG-2: Plant Sciences), Forschungszentrum Jülich, 52428 Jülich, Germany
- Roger Chetelat
  - C.M. Rick Tomato Genetics Resource Center, Department of Plant Sciences, University of California, Davis, California 95616
- Florian Maumus
  - URGI, INRA, Université Paris-Saclay, 78026 Versailles, France
- Jean-Marc Aury
  - Commissariat à l'Energie Atomique et aux Energies Alternatives, Genoscope, 91057 Evry, France
- Sergey Koren
  - Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892
- Alisdair R Fernie
  - Department of Molecular Physiology, Max Planck Institute of Molecular Plant Physiology, 14476 Potsdam-Golm, Germany
- Dani Zamir
  - The Institute of Plant Sciences and Genetics in Agriculture, Faculty of Agriculture, The Hebrew University of Jerusalem, Rehovot 76100, Israel
- Anthony M Bolger
  - Institute for Botany and Molecular Genetics, BioEconomy Science Center, RWTH Aachen University, 52062 Aachen, Germany
- Björn Usadel
  - Institute for Botany and Molecular Genetics, BioEconomy Science Center, RWTH Aachen University, 52062 Aachen, Germany
  - Institute for Bio- and Geosciences (IBG-2: Plant Sciences), Forschungszentrum Jülich, 52428 Jülich, Germany
|
37
|
Wong DCJ, Amarasinghe R, Rodriguez-Delgado C, Eyles R, Pichersky E, Peakall R. Tissue-Specific Floral Transcriptome Analysis of the Sexually Deceptive Orchid Chiloglottis trapeziformis Provides Insights into the Biosynthesis and Regulation of Its Unique UV-B Dependent Floral Volatile, Chiloglottone 1. FRONTIERS IN PLANT SCIENCE 2017; 8:1260. [PMID: 28769963 PMCID: PMC5515871 DOI: 10.3389/fpls.2017.01260] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 04/05/2017] [Accepted: 07/04/2017] [Indexed: 05/29/2023]
Abstract
The Australian sexually deceptive orchid, Chiloglottis trapeziformis, employs a unique UV-B-dependent floral volatile, chiloglottone 1, for specific male wasp pollinator attraction. Chiloglottone 1 and related variants (2,5-dialkylcyclohexane-1,3-diones) represent a unique class of specialized metabolites presumed to be the product of cyclization between two fatty acid (FA) precursors. However, the genes involved in the biosynthesis of precursors, intermediates, and transcriptional regulation remain to be discovered. Chiloglottone 1 production occurs in the aggregation of calli (callus) on the labellum under continuous UV-B light. Therefore, deep sequencing, transcriptome assembly, and differential expression (DE) analysis were performed across different tissue types and UV-B treatments. Transcripts expressed in the callus and labellum (∼23,000 transcripts) were highly specialized and enriched for a diversity of known and novel metabolic pathways. DE analysis between chiloglottone-emitting callus and the remainder of the labellum showed strong coordinated induction of entire FA biosynthesis and β-oxidation pathways, including genes encoding Ketoacyl-ACP Synthase, Acyl-CoA Oxidase, and Multifunctional Protein. Phylogenetic analysis revealed potential gene duplicates with tissue-specific differential regulation, including two Acyl-ACP Thioesterase B genes and a Ketoacyl-ACP Synthase gene. UV-B treatment induced the activation of UVR8-mediated signaling and large-scale transcriptome changes in both tissues; however, neither FA biosynthesis/β-oxidation nor other lipid metabolic pathways showed clear indications of concerted DE. Gene co-expression network analysis identified three callus-specific modules enriched with various lipid metabolism categories. These networks also highlight promising candidates involved in the cyclization of chiloglottone 1 intermediates (e.g., Bet v I and dimeric α,β barrel proteins) and in orchestrating regulation of precursor pathways (e.g., AP2/ERF), given a strong co-regulation with FA biosynthesis/β-oxidation genes. Possible alternative biosynthetic routes for precursors (e.g., aldehyde dehydrogenases) were also indicated. Our comprehensive study constitutes the first step toward understanding the biosynthetic pathways involved in chiloglottone 1 production in Chiloglottis trapeziformis, supporting the roles of FA metabolism in planta, gene duplication as a potential source of new genes, and co-regulation of novel pathway genes in a tissue-specific manner. This study also provides a new and valuable resource for future discovery and comparative studies in plant specialized metabolism of other orchids and non-model plants.
Affiliation(s)
- Darren C. J. Wong
  - Ecology and Evolution, Research School of Biology, The Australian National University, Canberra, ACT, Australia
- Ranamalie Amarasinghe
  - Ecology and Evolution, Research School of Biology, The Australian National University, Canberra, ACT, Australia
- Claudia Rodriguez-Delgado
  - Ecology and Evolution, Research School of Biology, The Australian National University, Canberra, ACT, Australia
- Rodney Eyles
  - Ecology and Evolution, Research School of Biology, The Australian National University, Canberra, ACT, Australia
- Eran Pichersky
  - Department of Molecular, Cellular, and Developmental Biology, University of Michigan, Ann Arbor, MI, United States
- Rod Peakall
  - Ecology and Evolution, Research School of Biology, The Australian National University, Canberra, ACT, Australia
|
38
|
Karakülah G. Discovery and Annotation of Plant Endogenous Target Mimicry Sequences from Public Transcriptome Libraries: A Case Study of Prunus persica. J Integr Bioinform 2017; 14:/j/jib.ahead-of-print/jib-2017-0009/jib-2017-0009.xml. [PMID: 28672765 PMCID: PMC6042811 DOI: 10.1515/jib-2017-0009] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/08/2017] [Accepted: 04/12/2017] [Indexed: 01/28/2023]
Abstract
Novel transcript discovery through RNA sequencing has substantially improved our understanding of the transcriptome dynamics of biological systems. Endogenous target mimicry (eTM) transcripts, a novel class of regulatory molecules, bind to their target microRNAs (miRNAs) by base pairing and block their biological activity. The objective of this study was to provide a computational analysis framework for the prediction of putative eTM sequences in plants and, as an example, to discover previously un-annotated eTMs in the Prunus persica (peach) transcriptome. To this end, two public peach transcriptome libraries downloaded from the Sequence Read Archive (SRA) and a previously published set of long non-coding RNAs (lncRNAs) were investigated with a multi-step analysis pipeline, and 44 putative eTMs were found. Additionally, an eTM-miRNA-mRNA regulatory network module associated with peach fruit organ development was built by integrating the miRNA target information and the predicted eTM-miRNA interactions. My findings suggest that miR156, one of the most widely expressed miRNA families among diverse plant species, might be sponged by seven putative eTMs. The study also indicates that eTMs potentially play roles in the regulation of developmental processes in peach fruit by targeting specific miRNAs. In conclusion, by following the step-by-step instructions provided in this study, novel eTMs can be identified and annotated effectively in public plant transcriptome libraries.
|
39
|
From plant genomes to phenotypes. J Biotechnol 2017; 261:46-52. [PMID: 28602791 DOI: 10.1016/j.jbiotec.2017.06.003] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 02/17/2017] [Revised: 05/27/2017] [Accepted: 06/07/2017] [Indexed: 12/21/2022]
Abstract
Recent advances in sequencing technologies have greatly accelerated plant genome and applied breeding research. Despite this trend, plant genomes continue to present numerous difficulties for standard tools and pipelines, not only in genome assembly but also in gene annotation and downstream analysis. Here we give a perspective on the tools, resources and services necessary to assemble and analyze plant genomes and link them to plant phenotypes.
|