1
Zhao K, Farrell K, Mashiku M, Abay D, Tang K, Oberste MS, Burns CC. A search-based geographic metadata curation pipeline to refine sequencing institution information and support public health. Front Public Health 2023; 11:1254976. [PMID: 38035280 PMCID: PMC10683794 DOI: 10.3389/fpubh.2023.1254976] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2023] [Accepted: 10/19/2023] [Indexed: 12/02/2023] Open
Abstract
Background: The National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) has amassed a vast reservoir of genetic data since its inception in 2007. These public data hold immense potential for supporting pathogen surveillance and control. However, the lack of standardized metadata and inconsistent submission practices in SRA may impede the data's utility in public health. Methods: To address this issue, we introduce the Search-based Geographic Metadata Curation (SGMC) pipeline. SGMC utilized Python and web scraping to extract geographic data of sequencing institutions from NCBI SRA in the Cloud and its website. It then harnessed ChatGPT to refine the sequencing institution and location assignments. To illustrate the pipeline's utility, we examined the geographic distribution of the sequencing institutions and their countries relevant to polio eradication and categorized them. Results: SGMC successfully identified 7,649 sequencing institutions and their global locations from a random selection of 2,321,044 SRA accessions. These institutions were distributed across 97 countries, with strong representation in the United States, the United Kingdom and China. However, there was a lack of data from African, Central Asian, and Central American countries, indicating potential disparities in sequencing capabilities. Comparison with manually curated data for U.S. institutions reveals SGMC's accuracy rates of 94.8% for institutions, 93.1% for countries, and 74.5% for geographic coordinates. Conclusion: SGMC may represent a novel approach using a generative AI model to enhance geographic data (country and institution assignments) for large numbers of samples within SRA datasets. This information can be utilized to bolster public health endeavors.
Affiliation(s)
- Kun Zhao
  - Division of Viral Diseases, National Center for Immunization and Respiratory Diseases, Centers for Disease Control and Prevention, Atlanta, GA, United States
- Katie Farrell
  - Cherokee Nation Businesses, Contracting Agency to the Division of Viral Diseases, Centers for Disease Control and Prevention, Catoosa, OK, United States
- Melchizedek Mashiku
  - Cherokee Nation Businesses, Contracting Agency to the Division of Viral Diseases, Centers for Disease Control and Prevention, Catoosa, OK, United States
- Dawit Abay
  - Cherokee Nation Businesses, Contracting Agency to the Division of Viral Diseases, Centers for Disease Control and Prevention, Catoosa, OK, United States
- Kevin Tang
  - Division of Scientific Resources, National Center for Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, GA, United States
- M Steven Oberste
  - Division of Viral Diseases, National Center for Immunization and Respiratory Diseases, Centers for Disease Control and Prevention, Atlanta, GA, United States
- Cara C Burns
  - Division of Viral Diseases, National Center for Immunization and Respiratory Diseases, Centers for Disease Control and Prevention, Atlanta, GA, United States

2
Liu L, Heidecker M, Depuydt T, Manosalva Perez N, Crespi M, Blein T, Vandepoele K. Transcription factors KANADI 1, MYB DOMAIN PROTEIN 44, and PHYTOCHROME INTERACTING FACTOR 4 regulate long intergenic noncoding RNAs expressed in Arabidopsis roots. PLANT PHYSIOLOGY 2023; 193:1933-1953. [PMID: 37345955 DOI: 10.1093/plphys/kiad360] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2023] [Revised: 06/02/2023] [Accepted: 06/05/2023] [Indexed: 06/23/2023]
Abstract
Thousands of long intergenic noncoding RNAs (lincRNAs) have been identified in plant genomes. While some lincRNAs have been characterized as important regulators in different biological processes, little is known about the transcriptional regulation for most plant lincRNAs. Through the integration of 8 annotation resources, we defined 6,599 high-confidence lincRNA loci in Arabidopsis (Arabidopsis thaliana). For lincRNAs belonging to different evolutionary age categories, we identified major differences in sequence and chromatin features, as well as in the level of conservation and purifying selection acting during evolution. Spatiotemporal gene expression profiles combined with transcription factor (TF) chromatin immunoprecipitation (ChIP) data were used to construct a TF-lincRNA regulatory network containing 2,659 lincRNAs and 15,686 interactions. We found that properties characterizing lincRNA expression, conservation, and regulation differ between plants and animals. Experimental validation confirmed the role of 3 TFs, KANADI 1, MYB DOMAIN PROTEIN 44, and PHYTOCHROME INTERACTING FACTOR 4, as key regulators controlling root-specific lincRNA expression, demonstrating the predictive power of our network. Furthermore, we identified 58 lincRNAs, regulated by these TFs, showing strong root cell type-specific expression or chromatin accessibility, which are linked with genome-wide association studies genetic associations related to root system development and growth. The multilevel genome-wide characterization covering chromatin state information, promoter conservation, and chromatin immunoprecipitation-based TF binding, for all detectable lincRNAs across 769 expression samples, permits rapidly defining the biological context and relevance of Arabidopsis lincRNAs through regulatory networks.
Affiliation(s)
- Li Liu
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 71, 9052 Ghent, Belgium
  - VIB Center for Plant Systems Biology, Technologiepark 71, 9052 Ghent, Belgium
- Michel Heidecker
  - CNRS, INRAE, Institute of Plant Sciences Paris-Saclay (IPS2), Université Evry, Université Paris-Saclay, 91190 Gif-sur-Yvette, France
  - CNRS, INRAE, Institute of Plant Sciences Paris-Saclay (IPS2), Université Paris Cité, 91190 Gif-sur-Yvette, France
- Thomas Depuydt
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 71, 9052 Ghent, Belgium
  - VIB Center for Plant Systems Biology, Technologiepark 71, 9052 Ghent, Belgium
- Nicolas Manosalva Perez
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 71, 9052 Ghent, Belgium
  - VIB Center for Plant Systems Biology, Technologiepark 71, 9052 Ghent, Belgium
- Martin Crespi
  - CNRS, INRAE, Institute of Plant Sciences Paris-Saclay (IPS2), Université Evry, Université Paris-Saclay, 91190 Gif-sur-Yvette, France
  - CNRS, INRAE, Institute of Plant Sciences Paris-Saclay (IPS2), Université Paris Cité, 91190 Gif-sur-Yvette, France
- Thomas Blein
  - CNRS, INRAE, Institute of Plant Sciences Paris-Saclay (IPS2), Université Evry, Université Paris-Saclay, 91190 Gif-sur-Yvette, France
  - CNRS, INRAE, Institute of Plant Sciences Paris-Saclay (IPS2), Université Paris Cité, 91190 Gif-sur-Yvette, France
- Klaas Vandepoele
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 71, 9052 Ghent, Belgium
  - VIB Center for Plant Systems Biology, Technologiepark 71, 9052 Ghent, Belgium
  - Bioinformatics Institute Ghent, Ghent University, Technologiepark 71, 9052 Ghent, Belgium

3
Julca I, Tan QW, Mutwil M. Toward kingdom-wide analyses of gene expression. TRENDS IN PLANT SCIENCE 2023; 28:235-249. [PMID: 36344371 DOI: 10.1016/j.tplants.2022.09.007] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/29/2022] [Revised: 09/22/2022] [Accepted: 09/30/2022] [Indexed: 06/16/2023]
Abstract
Gene expression data for Archaeplastida are accumulating exponentially, with more than 300 000 RNA-sequencing (RNA-seq) experiments available for hundreds of species. The gene expression data stem from thousands of experiments that capture gene expression in various organs, tissues, cell types, (a)biotic perturbations, and genotypes. Advances in software tools make it possible to process all these data in a matter of weeks on modern office computers, giving us the possibility to study gene expression in a kingdom-wide manner for the first time. We discuss how the expression data can be accessed and processed and outline analyses that take advantage of cross-species analyses, allowing us to generate powerful and robust hypotheses about gene function and evolution.
Affiliation(s)
- Irene Julca
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore 637551, Singapore
- Qiao Wen Tan
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore 637551, Singapore
- Marek Mutwil
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore 637551, Singapore

4
Tjaden B. Escherichia coli transcriptome assembly from a compendium of RNA-seq data sets. RNA Biol 2023; 20:77-84. [PMID: 36920168 PMCID: PMC10392735 DOI: 10.1080/15476286.2023.2189331] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Revised: 01/09/2023] [Accepted: 01/27/2023] [Indexed: 03/16/2023] Open
Abstract
Owing to the complexities of bacterial RNA biology, the transcriptomes of even the best studied bacteria are not fully understood. To help elucidate the transcriptional landscape of E. coli, we compiled a compendium of 3,376 RNA-seq data sets composed of more than 7 trillion sequenced bases, which we evaluate with a transcript assembly pipeline. We report expression profiles for all annotated E. coli genes as well as 5,071 other transcripts. Additionally, we observe hundreds of instances of co-transcribed genes that are novel with respect to existing operon databases. By integrating data from a large number of sequencing experiments corresponding to a wide range of conditions, we are able to obtain a comprehensive view of the E. coli transcriptome.
Affiliation(s)
- Brian Tjaden
  - Department of Computer Science, Wellesley College, Wellesley, MA, USA

5
Ferrari C, Manosalva Pérez N, Vandepoele K. MINI-EX: Integrative inference of single-cell gene regulatory networks in plants. MOLECULAR PLANT 2022; 15:1807-1824. [PMID: 36307979 DOI: 10.1016/j.molp.2022.10.016] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/01/2022] [Revised: 09/30/2022] [Accepted: 10/21/2022] [Indexed: 05/26/2023]
Abstract
Multicellular organisms, such as plants, are characterized by highly specialized and tightly regulated cell populations, establishing specific morphological structures and executing distinct functions. Gene regulatory networks (GRNs) describe condition-specific interactions of transcription factors (TFs) regulating the expression of target genes, underpinning these specific functions. As efficient and validated methods to identify cell-type-specific GRNs from single-cell data in plants are lacking, limiting our understanding of the organization of specific cell types in both model species and crops, we developed MINI-EX (Motif-Informed Network Inference based on single-cell EXpression data), an integrative approach to infer cell-type-specific networks in plants. MINI-EX uses single-cell transcriptomic data to define expression-based networks and integrates TF motif information to filter the inferred regulons, resulting in networks with increased accuracy. Next, regulons are assigned to different cell types, leveraging cell-specific expression, and candidate regulators are prioritized using network centrality measures, functional annotations, and expression specificity. This embedded prioritization strategy offers a unique and efficient means to unravel signaling cascades in specific cell types controlling a biological process of interest. We demonstrate the stability of MINI-EX toward input data sets with low number of cells and its robustness toward missing data, and show that it infers state-of-the-art networks with a better performance compared with other related single-cell network tools. MINI-EX successfully identifies key regulators controlling root development in Arabidopsis and rice, leaf development in Arabidopsis, and ear development in maize, enhancing our understanding of cell-type-specific regulation and unraveling the roles of different regulators controlling the development of specific cell types in plants.
Affiliation(s)
- Camilla Ferrari
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium
  - Center for Plant Systems Biology, VIB, 9052 Ghent, Belgium
- Nicolás Manosalva Pérez
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium
  - Center for Plant Systems Biology, VIB, 9052 Ghent, Belgium
- Klaas Vandepoele
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium
  - Center for Plant Systems Biology, VIB, 9052 Ghent, Belgium
  - Bioinformatics Institute Ghent, Ghent University, 9052 Ghent, Belgium

6
Depuydt T, Vandepoele K. Multi-omics network-based functional annotation of unknown Arabidopsis genes. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2021; 108:1193-1212. [PMID: 34562334 DOI: 10.1111/tpj.15507] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 06/17/2021] [Accepted: 09/20/2021] [Indexed: 06/13/2023]
Abstract
Unraveling gene function is pivotal to understanding the signaling cascades that control plant development and stress responses. As experimental profiling is costly and labor intensive, there is a clear need for high-confidence computational annotation. In contrast to detailed gene-specific functional information, transcriptomics data are widely available for both model and crop species. Here, we describe a novel automated function prediction method, which leverages complementary information from multiple expression datasets by analyzing study-specific gene co-expression networks. First, we benchmarked the prediction performance on recently characterized Arabidopsis thaliana genes, and showed that our method outperforms state-of-the-art expression-based approaches. Next, we predicted biological process annotations for known (n = 15 790) and unknown (n = 11 865) genes in A. thaliana and validated our predictions using experimental protein-DNA and protein-protein interaction data (covering >220 000 interactions in total), obtaining a set of high-confidence functional annotations. Our method assigned at least one validated annotation to 5054 (42.6%) unknown genes, and at least one novel validated function to 3408 (53.0%) genes with computational annotations only. These omics-supported functional annotations shed light on a variety of developmental processes and molecular responses, such as flower and root development, defense responses to fungi and bacteria, and phytohormone signaling, and help fill the information gap on biological process annotations in Arabidopsis. An in-depth analysis of two context-specific networks, modeling seed development and response to water deprivation, shows how previously uncharacterized genes function within the respective networks. Moreover, our automated function prediction approach can be applied in future studies to facilitate gene discovery for crop improvement.
Affiliation(s)
- Thomas Depuydt
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - Center for Plant Systems Biology, Vlaams Instituut voor Biotechnologie, Ghent, Belgium
- Klaas Vandepoele
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - Center for Plant Systems Biology, Vlaams Instituut voor Biotechnologie, Ghent, Belgium
  - Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium

7
Bilcke G, Osuna-Cruz CM, Santana Silva M, Poulsen N, D'hondt S, Bulankova P, Vyverman W, De Veylder L, Vandepoele K. Diurnal transcript profiling of the diatom Seminavis robusta reveals adaptations to a benthic lifestyle. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2021; 107:315-336. [PMID: 33901335 DOI: 10.1111/tpj.15291] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/18/2021] [Revised: 04/16/2021] [Accepted: 04/19/2021] [Indexed: 06/12/2023]
Abstract
Coastal regions contribute an estimated 20% of annual gross primary production in the oceans, despite occupying only 0.03% of their surface area. Diatoms frequently dominate coastal sediments, where they experience large variations in light regime resulting from the interplay of diurnal and tidal cycles. Here, we report on an extensive diurnal transcript profiling experiment of the motile benthic diatom Seminavis robusta. Nearly 90% (23 328) of expressed protein-coding genes and 66.9% (1124) of expressed long intergenic non-coding RNAs showed significant expression oscillations and are predominantly phasing at night with a periodicity of 24 h. Phylostratigraphic analysis found that rhythmic genes are enriched in highly conserved genes, while diatom-specific genes are predominantly associated with midnight expression. Integration of genetic and physiological cell cycle markers with silica depletion data revealed potential new silica cell wall-associated gene families specific to diatoms. Additionally, we observed 1752 genes with a remarkable semidiurnal (12-h) periodicity, while the expansion of putative circadian transcription factors may reflect adaptations to cope with highly unpredictable external conditions. Taken together, our results provide new insights into the adaptations of diatoms to the benthic environment and serve as a valuable resource for the study of diurnal regulation in photosynthetic eukaryotes.
Affiliation(s)
- Gust Bilcke
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 71, Ghent, 9052, Belgium
  - VIB Center for Plant Systems Biology, Technologiepark 71, Ghent, 9052, Belgium
  - Department of Biology, Protistology and Aquatic Ecology, Ghent University, Ghent, 9000, Belgium
  - Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, 9000, Belgium
- Cristina Maria Osuna-Cruz
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 71, Ghent, 9052, Belgium
  - VIB Center for Plant Systems Biology, Technologiepark 71, Ghent, 9052, Belgium
  - Bioinformatics Institute Ghent, Ghent University, Technologiepark 71, Ghent, 9052, Belgium
- Marta Santana Silva
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 71, Ghent, 9052, Belgium
  - VIB Center for Plant Systems Biology, Technologiepark 71, Ghent, 9052, Belgium
- Nicole Poulsen
  - B CUBE Center for Molecular Bioengineering, Technical University of Dresden, Tatzberg 41, Dresden, 01307, Germany
- Sofie D'hondt
  - Department of Biology, Protistology and Aquatic Ecology, Ghent University, Ghent, 9000, Belgium
- Petra Bulankova
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 71, Ghent, 9052, Belgium
  - VIB Center for Plant Systems Biology, Technologiepark 71, Ghent, 9052, Belgium
- Wim Vyverman
  - Department of Biology, Protistology and Aquatic Ecology, Ghent University, Ghent, 9000, Belgium
- Lieven De Veylder
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 71, Ghent, 9052, Belgium
  - VIB Center for Plant Systems Biology, Technologiepark 71, Ghent, 9052, Belgium
- Klaas Vandepoele
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 71, Ghent, 9052, Belgium
  - VIB Center for Plant Systems Biology, Technologiepark 71, Ghent, 9052, Belgium
  - Bioinformatics Institute Ghent, Ghent University, Technologiepark 71, Ghent, 9052, Belgium

8
Colinas M, Pollier J, Vaneechoutte D, Malat DG, Schweizer F, De Milde L, De Clercq R, Guedes JG, Martínez-Cortés T, Molina-Hidalgo FJ, Sottomayor M, Vandepoele K, Goossens A. Subfunctionalization of Paralog Transcription Factors Contributes to Regulation of Alkaloid Pathway Branch Choice in Catharanthus roseus. FRONTIERS IN PLANT SCIENCE 2021; 12:687406. [PMID: 34113373 PMCID: PMC8186833 DOI: 10.3389/fpls.2021.687406] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 03/29/2021] [Accepted: 04/27/2021] [Indexed: 06/12/2023]
Abstract
Catharanthus roseus produces a diverse range of specialized metabolites of the monoterpenoid indole alkaloid (MIA) class in a heavily branched pathway. Recent great progress in identification of MIA biosynthesis genes revealed that the different pathway branch genes are expressed in a highly cell type- and organ-specific and stress-dependent manner. This implies a complex control by specific transcription factors (TFs), only partly revealed today. We generated and mined a comprehensive compendium of publicly available C. roseus transcriptome data for MIA pathway branch-specific TFs. Functional analysis was performed through extensive comparative gene expression analysis and profiling of over 40 MIA metabolites in the C. roseus flower petal expression system. We identified additional members of the known BIS and ORCA regulators. Further detailed study of the ORCA TFs suggests subfunctionalization of ORCA paralogs in terms of target gene-specific regulation and synergistic activity with the central jasmonate response regulator MYC2. Moreover, we identified specific amino acid residues within the ORCA DNA-binding domains that contribute to the differential regulation of some MIA pathway branches. Our results advance our understanding of TF paralog specificity for which, despite the common occurrence of closely related paralogs in many species, comparative studies are scarce.
Affiliation(s)
- Maite Colinas
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - VIB Center for Plant Systems Biology, Ghent, Belgium
- Jacob Pollier
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - VIB Metabolomics Core, Ghent, Belgium
- Dries Vaneechoutte
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - VIB Center for Plant Systems Biology, Ghent, Belgium
- Deniz G. Malat
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - VIB Center for Plant Systems Biology, Ghent, Belgium
- Fabian Schweizer
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - VIB Center for Plant Systems Biology, Ghent, Belgium
- Liesbeth De Milde
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - VIB Center for Plant Systems Biology, Ghent, Belgium
- Rebecca De Clercq
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - VIB Center for Plant Systems Biology, Ghent, Belgium
- Joana G. Guedes
  - CIBIO/InBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto, Vairão, Portugal
  - I3S-Instituto de Investigação e Inovação em Saúde, IBMC-Instituto de Biologia Molecular e Celular, Universidade do Porto, Porto, Portugal
  - ICBAS–Instituto de Ciências Biomédicas Abel Salazar, Universidade do Porto, Porto, Portugal
- Teresa Martínez-Cortés
  - CIBIO/InBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto, Vairão, Portugal
- Francisco J. Molina-Hidalgo
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - VIB Center for Plant Systems Biology, Ghent, Belgium
- Mariana Sottomayor
  - CIBIO/InBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto, Vairão, Portugal
  - Faculdade de Ciências, Universidade do Porto, Porto, Portugal
- Klaas Vandepoele
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - VIB Center for Plant Systems Biology, Ghent, Belgium
- Alain Goossens
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - VIB Center for Plant Systems Biology, Ghent, Belgium

9
Vancaester E, Depuydt T, Osuna-Cruz CM, Vandepoele K. Comprehensive and Functional Analysis of Horizontal Gene Transfer Events in Diatoms. Mol Biol Evol 2021; 37:3243-3257. [PMID: 32918458 DOI: 10.1093/molbev/msaa182] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Indexed: 12/14/2022] Open
Abstract
Diatoms are a diverse group of mainly photosynthetic algae, responsible for 20% of worldwide oxygen production, which can rapidly respond to favorable conditions and often outcompete other phytoplankton. We investigated the contribution of horizontal gene transfer (HGT) to its ecological success. A large-scale phylogeny-based prokaryotic HGT detection procedure across nine sequenced diatoms showed that 3-5% of their proteome has a horizontal origin and a large influx occurred at the ancestor of diatoms. More than 90% of HGT genes are expressed, and species-specific HGT genes in Phaeodactylum tricornutum undergo strong purifying selection. Genes derived from HGT are implicated in several processes including environmental sensing and expand the metabolic toolbox. Cobalamin (vitamin B12) is an essential cofactor for roughly half of the diatoms and is only produced by bacteria. Five consecutive genes involved in the final synthesis of the cobalamin biosynthetic pathway, which could function as scavenging and repair genes, were detected as HGT. The full suite of these genes was detected in the cold-adapted diatom Fragilariopsis cylindrus. This might give diatoms originating from the Southern Ocean, a region typically depleted in cobalamin, a competitive advantage. Overall, we show that HGT is a prevalent mechanism that is actively used in diatoms to expand its adaptive capabilities.
Affiliation(s)
- Emmelien Vancaester
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - VIB Center for Plant Systems Biology, Ghent, Belgium
  - Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium
- Thomas Depuydt
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - VIB Center for Plant Systems Biology, Ghent, Belgium
  - Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium
- Cristina Maria Osuna-Cruz
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - VIB Center for Plant Systems Biology, Ghent, Belgium
  - Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium
- Klaas Vandepoele
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - VIB Center for Plant Systems Biology, Ghent, Belgium
  - Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium

10
De Clercq I, Van de Velde J, Luo X, Liu L, Storme V, Van Bel M, Pottie R, Vaneechoutte D, Van Breusegem F, Vandepoele K. Integrative inference of transcriptional networks in Arabidopsis yields novel ROS signalling regulators. NATURE PLANTS 2021; 7:500-513. [PMID: 33846597 DOI: 10.1038/s41477-021-00894-1] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 08/11/2020] [Accepted: 03/04/2021] [Indexed: 05/12/2023]
Abstract
Gene regulation is a dynamic process in which transcription factors (TFs) play an important role in controlling spatiotemporal gene expression. To enhance our global understanding of regulatory interactions in Arabidopsis thaliana, different regulatory input networks capturing complementary information about DNA motifs, open chromatin, TF-binding and expression-based regulatory interactions were combined using a supervised learning approach, resulting in an integrated gene regulatory network (iGRN) covering 1,491 TFs and 31,393 target genes (1.7 million interactions). This iGRN outperforms the different input networks to predict known regulatory interactions and has a similar performance to recover functional interactions compared to state-of-the-art experimental methods. The iGRN correctly inferred known functions for 681 TFs and predicted new gene functions for hundreds of unknown TFs. For regulators predicted to be involved in reactive oxygen species (ROS) stress regulation, we confirmed in total 75% of TFs with a function in ROS and/or physiological stress responses. This includes 13 ROS regulators, previously not connected to any ROS or stress function, that were experimentally validated in our ROS-specific phenotypic assays of loss- or gain-of-function lines. In conclusion, the presented iGRN offers a high-quality starting point to enhance our understanding of gene regulation in plants by integrating different experimental data types.
Affiliation(s)
- Inge De Clercq
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - VIB Center for Plant Systems Biology, Ghent, Belgium
- Jan Van de Velde
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - VIB Center for Plant Systems Biology, Ghent, Belgium
  - Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium
- Xiaopeng Luo
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - VIB Center for Plant Systems Biology, Ghent, Belgium
- Li Liu
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - VIB Center for Plant Systems Biology, Ghent, Belgium
  - Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium
- Veronique Storme
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - VIB Center for Plant Systems Biology, Ghent, Belgium
- Michiel Van Bel
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - VIB Center for Plant Systems Biology, Ghent, Belgium
- Robin Pottie
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - VIB Center for Plant Systems Biology, Ghent, Belgium
- Dries Vaneechoutte
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - VIB Center for Plant Systems Biology, Ghent, Belgium
  - Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium
- Frank Van Breusegem
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - VIB Center for Plant Systems Biology, Ghent, Belgium
- Klaas Vandepoele
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - VIB Center for Plant Systems Biology, Ghent, Belgium
  - Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium

11
Delli-Ponti R, Shivhare D, Mutwil M. Using Gene Expression to Study Specialized Metabolism-A Practical Guide. Front Plant Sci 2021; 11:625035. [PMID: 33510763 PMCID: PMC7835209 DOI: 10.3389/fpls.2020.625035] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 11/02/2020] [Accepted: 11/30/2020] [Indexed: 05/25/2023]
Abstract
Plants produce a vast array of chemical compounds that we use as medicines and flavors, but these compounds' biosynthetic pathways are still poorly understood. This lack of knowledge prevents us from modifying, improving, and mass-producing these specialized metabolites in suitable bioreactors. Many of the specialized metabolites are expressed in a narrow range of organs, tissues, and cell types, suggesting tight regulation of the responsible biosynthetic pathways. Fortunately, with the unprecedented ease of generating gene expression data and with >200,000 publicly available RNA sequencing samples, we are now able to study the expression of genes from hundreds of plant species. This review demonstrates how gene expression can elucidate biosynthetic pathways by mining organ-specific genes and gene expression clusters and by applying various types of co-expression analyses. To empower biologists to perform these analyses, we showcase them using recently published, user-friendly tools. Finally, we analyze the performance of co-expression networks and show that they are a valuable addition to elucidating multiple biosynthetic pathways of specialized metabolism.
Affiliation(s)
- Marek Mutwil
  - School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
|
12
|
Shin J, Marx H, Richards A, Vaneechoutte D, Jayaraman D, Maeda J, Chakraborty S, Sussman M, Vandepoele K, Ané JM, Coon J, Roy S. A network-based comparative framework to study conservation and divergence of proteomes in plant phylogenies. Nucleic Acids Res 2021; 49:e3. [PMID: 33219668 PMCID: PMC7797074 DOI: 10.1093/nar/gkaa1041] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/31/2019] [Revised: 09/19/2020] [Accepted: 10/19/2020] [Indexed: 12/23/2022] Open
Abstract
Comparative functional genomics offers a powerful approach to study species evolution. To date, the majority of these studies have focused on the transcriptome in mammalian and yeast phylogenies. Here, we present a novel multi-species proteomic dataset and a computational pipeline to systematically compare protein levels across multiple plant species. Globally, we find that protein levels diverge according to phylogenetic distance but are more constrained than mRNA levels. Module-level comparative analysis of groups of proteins shows that proteins that are more highly expressed tend to be more conserved. To interpret the evolutionary patterns of conservation and divergence, we develop a novel network-based integrative analysis pipeline that combines publicly available transcriptomic datasets to define co-expression modules. Our analysis pipeline can be used to relate changes in protein levels to different species-specific phenotypic traits. We present a case study of the rhizobia-legume symbiosis process that supports the role of autophagy in this symbiotic association.
Affiliation(s)
- Junha Shin
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA
- Harald Marx
  - Department of Microbiology and Ecosystem Science, University of Vienna, Althanstraße 14, 1090 Vienna, Austria
  - Department of Chemistry, University of Wisconsin-Madison, Madison, WI 53706, USA
- Alicia Richards
  - Department of Chemistry, University of Wisconsin-Madison, Madison, WI 53706, USA
- Dries Vaneechoutte
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 927, Ghent, Belgium
  - VIB Center for Plant Systems Biology, VIB, Technologiepark 927, Ghent, Belgium
- Dhileepkumar Jayaraman
  - Department of Agronomy, University of Wisconsin-Madison, Madison, WI 53706, USA
  - Department of Bacteriology, University of Wisconsin-Madison, Madison, WI 53706, USA
- Junko Maeda
  - Department of Agronomy, University of Wisconsin-Madison, Madison, WI 53706, USA
  - Department of Bacteriology, University of Wisconsin-Madison, Madison, WI 53706, USA
- Sanhita Chakraborty
  - Department of Agronomy, University of Wisconsin-Madison, Madison, WI 53706, USA
  - Department of Bacteriology, University of Wisconsin-Madison, Madison, WI 53706, USA
- Michael Sussman
  - Department of Biochemistry, University of Wisconsin-Madison, Madison, WI 53706, USA
- Klaas Vandepoele
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 927, Ghent, Belgium
  - VIB Center for Plant Systems Biology, VIB, Technologiepark 927, Ghent, Belgium
- Jean-Michel Ané
  - Department of Agronomy, University of Wisconsin-Madison, Madison, WI 53706, USA
  - Department of Bacteriology, University of Wisconsin-Madison, Madison, WI 53706, USA
- Joshua Coon
  - Department of Chemistry, University of Wisconsin-Madison, Madison, WI 53706, USA
  - Department of Biochemistry, University of Wisconsin-Madison, Madison, WI 53706, USA
- Sushmita Roy
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53792, USA
|
13
|
Van Bel M, Bucchini F, Vandepoele K. Gene space completeness in complex plant genomes. Curr Opin Plant Biol 2019; 48:9-17. [PMID: 30797187 DOI: 10.1016/j.pbi.2019.01.001] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 10/31/2018] [Revised: 12/10/2018] [Accepted: 01/21/2019] [Indexed: 05/22/2023]
Abstract
Genome annotations offer ample opportunities to study gene functions, biochemical and regulatory pathways, or quantitative trait loci in plants. Determining the quality and completeness of a genome annotation, and maintaining the balance between them, are major challenges, even for genomes of well-studied model organisms. In this review, we present a historical overview of the complexity in different plant genomes and discuss the hurdles and possible solutions in obtaining a complete and high-quality genome annotation. We illustrate that there is no clear-cut solution to these challenges for different gene types, but we provide tips for guiding the iterative process of generating a superior genome annotation, which is a moving target as our knowledge of plant genomics increases and additional data sources become available.
Affiliation(s)
- Michiel Van Bel
  - Department of Plant Systems Biology, VIB, B-9052 Ghent, Belgium
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, B-9052 Ghent, Belgium
- François Bucchini
  - Department of Plant Systems Biology, VIB, B-9052 Ghent, Belgium
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, B-9052 Ghent, Belgium
- Klaas Vandepoele
  - Department of Plant Systems Biology, VIB, B-9052 Ghent, Belgium
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, B-9052 Ghent, Belgium
  - Bioinformatics Institute Ghent, Ghent University, 9052 Ghent, Belgium
|