1
Liu L, Heidecker M, Depuydt T, Manosalva Perez N, Crespi M, Blein T, Vandepoele K. Transcription factors KANADI 1, MYB DOMAIN PROTEIN 44, and PHYTOCHROME INTERACTING FACTOR 4 regulate long intergenic noncoding RNAs expressed in Arabidopsis roots. Plant Physiology 2023; 193:1933-1953. [PMID: 37345955 DOI: 10.1093/plphys/kiad360] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2023] [Revised: 06/02/2023] [Accepted: 06/05/2023] [Indexed: 06/23/2023]
Abstract
Thousands of long intergenic noncoding RNAs (lincRNAs) have been identified in plant genomes. While some lincRNAs have been characterized as important regulators in different biological processes, little is known about the transcriptional regulation for most plant lincRNAs. Through the integration of 8 annotation resources, we defined 6,599 high-confidence lincRNA loci in Arabidopsis (Arabidopsis thaliana). For lincRNAs belonging to different evolutionary age categories, we identified major differences in sequence and chromatin features, as well as in the level of conservation and purifying selection acting during evolution. Spatiotemporal gene expression profiles combined with transcription factor (TF) chromatin immunoprecipitation (ChIP) data were used to construct a TF-lincRNA regulatory network containing 2,659 lincRNAs and 15,686 interactions. We found that properties characterizing lincRNA expression, conservation, and regulation differ between plants and animals. Experimental validation confirmed the role of 3 TFs, KANADI 1, MYB DOMAIN PROTEIN 44, and PHYTOCHROME INTERACTING FACTOR 4, as key regulators controlling root-specific lincRNA expression, demonstrating the predictive power of our network. Furthermore, we identified 58 lincRNAs, regulated by these TFs, showing strong root cell type-specific expression or chromatin accessibility, which are linked with genome-wide association studies genetic associations related to root system development and growth. The multilevel genome-wide characterization covering chromatin state information, promoter conservation, and chromatin immunoprecipitation-based TF binding, for all detectable lincRNAs across 769 expression samples, permits rapidly defining the biological context and relevance of Arabidopsis lincRNAs through regulatory networks.
Affiliation(s)
- Li Liu
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 71, 9052 Ghent, Belgium
  - VIB Center for Plant Systems Biology, Technologiepark 71, 9052 Ghent, Belgium
- Michel Heidecker
  - CNRS, INRAE, Institute of Plant Sciences Paris-Saclay (IPS2), Université Evry, Université Paris-Saclay, 91190 Gif-sur-Yvette, France
  - CNRS, INRAE, Institute of Plant Sciences Paris-Saclay (IPS2), Université Paris Cité, 91190 Gif-sur-Yvette, France
- Thomas Depuydt
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 71, 9052 Ghent, Belgium
  - VIB Center for Plant Systems Biology, Technologiepark 71, 9052 Ghent, Belgium
- Nicolas Manosalva Perez
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 71, 9052 Ghent, Belgium
  - VIB Center for Plant Systems Biology, Technologiepark 71, 9052 Ghent, Belgium
- Martin Crespi
  - CNRS, INRAE, Institute of Plant Sciences Paris-Saclay (IPS2), Université Evry, Université Paris-Saclay, 91190 Gif-sur-Yvette, France
  - CNRS, INRAE, Institute of Plant Sciences Paris-Saclay (IPS2), Université Paris Cité, 91190 Gif-sur-Yvette, France
- Thomas Blein
  - CNRS, INRAE, Institute of Plant Sciences Paris-Saclay (IPS2), Université Evry, Université Paris-Saclay, 91190 Gif-sur-Yvette, France
  - CNRS, INRAE, Institute of Plant Sciences Paris-Saclay (IPS2), Université Paris Cité, 91190 Gif-sur-Yvette, France
- Klaas Vandepoele
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 71, 9052 Ghent, Belgium
  - VIB Center for Plant Systems Biology, Technologiepark 71, 9052 Ghent, Belgium
  - Bioinformatics Institute Ghent, Ghent University, Technologiepark 71, 9052 Ghent, Belgium
2
Song H, Wang Q, Zhang Z, Lin K, Pang E. Identification of clade-wide putative cis-regulatory elements from conserved non-coding sequences in Cucurbitaceae genomes. Horticulture Research 2023; 10:uhad038. [PMID: 37799630 PMCID: PMC10548412 DOI: 10.1093/hr/uhad038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2022] [Accepted: 02/20/2023] [Indexed: 10/07/2023]
Abstract
Cis-regulatory elements regulate gene expression and play an essential role in the development and physiology of organisms. Many conserved non-coding sequences (CNSs) function as cis-regulatory elements. They control the development of various lineages. However, predicting clade-wide cis-regulatory elements across several closely related species remains challenging. Based on the relationship between CNSs and cis-regulatory elements, we present a computational approach that predicts the clade-wide putative cis-regulatory elements in 12 Cucurbitaceae genomes. Using 12-way whole-genome alignment, we first obtained 632 112 CNSs in Cucurbitaceae. Next, we identified 16 552 Cucurbitaceae-wide cis-regulatory elements based on collinearity among all 12 Cucurbitaceae plants. Furthermore, we predicted 3 271 potential regulatory pairs in the cucumber genome, of which 98 were verified using integrative RNA sequencing and ChIP sequencing datasets from samples collected during various fruit development stages. The CNSs, Cucurbitaceae-wide cis-regulatory elements, and their target genes are accessible at http://cmb.bnu.edu.cn/cisRCNEs_cucurbit/. These elements are valuable resources for functionally annotating CNSs and their regulatory roles in Cucurbitaceae genomes.
Affiliation(s)
- Hongtao Song
  - MOE Key Laboratory for Biodiversity Science and Ecological Engineering and Beijing Key Laboratory of Gene Resource and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing 100875, China
- Qi Wang
  - MOE Key Laboratory for Biodiversity Science and Ecological Engineering and Beijing Key Laboratory of Gene Resource and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing 100875, China
- Zhonghua Zhang
  - College of Horticulture, Qingdao Agricultural University, Qingdao 266109, China
- Kui Lin
  - MOE Key Laboratory for Biodiversity Science and Ecological Engineering and Beijing Key Laboratory of Gene Resource and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing 100875, China
- Erli Pang
  - MOE Key Laboratory for Biodiversity Science and Ecological Engineering and Beijing Key Laboratory of Gene Resource and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing 100875, China
3
Liang YY, Chen XY, Zhou BF, Mitchell-Olds T, Wang B. Globally Relaxed Selection and Local Adaptation in Boechera stricta. Genome Biol Evol 2022; 14:evac043. [PMID: 35349686 PMCID: PMC9011030 DOI: 10.1093/gbe/evac043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/23/2022] [Indexed: 11/25/2022]
Abstract
The strength of selection varies among populations and across the genome, but the determinants of efficacy of selection remain unclear. In this study, we used whole-genome sequencing data from 467 Boechera stricta accessions to quantify the strength of selection and characterize the pattern of local adaptation. We found low genetic diversity on 0-fold degenerate sites and conserved non-coding sites, indicating functional constraints on these regions. The estimated distribution of fitness effects and the proportion of fixed substitutions suggest relaxed negative and positive selection in B. stricta. Among the four population groups, the NOR and WES groups have smaller effective population size (Ne), higher proportions of effectively neutral sites, and lower rates of adaptive evolution compared with UTA and COL groups, reflecting the effect of Ne on the efficacy of natural selection. We also found weaker selection on GC-biased sites compared with GC-conservative (unbiased) sites, suggesting that GC-biased gene conversion has affected the strength of selection in B. stricta. We found mixed evidence for the role of the recombination rate on the efficacy of selection. Positive and negative selection were stronger in high-recombination regions compared with low-recombination regions in COL but not in other groups. By scanning the genome, we found different subsets of selected genes suggesting differential adaptation among B. stricta groups. These results show that differences in effective population size, nucleotide composition, and recombination rate are important determinants of the efficacy of selection. This study enriches our understanding of the roles of natural selection and local adaptation in shaping genomic variation.
Affiliation(s)
- Yi-Ye Liang
  - Key Laboratory of Plant Resources Conservation and Sustainable Utilization, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, China
  - University of the Chinese Academy of Sciences, Beijing, China
- Xue-Yan Chen
  - Key Laboratory of Plant Resources Conservation and Sustainable Utilization, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, China
  - University of the Chinese Academy of Sciences, Beijing, China
- Biao-Feng Zhou
  - Key Laboratory of Plant Resources Conservation and Sustainable Utilization, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, China
  - University of the Chinese Academy of Sciences, Beijing, China
- Baosheng Wang
  - Key Laboratory of Plant Resources Conservation and Sustainable Utilization, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, China
  - Center of Conservation Biology, Core Botanical Gardens, Chinese Academy of Sciences, Guangzhou, China
4
Zhou BF, Yuan S, Crowl AA, Liang YY, Shi Y, Chen XY, An QQ, Kang M, Manos PS, Wang B. Phylogenomic analyses highlight innovation and introgression in the continental radiations of Fagaceae across the Northern Hemisphere. Nat Commun 2022; 13:1320. [PMID: 35288565 PMCID: PMC8921187 DOI: 10.1038/s41467-022-28917-1] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 10/18/2021] [Accepted: 02/16/2022] [Indexed: 12/12/2022]
Abstract
Northern Hemisphere forests changed drastically in the early Eocene with the diversification of the oak family (Fagaceae). Cooling climates over the next 20 million years fostered the spread of temperate biomes that became increasingly dominated by oaks and their chestnut relatives. Here we use phylogenomic analyses of nuclear and plastid genomes to investigate the timing and pattern of major macroevolutionary events and ancient genome-wide signatures of hybridization across Fagaceae. Innovation related to seed dispersal is implicated in triggering waves of continental radiations beginning with the rapid diversification of major lineages and resulting in unparalleled transformation of forest dynamics within 15 million years following the K-Pg extinction. We detect introgression at multiple time scales, including ancient events predating the origination of genus-level diversity. As oak lineages moved into newly available temperate habitats in the early Miocene, secondary contact between previously isolated species occurred. This resulted in adaptive introgression, which may have further amplified the diversification of white oaks across Eurasia. Fagaceae are a diverse family including trees of ecological and economic importance. This phylogenomic analysis of nuclear and plastid genomes reconstructs evolutionary history and finds evidence of multiple adaptive introgression events in this important plant family.
5
Chen L, Zhu QH. The evolutionary landscape and expression pattern of plant lincRNAs. RNA Biol 2022; 19:1190-1207. [PMID: 36382947 PMCID: PMC9673970 DOI: 10.1080/15476286.2022.2144609] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
Long intergenic non-coding RNAs (lincRNAs) are important regulators of cellular processes, including development and stress response. Many lincRNAs have been bioinformatically identified in plants, but their evolutionary dynamics and expression characteristics are still elusive. Here, we systematically identified thousands of lincRNAs in 26 plant species, including 6 non-flowering plants, investigated the conservation of the identified lincRNAs in different levels of plant lineages based on sequence and/or synteny homology and explored characteristics of the conserved lincRNAs during plant evolution and their co-expression relationship with protein-coding genes (PCGs). In addition to confirmation of the features well documented in literature for lincRNAs, such as species-specific, fewer exons, tissue-specific expression patterns and less abundantly expressed, we revealed that histone modification signals and/or binding sites of transcription factors were enriched in the conserved lincRNAs, implying their biological functionalities, as demonstrated by identifying conserved lincRNAs related to flower development in both the Brassicaceae and grass families and ancient lincRNAs potentially functioning in meristem development of non-flowering plants. Compared to PCGs, lincRNAs are more likely to be associated with transposable elements (TEs), but with different characteristics in different evolutionary lineages, for instance, the types of TEs and the variable level of association in lincRNAs with different conservativeness. Together, these results provide a comprehensive view on the evolutionary landscape of plant lincRNAs and shed new insights on the conservation and functionality of plant lincRNAs.
Affiliation(s)
- Li Chen
  - School of Life Sciences, Westlake University, Hangzhou, China
  - Institute for Biology, Plant Cell and Molecular Biology, Humboldt-Universität Zu Berlin, Berlin, Germany
- Qian-Hao Zhu
  - CSIRO Agriculture and Food, Canberra, ACT 2601, Australia
6
Kim MS, Lozano R, Kim JH, Bae DN, Kim ST, Park JH, Choi MS, Kim J, Ok HC, Park SK, Gore MA, Moon JK, Jeong SC. The patterns of deleterious mutations during the domestication of soybean. Nat Commun 2021; 12:97. [PMID: 33397978 PMCID: PMC7782591 DOI: 10.1038/s41467-020-20337-3] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Received: 03/24/2020] [Accepted: 11/25/2020] [Indexed: 01/29/2023]
Abstract
Globally, soybean is a major protein and oil crop. Enhancing our understanding of the soybean domestication and improvement process helps boost genomics-assisted breeding efforts. Here we present a genome-wide variation map of 10.6 million single-nucleotide polymorphisms and 1.4 million indels for 781 soybean individuals which includes 418 domesticated (Glycine max), 345 wild (Glycine soja), and 18 natural hybrid (G. max/G. soja) accessions. We describe the enhanced detection of 183 domestication-selective sweeps and the patterns of putative deleterious mutations during domestication and improvement. This predominantly selfing species shows 7.1% reduction of overall deleterious mutations in domesticated soybean relative to wild soybean and a further 1.4% reduction from landrace to improved accessions. The detected domestication-selective sweeps also show reduced levels of deleterious alleles. Importantly, genotype imputation with this resource increases the mapping resolution of genome-wide association studies for seed protein and oil traits in a soybean diversity panel.
Affiliation(s)
- Myung-Shin Kim
  - Bio-Evaluation Center, Korea Research Institute of Bioscience and Biotechnology, Cheongju, Chungbuk, 28116, Korea
  - Plant Immunity Research Center, Plant Genomics and Breeding Institute, College of Agriculture and Life Sciences, Seoul National University, Seoul, 08826, Korea
- Roberto Lozano
  - Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, 14853, USA
- Ji Hong Kim
  - Bio-Evaluation Center, Korea Research Institute of Bioscience and Biotechnology, Cheongju, Chungbuk, 28116, Korea
- Dong Nyuk Bae
  - Bio-Evaluation Center, Korea Research Institute of Bioscience and Biotechnology, Cheongju, Chungbuk, 28116, Korea
- Sang-Tae Kim
  - Department of Life Science, The Catholic University of Korea, Bucheon, 14662, Korea
- Jung-Ho Park
  - Bio-Evaluation Center, Korea Research Institute of Bioscience and Biotechnology, Cheongju, Chungbuk, 28116, Korea
- Man Soo Choi
  - National Institute of Crop Science, Rural Development Administration, Wanju, Jeonbuk, 55365, Korea
- Jaehyun Kim
  - National Institute of Crop Science, Rural Development Administration, Wanju, Jeonbuk, 55365, Korea
- Hyun-Choong Ok
  - National Institute of Crop Science, Rural Development Administration, Wanju, Jeonbuk, 55365, Korea
- Soo-Kwon Park
  - National Institute of Crop Science, Rural Development Administration, Wanju, Jeonbuk, 55365, Korea
- Michael A Gore
  - Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, 14853, USA
- Jung-Kyung Moon
  - National Institute of Crop Science, Rural Development Administration, Wanju, Jeonbuk, 55365, Korea
  - Agricultural Genome Center, National Academy of Agricultural Sciences, Rural Development Administration, Jeonju, Jeonbuk, 55365, Korea
- Soon-Chun Jeong
  - Bio-Evaluation Center, Korea Research Institute of Bioscience and Biotechnology, Cheongju, Chungbuk, 28116, Korea
7
Tian F, Yang DC, Meng YQ, Jin J, Gao G. PlantRegMap: charting functional regulatory maps in plants. Nucleic Acids Res 2020; 48:D1104-D1113. [PMID: 31701126 PMCID: PMC7145545 DOI: 10.1093/nar/gkz1020] [Citation(s) in RCA: 275] [Impact Index Per Article: 68.8] [Received: 08/29/2019] [Revised: 10/17/2019] [Accepted: 10/21/2019] [Indexed: 11/18/2022]
Abstract
With the goal of charting plant transcriptional regulatory maps (i.e. transcription factors (TFs), cis-elements and interactions between them), we have upgraded the TF-centred database PlantTFDB (http://planttfdb.cbi.pku.edu.cn/) to a plant regulatory data and analysis platform PlantRegMap (http://plantregmap.cbi.pku.edu.cn/) over the past three years. In this version, we updated the annotations for the previously collected TFs and set up a new section, ‘extended TF repertoires’ (TFext), to allow users prompt access to the TF repertoires of newly sequenced species. In addition to our regular TF updates, we are dedicated to updating the data on cis-elements and functional interactions between TFs and cis-elements. We established genome-wide conservation landscapes for 63 representative plants and then developed an algorithm, FunTFBS, to screen for functional regulatory elements and interactions by coupling the base-varied binding affinities of TFs with the evolutionary footprints on their binding sites. Using the FunTFBS algorithm and the conservation landscapes, we further identified over 20 million functional TF binding sites (TFBSs) and two million functional interactions for 21 346 TFs, charting the functional regulatory maps of these 63 plants. These resources are publicly available at PlantRegMap (http://plantregmap.cbi.pku.edu.cn/) and a cloud-based mirror (http://plantregmap.gao-lab.org/), providing the plant research community with valuable resources for decoding plant transcriptional regulatory systems.
Affiliation(s)
- Feng Tian
  - State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Center for Bioinformatics, Peking University, Beijing 100871, China
  - Biomedical Pioneering Innovation Center (BIOPIC), Beijing Advanced Innovation Center for Genomics (ICG), Peking University, Beijing 100871, China
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- De-Chang Yang
  - State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Center for Bioinformatics, Peking University, Beijing 100871, China
  - Biomedical Pioneering Innovation Center (BIOPIC), Beijing Advanced Innovation Center for Genomics (ICG), Peking University, Beijing 100871, China
- Yu-Qi Meng
  - State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Center for Bioinformatics, Peking University, Beijing 100871, China
  - Biomedical Pioneering Innovation Center (BIOPIC), Beijing Advanced Innovation Center for Genomics (ICG), Peking University, Beijing 100871, China
- Jinpu Jin
  - State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Center for Bioinformatics, Peking University, Beijing 100871, China
  - Biomedical Pioneering Innovation Center (BIOPIC), Beijing Advanced Innovation Center for Genomics (ICG), Peking University, Beijing 100871, China
- Ge Gao
  - State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Center for Bioinformatics, Peking University, Beijing 100871, China
  - Biomedical Pioneering Innovation Center (BIOPIC), Beijing Advanced Innovation Center for Genomics (ICG), Peking University, Beijing 100871, China
8
Price N, Lopez L, Platts AE, Lasky JR. In the presence of population structure: From genomics to candidate genes underlying local adaptation. Ecol Evol 2020; 10:1889-1904. [PMID: 32128123 DOI: 10.1101/642306] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/08/2019] [Revised: 12/19/2019] [Accepted: 12/23/2019] [Indexed: 05/26/2023]
Abstract
Understanding the genomic signatures, genes, and traits underlying local adaptation of organisms to heterogeneous environments is of central importance to the field of evolutionary biology. To identify loci underlying local adaptation, models that combine allelic and environmental variation while controlling for the effects of population structure have emerged as the method of choice. Despite being evaluated in simulation studies, there has not been a thorough investigation of empirical evidence supporting local adaptation across these alleles. To evaluate these methods, we use 875 Arabidopsis thaliana Eurasian accessions and two mixed models (GEMMA and LFMM) to identify candidate SNPs underlying local adaptation to climate. Subsequently, to assess evidence of local adaptation and function among significant SNPs, we examine allele frequency differentiation and recent selection across Eurasian populations, in addition to their distribution along quantitative trait loci (QTL) explaining fitness variation between Italy and Sweden populations and cis-regulatory/nonsynonymous sites showing significant selective constraint. Our results indicate that significant LFMM/GEMMA SNPs show low allele frequency differentiation and linkage disequilibrium across locally adapted Italy and Sweden populations, in addition to a poor association with fitness QTL peaks (highest logarithm of odds score). Furthermore, when examining derived allele frequencies across the Eurasian range, we find that these SNPs are enriched in low-frequency variants that show very large climatic differentiation but low levels of linkage disequilibrium. These results suggest that their enrichment along putative functional sites most likely represents deleterious variation that is independent of local adaptation.
Among all the genomic signatures examined, only SNPs showing high absolute allele frequency differentiation (AFD) and linkage disequilibrium (LD) between Italy and Sweden populations showed a strong association with fitness QTL peaks and were enriched along selectively constrained cis-regulatory/nonsynonymous sites. Using these SNPs, we find strong evidence linking flowering time, freezing tolerance, and the abscisic-acid pathway to local adaptation.
Affiliation(s)
- Nicholas Price
  - Department of Bioagricultural Sciences & Pest Management, Colorado State University, Fort Collins, CO, USA
  - Department of Biological Sciences, University of Cyprus, Nicosia, Cyprus
- Lua Lopez
  - Department of Biology, Binghamton University (State University of New York), Binghamton, NY, USA
- Adrian E Platts
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
  - Department of Biology, Center for Genomics and Systems Biology, New York University, New York, NY, USA
- Jesse R Lasky
  - Department of Biology, Pennsylvania State University, University Park, PA, USA
9
Price N, Lopez L, Platts AE, Lasky JR. In the presence of population structure: From genomics to candidate genes underlying local adaptation. Ecol Evol 2020; 10:1889-1904. [PMID: 32128123 PMCID: PMC7042746 DOI: 10.1002/ece3.6002] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 09/08/2019] [Revised: 12/19/2019] [Accepted: 12/23/2019] [Indexed: 12/25/2022]
Abstract
Understanding the genomic signatures, genes, and traits underlying local adaptation of organisms to heterogeneous environments is of central importance to the field of evolutionary biology. To identify loci underlying local adaptation, models that combine allelic and environmental variation while controlling for the effects of population structure have emerged as the method of choice. Despite being evaluated in simulation studies, there has not been a thorough investigation of empirical evidence supporting local adaptation across these alleles. To evaluate these methods, we use 875 Arabidopsis thaliana Eurasian accessions and two mixed models (GEMMA and LFMM) to identify candidate SNPs underlying local adaptation to climate. Subsequently, to assess evidence of local adaptation and function among significant SNPs, we examine allele frequency differentiation and recent selection across Eurasian populations, in addition to their distribution along quantitative trait loci (QTL) explaining fitness variation between Italy and Sweden populations and cis-regulatory/nonsynonymous sites showing significant selective constraint. Our results indicate that significant LFMM/GEMMA SNPs show low allele frequency differentiation and linkage disequilibrium across locally adapted Italy and Sweden populations, in addition to a poor association with fitness QTL peaks (highest logarithm of odds score). Furthermore, when examining derived allele frequencies across the Eurasian range, we find that these SNPs are enriched in low-frequency variants that show very large climatic differentiation but low levels of linkage disequilibrium. These results suggest that their enrichment along putative functional sites most likely represents deleterious variation that is independent of local adaptation.
Among all the genomic signatures examined, only SNPs showing high absolute allele frequency differentiation (AFD) and linkage disequilibrium (LD) between Italy and Sweden populations showed a strong association with fitness QTL peaks and were enriched along selectively constrained cis-regulatory/nonsynonymous sites. Using these SNPs, we find strong evidence linking flowering time, freezing tolerance, and the abscisic-acid pathway to local adaptation.
Affiliation(s)
- Nicholas Price
  - Department of Bioagricultural Sciences & Pest Management, Colorado State University, Fort Collins, CO, USA
  - Department of Biological Sciences, University of Cyprus, Nicosia, Cyprus
- Lua Lopez
  - Department of Biology, Binghamton University (State University of New York), Binghamton, NY, USA
- Adrian E. Platts
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
  - Department of Biology, Center for Genomics and Systems Biology, New York University, New York, NY, USA
- Jesse R. Lasky
  - Department of Biology, Pennsylvania State University, University Park, PA, USA
10
Yu X, Martin PGP, Michaels SD. BORDER proteins protect expression of neighboring genes by promoting 3' Pol II pausing in plants. Nat Commun 2019; 10:4359. [PMID: 31554790 PMCID: PMC6761125 DOI: 10.1038/s41467-019-12328-w] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 01/24/2019] [Accepted: 08/30/2019] [Indexed: 12/18/2022]
Abstract
Ensuring that one gene's transcription does not inappropriately affect the expression of its neighbors is a fundamental challenge to gene regulation in a genomic context. In plants, which lack homologs of animal insulator proteins, the mechanisms that prevent transcriptional interference are not well understood. Here we show that BORDER proteins are enriched in intergenic regions and prevent interference between closely spaced genes on the same strand by promoting the 3' pausing of RNA polymerase II at the upstream gene. In the absence of BORDER proteins, 3' pausing associated with the upstream gene is reduced and shifts into the promoter region of the downstream gene. This is consistent with a model in which BORDER proteins inhibit transcriptional interference by preventing RNA polymerase from intruding into the promoters of downstream genes.
Affiliation(s)
- Xuhong Yu
  - Department of Biology, Indiana University, 915 East Third Street, Bloomington, IN, 47405, USA
- Pascal G P Martin
  - Department of Biology, Indiana University, 915 East Third Street, Bloomington, IN, 47405, USA
  - Toxalim (Research Centre in Food Toxicology), Université de Toulouse, INRA, ENVT, INP-Purpan, UPS, 31027, Toulouse, France
- Scott D Michaels
  - Department of Biology, Indiana University, 915 East Third Street, Bloomington, IN, 47405, USA
11
Evolutionary characteristics of intergenic transcribed regions indicate rare novel genes and widespread noisy transcription in the Poaceae. Sci Rep 2019; 9:12122. [PMID: 31431676 PMCID: PMC6702216 DOI: 10.1038/s41598-019-47797-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 01/08/2019] [Accepted: 07/19/2019] [Indexed: 01/19/2023]
Abstract
Extensive transcriptional activity occurring in intergenic regions of genomes has raised the question whether intergenic transcription represents the activity of novel genes or noisy expression. To address this, we evaluated cross-species and post-duplication sequence and expression conservation of intergenic transcribed regions (ITRs) in four Poaceae species. Among 43,301 ITRs across the four species, 34,460 (80%) are species-specific. ITRs found across species tend to be more divergent in expression and have more recent duplicates compared to annotated genes. To assess if ITRs are functional (under selection), machine learning models were established in Oryza sativa (rice) that could accurately distinguish between phenotype genes and pseudogenes (area under curve-receiver operating characteristic = 0.94). Based on the models, 584 (8%) and 4391 (61%) rice ITRs are classified as likely functional and nonfunctional with high confidence, respectively. ITRs with conserved expression and ancient retained duplicates, features that were not part of the model, are frequently classified as likely-functional, suggesting these characteristics could serve as pragmatic rules of thumb for identifying candidate sequences likely to be under selection. This study also provides a framework to identify novel genes using comparative transcriptomic data to improve genome annotation that is fundamental for connecting genotype to phenotype in crop and model systems.
12
Deforges J, Reis RS, Jacquet P, Vuarambon DJ, Poirier Y. Prediction of regulatory long intergenic non-coding RNAs acting in trans through base-pairing interactions. BMC Genomics 2019; 20:601. [PMID: 31331261 PMCID: PMC6647327 DOI: 10.1186/s12864-019-5946-0] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 02/22/2019] [Accepted: 06/30/2019] [Indexed: 12/13/2022]
Abstract
Background: Long intergenic non-coding RNAs (lincRNAs) can act as regulators of expression of protein-coding genes. Trans-natural antisense transcripts (trans-NATs) are a type of lincRNA that contain sequence complementary to mRNA from other loci. The regulatory potential of trans-NATs has been poorly studied in eukaryotes, and no examples of trans-NATs regulating gene expression in plants have been reported. The goal of this study was to identify lincRNAs, and particularly trans-NATs, in Arabidopsis thaliana that have the potential to regulate expression of target genes in trans at the transcriptional or translational level.
Results: We identified 1001 lincRNAs using an RNAseq dataset from total polyA+ and polysome-associated RNA of seedlings grown under high and low phosphate, or shoots and roots treated with different phytohormones, of which 550 were differentially regulated. Approximately 30% of lincRNAs showed conservation amongst Brassicaceae and 25% harbored transposon element (TE) sequences. Gene co-expression network analysis highlighted a group of lincRNAs associated with the response of roots to low phosphate. A total of 129 trans-NATs were predicted, of which 88 were significantly differentially expressed under at least one pairwise comparison. Five trans-NATs showed a positive correlation between their expression and target mRNA steady-state levels, and three showed a negative correlation. Expression of four trans-NATs positively correlated with a change in target mRNA polysome association. The regulatory potential of these trans-NATs did not implicate miRNA mimics nor siRNAs. We also looked for lincRNAs that could regulate gene expression in trans by Watson-Crick DNA:RNA base pairing with target protein-encoding loci. We identified 100 and 81 with a positive or negative correlation, respectively, with steady-state level of their predicted target. The regulatory potential of one such candidate lincRNA harboring a SINE TE sequence was validated in a protoplast assay on three distinct genes containing homologous TE sequence in their promoters. Construction of networks highlighted other putative lincRNAs with multiple predicted target loci for which expression was positively correlated with target gene expression.
Conclusions: This study identified lincRNAs in Arabidopsis with potential in regulating target gene expression in trans by both RNA:RNA and RNA:DNA base pairing and highlights lincRNAs harboring TE sequences in such activity.
Affiliation(s)
- Jules Deforges
  - Department of Plant Molecular Biology, University of Lausanne, Biophore Building, CH-1015, Lausanne, Switzerland
- Rodrigo S Reis
  - Department of Plant Molecular Biology, University of Lausanne, Biophore Building, CH-1015, Lausanne, Switzerland
- Philippe Jacquet
  - Department of Plant Molecular Biology, University of Lausanne, Biophore Building, CH-1015, Lausanne, Switzerland
- Dominique Jacques Vuarambon
  - Department of Plant Molecular Biology, University of Lausanne, Biophore Building, CH-1015, Lausanne, Switzerland
- Yves Poirier
  - Department of Plant Molecular Biology, University of Lausanne, Biophore Building, CH-1015, Lausanne, Switzerland
13
Song H, Lin K, Hu J, Pang E. An Updated Functional Annotation of Protein-Coding Genes in the Cucumber Genome. FRONTIERS IN PLANT SCIENCE 2018; 9:325. [PMID: 29599790 PMCID: PMC5863696 DOI: 10.3389/fpls.2018.00325] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2017] [Accepted: 02/27/2018] [Indexed: 06/08/2023]
Abstract
Background: Although the cucumber reference genome and its annotation were published several years ago, the functional annotation of predicted genes, particularly protein-coding genes, still requires further improvement. In general, accurately determining orthologous relationships between genes allows for better and more robust functional assignments of predicted genes. As one of the most reliable strategies, the determination of collinearity information may facilitate reliable orthology inferences among genes from multiple related genomes. Currently, the identification of collinear segments has mainly been based on conservation of gene order and orientation. Over the course of plant genome evolution, various evolutionary events have disrupted or distorted the order of genes along chromosomes, making it difficult to use those genes as genome-wide markers for plant genome comparisons. Results: Using the localized LASTZ/MULTIZ analysis pipeline, we aligned 15 genomes, including cucumber and other related angiosperm plants, and identified a set of genomic segments that are short in length, stable in structure, uniform in distribution and highly conserved across all 15 plants. Compared with protein-coding genes, these conserved segments were more suitable for use as genomic markers for detecting collinear segments among distantly divergent plants. Guided by this set of identified collinear genomic segments, we inferred 94,486 orthologous protein-coding gene pairs (OPPs) between cucumber and 14 other angiosperm species, which were used as proxies for transferring functional terms to cucumber genes from the annotations of the other 14 genomes. In total, 10,885 protein-coding genes were assigned Gene Ontology (GO) terms, nearly 1,300 more than in the results collected in the Uniprot-proteomic database. Our results showed that annotation accuracy would be improved compared with other existing approaches.
Conclusions: In this study, we provided an alternative resource for the functional annotation of predicted cucumber protein-coding genes, which we expect will be beneficial for the cucumber's biological study, accessible from http://cmb.bnu.edu.cn/functional_annotation. Meanwhile, using the cucumber reference genome as a case study, we presented an efficient strategy for transferring gene functional information from previously well-characterized protein-coding genes in model species to newly sequenced or "non-model" plant species.
Affiliation(s)
- Hongtao Song
  - MOE Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, China
- Kui Lin
  - MOE Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, China
- Jinglu Hu
  - Graduate School of Information, Production and Systems, Waseda University, Kitakyushu-shi, Japan
- Erli Pang
  - MOE Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, China
14
Liang P, Saqib HSA, Zhang X, Zhang L, Tang H. Single-Base Resolution Map of Evolutionary Constraints and Annotation of Conserved Elements across Major Grass Genomes. Genome Biol Evol 2018; 10:473-488. [PMID: 29378032 PMCID: PMC5798027 DOI: 10.1093/gbe/evy006] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Accepted: 01/08/2018] [Indexed: 12/20/2022]
Abstract
Conserved noncoding sequences (CNSs) are evolutionarily conserved DNA sequences that do not encode proteins but may have potential regulatory roles in gene expression. CNS in crop genomes could be linked to many important agronomic traits and ecological adaptations. Compared with the relatively mature exon annotation protocols, efficient methods are lacking to predict the location of noncoding sequences in the plant genomes. We implemented a computational pipeline that is tailored to the comparisons of plant genomes, yielding a large number of conserved sequences using rice genome as the reference. In this study, we used 17 published grass genomes, along with five monocot genomes as well as the basal angiosperm genome of Amborella trichopoda. Genome alignments among these genomes suggest that at least 12.05% of the rice genome appears to be evolving under constraints in the Poaceae lineage, with close to half of the evolutionarily constrained sequences located outside protein-coding regions. We found evidence for purifying selection acting on the conserved sequences by analyzing segregating SNPs within the rice population. Furthermore, we found that known functional motifs were significantly enriched within CNS, with many motifs associated with the preferred binding of ubiquitous transcription factors. The conserved elements that we have curated are accessible through our public database and the JBrowse server. In-depth functional annotations and evolutionary dynamics of the identified conserved sequences provide a solid foundation for studying gene regulation, genome evolution, as well as to inform gene isolation for cereal biologists.
Affiliation(s)
- Pingping Liang
  - Key Laboratory of Genetics, Breeding and Multiple Utilization of Corps, Center for Genomics and Biotechnology, Ministry of Education; Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Fujian Agriculture and Forestry University, Fuzhou, China
  - Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, China
- Hafiz Sohaib Ahmed Saqib
  - Institute of Applied Ecology, Fujian Agriculture and Forestry University, Fuzhou, China
  - State Key Laboratory of Ecological Pest Control for Fujian and Taiwan Crops, Fujian Agriculture and Forestry University, Fuzhou, China
- Xingtan Zhang
  - Key Laboratory of Genetics, Breeding and Multiple Utilization of Corps, Center for Genomics and Biotechnology, Ministry of Education; Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Fujian Agriculture and Forestry University, Fuzhou, China
- Liangsheng Zhang
  - Key Laboratory of Genetics, Breeding and Multiple Utilization of Corps, Center for Genomics and Biotechnology, Ministry of Education; Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Fujian Agriculture and Forestry University, Fuzhou, China
- Haibao Tang
  - Key Laboratory of Genetics, Breeding and Multiple Utilization of Corps, Center for Genomics and Biotechnology, Ministry of Education; Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Fujian Agriculture and Forestry University, Fuzhou, China
15
Global analysis of ribosome-associated noncoding RNAs unveils new modes of translational regulation. Proc Natl Acad Sci U S A 2017; 114:E10018-E10027. [PMID: 29087317 PMCID: PMC5699049 DOI: 10.1073/pnas.1708433114] [Citation(s) in RCA: 133] [Impact Index Per Article: 19.0] [Indexed: 01/21/2023]
Abstract
Noncoding RNAs are an underexplored reservoir of regulatory molecules in eukaryotes. We analyzed the environmental response of roots to phosphorus (Pi) nutrition to understand how a change in availability of an essential element is managed. Pi availability influenced translational regulation mediated by small upstream ORFs on protein-coding mRNAs. Discovery, classification, and evaluation of long noncoding RNAs (lncRNAs) associated with translating ribosomes uncovered diverse new examples of translational regulation. These included Pi-regulated small peptide synthesis, ribosome-coupled phased small interfering RNA production, and the translational regulation of natural antisense RNAs and other regulatory RNAs. This study demonstrates that translational control contributes to the stability and activity of regulatory RNAs, providing an avenue for manipulation of traits. Eukaryotic transcriptomes contain a major non–protein-coding component that includes precursors of small RNAs as well as long noncoding RNA (lncRNAs). Here, we utilized the mapping of ribosome footprints on RNAs to explore translational regulation of coding and noncoding RNAs in roots of Arabidopsis thaliana shifted from replete to deficient phosphorous (Pi) nutrition. Homodirectional changes in steady-state mRNA abundance and translation were observed for all but 265 annotated protein-coding genes. Of the translationally regulated mRNAs, 30% had one or more upstream ORF (uORF) that influenced the number of ribosomes on the principal protein-coding region. Nearly one-half of the 2,382 lncRNAs detected had ribosome footprints, including 56 with significantly altered translation under Pi-limited nutrition. The prediction of translated small ORFs (sORFs) by quantitation of translation termination and peptidic analysis identified lncRNAs that produce peptides, including several deeply evolutionarily conserved and significantly Pi-regulated lncRNAs. 
Furthermore, we discovered that natural antisense transcripts (NATs) frequently have actively translated sORFs, including five with low-Pi up-regulation that correlated with enhanced translation of the sense protein-coding mRNA. The data also confirmed translation of miRNA target mimics and lncRNAs that produce trans-acting or phased small-interfering RNA (tasiRNA/phasiRNAs). Mutational analyses of the positionally conserved sORF of TAS3a linked its translation with tasiRNA biogenesis. Altogether, this systematic analysis of ribosome-associated mRNAs and lncRNAs demonstrates that nutrient availability and translational regulation controls protein and small peptide-encoding mRNAs as well as a diverse cadre of regulatory RNAs.
16
Hoffmann RD, Palmgren M. Purifying selection acts on coding and non-coding sequences of paralogous genes in Arabidopsis thaliana. BMC Genomics 2016; 17:456. [PMID: 27296049 PMCID: PMC4906602 DOI: 10.1186/s12864-016-2803-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 11/25/2015] [Accepted: 05/27/2016] [Indexed: 01/13/2023]
Abstract
Background: Whole-genome duplications in the ancestors of many diverse species provided the genetic material for evolutionary novelty. Several models explain the retention of paralogous genes. However, how these models are reflected in the evolution of coding and non-coding sequences of paralogous genes is unknown.
Results: Here, we analyzed the coding and non-coding sequences of paralogous genes in Arabidopsis thaliana and compared these sequences with those of orthologous genes in Arabidopsis lyrata. Paralogs with lower expression than their duplicate had more nonsynonymous substitutions, were more likely to fractionate, and exhibited less similar expression patterns with their orthologs in the other species. Also, lower-expressed genes had greater tissue specificity. Orthologous conserved non-coding sequences in the promoters, introns, and 3′ untranslated regions were less abundant at lower-expressed genes compared to their higher-expressed paralogs. A gene ontology (GO) term enrichment analysis showed that paralogs with similar expression levels were enriched in GO terms related to ribosomes, whereas paralogs with different expression levels were enriched in terms associated with stress responses.
Conclusions: Loss of conserved non-coding sequences in one gene of a paralogous gene pair correlates with reduced expression levels that are more tissue specific. Together with increased mutation rates in the coding sequences, this suggests that similar forces of purifying selection act on coding and non-coding sequences. We propose that coding and non-coding sequences evolve concurrently following gene duplication.
Affiliation(s)
- Robert D Hoffmann
  - Center for Membrane Pumps in Cells and Disease - PUMPKIN, Danish National Research Foundation, Department of Plant and Environmental Sciences, University of Copenhagen, 1871, Frederiksberg C, Denmark
- Michael Palmgren
  - Center for Membrane Pumps in Cells and Disease - PUMPKIN, Danish National Research Foundation, Department of Plant and Environmental Sciences, University of Copenhagen, 1871, Frederiksberg C, Denmark
17
Muiño JM, de Bruijn S, Pajoro A, Geuten K, Vingron M, Angenent GC, Kaufmann K. Evolution of DNA-Binding Sites of a Floral Master Regulatory Transcription Factor. Mol Biol Evol 2015; 33:185-200. [PMID: 26429922 PMCID: PMC4693976 DOI: 10.1093/molbev/msv210] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Indexed: 12/11/2022]
Abstract
Flower development is controlled by the action of key regulatory transcription factors of the MADS-domain family. The function of these factors appears to be highly conserved among species based on mutant phenotypes. However, the conservation of their downstream processes is much less well understood, mostly because the evolutionary turnover and variation of their DNA-binding sites (BSs) among plant species have not yet been experimentally determined. Here, we performed comparative ChIP (chromatin immunoprecipitation)-seq experiments of the MADS-domain transcription factor SEPALLATA3 (SEP3) in two closely related Arabidopsis species, Arabidopsis thaliana and A. lyrata, which have very similar floral organ morphology. We found that BS conservation is associated with DNA sequence conservation, the presence of the CArG-box BS motif, and the relative position of the BS to its potential target gene. Differences in genome size and structure can explain why SEP3 BSs in A. lyrata can be located farther from their potential target genes than their counterparts in A. thaliana. In A. lyrata, we identified transposition as a mechanism to generate novel SEP3 binding locations in the genome. Comparative gene expression analysis shows that the loss/gain of BSs is associated with a change in gene expression. In summary, this study investigates the evolutionary dynamics of DNA BSs of a floral key-regulatory transcription factor and explores factors affecting this phenomenon.
Affiliation(s)
- Jose M Muiño
  - Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
  - Laboratory of Bioinformatics, Wageningen University, Wageningen, The Netherlands
- Suzanne de Bruijn
  - Institute for Biochemistry and Biology, Potsdam University, Potsdam, Germany
  - Laboratory of Molecular Biology, Wageningen University, Wageningen, The Netherlands
- Alice Pajoro
  - Laboratory of Molecular Biology, Wageningen University, Wageningen, The Netherlands
- Koen Geuten
  - Laboratory of Molecular Plant Biology, Department of Biology, University of Leuven (KU Leuven), Leuven, Belgium
- Martin Vingron
  - Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Gerco C Angenent
  - Laboratory of Molecular Biology, Wageningen University, Wageningen, The Netherlands
  - Bioscience, Plant Research International, Wageningen, The Netherlands
- Kerstin Kaufmann
  - Institute for Biochemistry and Biology, Potsdam University, Potsdam, Germany
18
Wu X, Zeng Y, Guan J, Ji G, Huang R, Li QQ. Genome-wide characterization of intergenic polyadenylation sites redefines gene spaces in Arabidopsis thaliana. BMC Genomics 2015; 16:511. [PMID: 26155789 PMCID: PMC4568572 DOI: 10.1186/s12864-015-1691-1] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 01/06/2015] [Accepted: 06/05/2015] [Indexed: 12/22/2022]
Abstract
Background: Messenger RNA polyadenylation is an essential step for the maturation of most eukaryotic mRNAs. Accurate determination of poly(A) sites helps define the 3’-ends of genes, which is important for genome annotation and gene function research. Genomic studies have revealed the presence of poly(A) sites in intergenic regions, which may be attributed to 3’-UTR extensions and novel transcript units. However, there has been no systematic evaluation of intergenic poly(A) sites in plants.
Results: Approximately 16,000 intergenic poly(A) site clusters (IPACs) in Arabidopsis thaliana were discovered and evaluated at the whole genome level. Based on the distributions of distance from IPACs to nearby sense and antisense genes, these IPACs were classified into three categories. About 70% of them were from previously unannotated 3’-UTR extensions to known genes, which would extend 6985 transcripts of the TAIR10 genome annotation beyond their 3’-ends, with a mean extension of 134 nucleotides. A total of 1317 IPACs originated from novel intergenic transcripts, 37 of which were likely to be associated with protein-coding transcripts. A further 2957 IPACs corresponded to antisense transcripts for genes on the reverse strand, which might affect 2265 protein-coding genes and 39 non-protein-coding genes, including long non-coding RNA genes. The rest of the IPACs could have originated from transcriptional read-through or gene mis-annotations.
Conclusions: The identified IPACs corresponding to novel transcripts, 3’-UTR extensions, and antisense transcription should be incorporated into the current Arabidopsis genome annotation. Comprehensive characterization of IPACs from this study provides insights into alternative polyadenylation and antisense transcription in plants.
Affiliation(s)
- Xiaohui Wu
  - Department of Automation, Xiamen University, Xiamen, Fujian, China
- Yong Zeng
  - Department of Automation, Xiamen University, Xiamen, Fujian, China
- Jinting Guan
  - Department of Automation, Xiamen University, Xiamen, Fujian, China
- Guoli Ji
  - Department of Automation, Xiamen University, Xiamen, Fujian, China
  - Innovation Center for Cell Signaling Network, Xiamen University, Xiamen, Fujian, China
- Rongting Huang
  - Department of Automation, Xiamen University, Xiamen, Fujian, China
- Qingshun Q Li
  - Key Laboratory of the Ministry of Education on Costal Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian, China
  - Graduate College of Biomedical Sciences, Western University of Health Sciences, Pomona, CA, USA
  - Rice Research Institute, Fujian Academy of Agricultural Sciences, Fuzhou, Fujian, China
19
de Boer JM, Datema E, Tang X, Borm TJA, Bakker EH, van Eck HJ, van Ham RCHJ, de Jong H, Visser RGF, Bachem CWB. Homologues of potato chromosome 5 show variable collinearity in the euchromatin, but dramatic absence of sequence similarity in the pericentromeric heterochromatin. BMC Genomics 2015; 16:374. [PMID: 25958312 PMCID: PMC4470070 DOI: 10.1186/s12864-015-1578-1] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 11/19/2014] [Accepted: 04/24/2015] [Indexed: 12/24/2022]
Abstract
BACKGROUND In flowering plants it has been shown that de novo genome assemblies of different species and genera show a significant drop in the proportion of alignable sequence. Within a plant species, however, it is assumed that different haplotypes of the same chromosome align well. In this paper we have compared three de novo assemblies of potato chromosome 5 and report on the sequence variation and the proportion of sequence that can be aligned. RESULTS For the diploid potato clone RH89-039-16 (RH) we produced two linkage phase controlled and haplotype-specific assemblies of chromosome 5 based on BAC-by-BAC sequencing, which were aligned to each other and compared to the 52 Mb chromosome 5 reference sequence of the doubled monoploid clone DM 1-3 516 R44 (DM). We identified 17.0 Mb of non-redundant sequence scaffolds derived from euchromatic regions of RH and 38.4 Mb from the pericentromeric heterochromatin. For 32.7 Mb of the RH sequences the correct position and order on chromosome 5 was determined, using genetic markers, fluorescence in situ hybridisation and alignment to the DM reference genome. This ordered fraction of the RH sequences is situated in the euchromatic arms and in the heterochromatin borders. In the euchromatic regions, the sequence collinearity between the three chromosomal homologs is good, but interruption of collinearity occurs at nine gene clusters. Towards and into the heterochromatin borders, absence of collinearity due to structural variation was more extensive and was caused by hemizygous and poorly aligning regions of up to 450 kb in length. In the most central heterochromatin, a total of 22.7 Mb sequence from both RH haplotypes remained unordered. These RH sequences have very few syntenic regions and represent a non-alignable region between the RH and DM heterochromatin haplotypes of chromosome 5. 
CONCLUSIONS Our results show that among homologous potato chromosomes large regions are present with dramatic loss of sequence collinearity. This stresses the need for more de novo reference assemblies in order to capture genome diversity in this crop. The discovery of three highly diverged pericentric heterochromatin haplotypes within one species is a novelty in plant genome analysis. The possible origin and cytogenetic implication of this heterochromatin haplotype diversity are discussed.
Affiliation(s)
- Jan M de Boer
  - Wageningen UR Plant Breeding, Wageningen University and Research Centre, Droevendaalsesteeg 1, 6708PB, Wageningen, The Netherlands
  - Current address: Averis Seeds B.V., Valtherblokken Zuid 40, 7876 TC, Valthermond, The Netherlands
- Erwin Datema
  - Wageningen University and Research Centre, Applied Bioinformatics, Plant Research International, Droevendaalsesteeg 1, 6708PB, Wageningen, The Netherlands
  - Current address: KeyGene N.V., P.O. Box 216, 6700, Wageningen, The Netherlands
- Xiaomin Tang
  - Laboratory of Genetics, Wageningen University, Droevendaalsesteeg 1, 6708PB, Wageningen, The Netherlands
  - Current address: Department of Biology, Colorado State University, Fort Collins, USA
- Theo J A Borm
  - Wageningen UR Plant Breeding, Wageningen University and Research Centre, Droevendaalsesteeg 1, 6708PB, Wageningen, The Netherlands
- Erin H Bakker
  - Laboratory of Nematology, Wageningen University, Droevendaalsesteeg 1, 6708PB, Wageningen, The Netherlands
- Herman J van Eck
  - Wageningen UR Plant Breeding, Wageningen University and Research Centre, Droevendaalsesteeg 1, 6708PB, Wageningen, The Netherlands
- Roeland C H J van Ham
  - Wageningen University and Research Centre, Applied Bioinformatics, Plant Research International, Droevendaalsesteeg 1, 6708PB, Wageningen, The Netherlands
  - Current address: KeyGene N.V., P.O. Box 216, 6700, Wageningen, The Netherlands
- Hans de Jong
  - Laboratory of Genetics, Wageningen University, Droevendaalsesteeg 1, 6708PB, Wageningen, The Netherlands
- Richard G F Visser
  - Wageningen UR Plant Breeding, Wageningen University and Research Centre, Droevendaalsesteeg 1, 6708PB, Wageningen, The Netherlands
- Christian W B Bachem
  - Wageningen UR Plant Breeding, Wageningen University and Research Centre, Droevendaalsesteeg 1, 6708PB, Wageningen, The Netherlands
20
Tsai CH, Liao R, Chou B, Palumbo M, Contreras LM. Genome-wide analyses in bacteria show small-RNA enrichment for long and conserved intergenic regions. J Bacteriol 2015; 197:40-50. [PMID: 25313390 PMCID: PMC4288687 DOI: 10.1128/jb.02359-14] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.1] [Received: 09/29/2014] [Accepted: 10/02/2014] [Indexed: 12/21/2022]
Abstract
Interest in finding small RNAs (sRNAs) in bacteria has significantly increased in recent years due to their regulatory functions. Development of high-throughput methods and more sophisticated computational algorithms has allowed rapid identification of sRNA candidates in different species. However, given their various sizes (50 to 500 nucleotides [nt]) and their potential genomic locations in the 5' and 3' untranslated regions as well as in intergenic regions, identification and validation of true sRNAs have been challenging. In addition, the evolution of bacterial sRNAs across different species continues to be puzzling, given that they can exert similar functions with various sequences and structures. In this study, we analyzed the enrichment patterns of sRNAs in 13 well-annotated bacterial species using existing transcriptome and experimental data. All intergenic regions were analyzed by WU-BLAST to examine conservation levels relative to species within or outside their genus. In total, more than 900 validated bacterial sRNAs and 23,000 intergenic regions were analyzed. The results indicate that sRNAs are enriched in intergenic regions, which are longer and more conserved than the average intergenic regions in the corresponding bacterial genome. We also found that sRNA-coding regions have different conservation levels relative to their flanking regions. This work provides a way to analyze how noncoding RNAs are distributed in bacterial genomes and also shows conserved features of intergenic regions that encode sRNAs. These results also provide insight into the functions of regions surrounding sRNAs and into optimization of RNA search algorithms.
Affiliation(s)
- Chen-Hsun Tsai
  - McKetta Department of Chemical Engineering, University of Texas at Austin, Austin, Texas, USA
- Rick Liao
  - McKetta Department of Chemical Engineering, University of Texas at Austin, Austin, Texas, USA
- Brendan Chou
  - Department of Chemistry and Biochemistry, University of Texas at Austin, Austin, Texas, USA
- Michael Palumbo
  - Computational Biology and Statistics, Wadsworth Center, Albany, New York, USA
- Lydia M Contreras
  - McKetta Department of Chemical Engineering, University of Texas at Austin, Austin, Texas, USA
21
Berke L, Snel B. The histone modification H3K27me3 is retained after gene duplication and correlates with conserved noncoding sequences in Arabidopsis. Genome Biol Evol 2014; 6:572-9. [PMID: 24567304 PMCID: PMC3971591 DOI: 10.1093/gbe/evu040] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 12/24/2022]
Abstract
The histone modification H3K27me3 is involved in repression of transcription and plays a crucial role in developmental transitions in both animals and plants. It is deposited by PRC2 (Polycomb repressive complex 2), a conserved protein complex. In Arabidopsis thaliana, H3K27me3 is found at 15% of all genes. These tend to encode transcription factors and other regulators important for development. However, it is not known how PRC2 is recruited to target loci nor how this set of target genes arose during Arabidopsis evolution. To resolve the latter, we integrated A. thaliana gene families with five independent genome-wide H3K27me3 data sets. Gene families were either significantly enriched or depleted of H3K27me3, showing a strong impact of shared ancestry to H3K27me3 distribution. To quantify this, we performed ancestral state reconstruction of H3K27me3 on phylogenetic trees of gene families. The set of H3K27me3-marked genes changed less than expected by chance, suggesting that H3K27me3 was retained after gene duplication. This retention suggests that the PRC2-recruiting signal could be encoded in the DNA and also conserved among certain duplicated genes. Indeed, H3K27me3-marked genes were overrepresented among paralogs sharing conserved noncoding sequences (CNSs) that are enriched with transcription factor binding sites. The association of upstream CNSs with H3K27me3-marked genes represents the first genome-wide connection between H3K27me3 and potential regulatory elements in plants. Thus, we propose that CNSs likely function as part of the PRC2 recruitment in plants.
Affiliation(s)
- Lidija Berke
- Theoretical Biology and Bioinformatics, Department of Biology, Faculty of Science, Utrecht University, The Netherlands
|
22
|
Mandel JR, Dikow RB, Funk VA, Masalia RR, Staton SE, Kozik A, Michelmore RW, Rieseberg LH, Burke JM. A target enrichment method for gathering phylogenetic information from hundreds of loci: An example from the Compositae. APPLICATIONS IN PLANT SCIENCES 2014; 2:apps.1300085. [PMID: 25202605 PMCID: PMC4103609 DOI: 10.3732/apps.1300085] [Citation(s) in RCA: 102] [Impact Index Per Article: 10.2] [Received: 11/12/2013] [Accepted: 01/13/2014] [Indexed: 05/18/2023]
Abstract
PREMISE OF THE STUDY: The Compositae (Asteraceae) are a large and diverse family of plants, and the most comprehensive phylogeny to date is a meta-tree based on 10 chloroplast loci that has several major unresolved nodes. We describe the development of an approach that enables the rapid sequencing of large numbers of orthologous nuclear loci to facilitate efficient phylogenomic analyses. METHODS AND RESULTS: We designed a set of sequence capture probes that target conserved orthologous sequences in the Compositae. We also developed a bioinformatic and phylogenetic workflow for processing and analyzing the resulting data. Application of our approach to 15 species from across the Compositae resulted in the production of phylogenetically informative sequence data from 763 loci and the successful reconstruction of known phylogenetic relationships across the family. CONCLUSIONS: These methods should be of great use to members of the broader Compositae community, and the general approach should also be of use to researchers studying other families.
Affiliation(s)
- Jennifer R. Mandel
- Department of Biological Sciences, University of Memphis, Memphis, Tennessee 38152 USA
- Rebecca B. Dikow
- Center for Conservation and Evolutionary Genetics, National Zoological Park and Division of Mammals, National Museum of Natural History, Smithsonian Institution, Washington, D.C. 20560 USA
- Vicki A. Funk
- Department of Botany, National Museum of Natural History, Smithsonian Institution, Washington, D.C. 20560 USA
- Rishi R. Masalia
- Department of Plant Biology, Miller Plant Sciences, University of Georgia, Athens, Georgia 30602 USA
- S. Evan Staton
- Department of Genetics, Davison Life Sciences Building, University of Georgia, Athens, Georgia 30602 USA
- Alex Kozik
- The Genome Center, University of California, Davis, California 95616 USA
- Loren H. Rieseberg
- Department of Botany, University of British Columbia, Vancouver, British Columbia V6T 1Z4 Canada
- John M. Burke
- Department of Plant Biology, Miller Plant Sciences, University of Georgia, Athens, Georgia 30602 USA
|
23
|
Ellegren H. Genome sequencing and population genomics in non-model organisms. Trends Ecol Evol 2014; 29:51-63. [DOI: 10.1016/j.tree.2013.09.008] [Citation(s) in RCA: 383] [Impact Index Per Article: 38.3] [Received: 06/16/2013] [Revised: 09/02/2013] [Accepted: 09/16/2013] [Indexed: 12/20/2022]
|
24
|
Jin J, Zhang H, Kong L, Gao G, Luo J. PlantTFDB 3.0: a portal for the functional and evolutionary study of plant transcription factors. Nucleic Acids Res 2014; 42:D1182-7. [PMID: 24174544 PMCID: PMC3965000 DOI: 10.1093/nar/gkt1016] [Citation(s) in RCA: 609] [Impact Index Per Article: 60.9] [Received: 09/16/2013] [Revised: 10/05/2013] [Accepted: 10/07/2013] [Indexed: 11/25/2022]
Abstract
With the aim to provide a resource for functional and evolutionary study of plant transcription factors (TFs), we updated the plant TF database PlantTFDB to version 3.0 (http://planttfdb.cbi.pku.edu.cn). After refining the TF classification pipeline, we systematically identified 129 288 TFs from 83 species, of which 67 species have genome sequences, covering main lineages of green plants. Besides the abundant annotation provided in the previous version, we generated more annotations for identified TFs, including expression, regulation, interaction, conserved elements, phenotype information, expert-curated descriptions derived from UniProt, TAIR and NCBI GeneRIF, as well as references to provide clues for functional studies of TFs. To help identify evolutionary relationship among identified TFs, we assigned 69 450 TFs into 3924 orthologous groups, and constructed 9217 phylogenetic trees for TFs within the same families or same orthologous groups, respectively. In addition, we set up a TF prediction server in this version for users to identify TFs from their own sequences.
Affiliation(s)
- Jinpu Jin
- State Key Laboratory of Protein and Plant Gene Research, College of Life Sciences and Center for Bioinformatics, Peking University, Beijing 100871, P.R. China
- Lei Kong
- State Key Laboratory of Protein and Plant Gene Research, College of Life Sciences and Center for Bioinformatics, Peking University, Beijing 100871, P.R. China
- Ge Gao
- State Key Laboratory of Protein and Plant Gene Research, College of Life Sciences and Center for Bioinformatics, Peking University, Beijing 100871, P.R. China
- Jingchu Luo
- State Key Laboratory of Protein and Plant Gene Research, College of Life Sciences and Center for Bioinformatics, Peking University, Beijing 100871, P.R. China
|
25
|
Haudry A, Platts AE, Vello E, Hoen DR, Leclercq M, Williamson RJ, Forczek E, Joly-Lopez Z, Steffen JG, Hazzouri KM, Dewar K, Stinchcombe JR, Schoen DJ, Wang X, Schmutz J, Town CD, Edger PP, Pires JC, Schumaker KS, Jarvis DE, Mandáková T, Lysak MA, van den Bergh E, Schranz ME, Harrison PM, Moses AM, Bureau TE, Wright SI, Blanchette M. An atlas of over 90,000 conserved noncoding sequences provides insight into crucifer regulatory regions. Nat Genet 2013; 45:891-8. [PMID: 23817568 DOI: 10.1038/ng.2684] [Citation(s) in RCA: 219] [Impact Index Per Article: 19.9] [Received: 10/13/2012] [Accepted: 06/04/2013] [Indexed: 12/17/2022]
Abstract
Despite the central importance of noncoding DNA to gene regulation and evolution, understanding of the extent of selection on plant noncoding DNA remains limited compared to that of other organisms. Here we report sequencing of genomes from three Brassicaceae species (Leavenworthia alabamica, Sisymbrium irio and Aethionema arabicum) and their joint analysis with six previously sequenced crucifer genomes. Conservation across orthologous bases suggests that at least 17% of the Arabidopsis thaliana genome is under selection, with nearly one-quarter of the sequence under selection lying outside of coding regions. Much of this sequence can be localized to approximately 90,000 conserved noncoding sequences (CNSs) that show evidence of transcriptional and post-transcriptional regulation. Population genomics analyses of two crucifer species, A. thaliana and Capsella grandiflora, confirm that most of the identified CNSs are evolving under medium to strong purifying selection. Overall, these CNSs highlight both similarities and several key differences between the regulatory DNA of plants and other species.
Affiliation(s)
- Annabelle Haudry
- Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Ontario, Canada
|