1
|
The genome regulatory landscape of Atlantic salmon liver through smoltification. PLoS One 2024; 19:e0302388. [PMID: 38648207 PMCID: PMC11034671 DOI: 10.1371/journal.pone.0302388] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2023] [Accepted: 04/02/2024] [Indexed: 04/25/2024]
Abstract
The anadromous Atlantic salmon undergo a preparatory physiological transformation before seawater entry, referred to as smoltification. Key molecular developmental processes involved in this life stage transition, such as remodeling of gill functions, are known to be synchronized and modulated by environmental cues like photoperiod. However, little is known about the photoperiod influence and genome regulatory processes driving other canonical aspects of smoltification such as the large-scale changes in lipid metabolism and energy homeostasis in the developing smolt liver. Here we generate transcriptome, DNA methylation, and chromatin accessibility data from salmon livers across smoltification under different photoperiod regimes. We find a systematic reduction of expression levels of genes with a metabolic function, such as lipid metabolism, and increased expression of energy related genes such as oxidative phosphorylation, during smolt development in freshwater. However, in contrast to similar studies of the gill, smolt liver gene expression prior to seawater transfer was not impacted by photoperiodic history. Integrated analyses of gene expression, chromatin accessibility, and transcription factor (TF) binding signatures highlight chromatin remodeling and TF dynamics underlying smolt gene regulatory changes. Differential peak accessibility patterns largely matched differential gene expression patterns during smoltification and we infer that ZNF682, KLFs, and NFY TFs are important in driving a liver metabolic shift from synthesis to breakdown of organic compounds in freshwater. Overall, chromatin accessibility and TFBS occupancy were highly correlated with changes in gene expression. On the other hand, we identified numerous differential methylation patterns across the genome, but associated genes were not functionally enriched or correlated with observed gene expression changes across smolt development. Taken together, this work highlights the relative importance of chromatin remodeling during smoltification and demonstrates that metabolic remodeling occurs as a preadaptation to life at sea that is not to a large extent driven by photoperiod history.
|
2
|
Regulation of developmental gatekeeping and cell fate transition by the calpain protease DEK1 in Physcomitrium patens. Commun Biol 2024; 7:261. [PMID: 38438476 PMCID: PMC10912778 DOI: 10.1038/s42003-024-05933-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Accepted: 02/19/2024] [Indexed: 03/06/2024]
Abstract
Calpains are cysteine proteases that control cell fate transitions whose loss of function causes severe, pleiotropic phenotypes in eukaryotes. Although mainly considered as modulatory proteases, human calpain targets are directed to the N-end rule degradation pathway. Several such targets are transcription factors, hinting at a gene-regulatory role. Here, we analyze the gene-regulatory networks of the moss Physcomitrium patens and characterize the regulons that are misregulated in mutants of the calpain DEFECTIVE KERNEL1 (DEK1). Predicted cleavage patterns of the regulatory hierarchies in five DEK1-controlled subnetworks are consistent with a pleiotropic and regulatory role during cell fate transitions targeting multiple functions. Network structure suggests DEK1-gated sequential transitions between cell fates in 2D-to-3D development. Our method combines comprehensive phenotyping, transcriptomics and data science to dissect phenotypic traits, and our model explains the protease function as a switch gatekeeping cell fate transitions potentially also beyond plant development.
|
3
|
Metabolic influence of core ciliates within the rumen microbiome. THE ISME JOURNAL 2023:10.1038/s41396-023-01407-y. [PMID: 37169869 DOI: 10.1038/s41396-023-01407-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 06/29/2022] [Revised: 03/29/2023] [Accepted: 03/30/2023] [Indexed: 05/13/2023]
Abstract
Protozoa comprise a major fraction of the microbial biomass in the rumen microbiome, of which the entodiniomorphs (order: Entodiniomorphida) and holotrichs (order: Vestibuliferida) are consistently observed to be dominant across a diverse genetic and geographical range of ruminant hosts. Despite the apparent core role that protozoal species exert, their major biological and metabolic contributions to rumen function remain largely undescribed in vivo. Here, we have leveraged (meta)genome-centric metaproteomes from rumen fluid samples originating from both cattle and goats fed diets with varying inclusion levels of lipids and starch, to detail the specific metabolic niches that protozoa occupy in the context of their microbial co-habitants. Initial proteome estimations via total protein counts and label-free quantification highlight that entodiniomorph species Entodinium and Epidinium as well as the holotrichs Dasytricha and Isotricha comprise an extensive fraction of the total rumen metaproteome. Proteomic detection of protozoal metabolism such as hydrogenases (Dasytricha, Isotricha, Epidinium, Enoploplastron), carbohydrate-active enzymes (Epidinium, Diplodinium, Enoploplastron, Polyplastron), microbial predation (Entodinium) and volatile fatty acid production (Entodinium and Epidinium) was observed at increased levels in high methane-emitting animals. Despite certain protozoal species having well-established reputations for digesting starch, they were unexpectedly less detectable in low methane-emitting animals fed high starch diets, which were instead dominated by propionate/succinate-producing bacterial populations suspected of being resistant to predation irrespective of host. Finally, we reaffirmed our abovementioned observations in geographically independent datasets, thus illuminating the substantial metabolic influence that under-explored eukaryotic populations have in the rumen, with greater implications for both digestion and methane metabolism.
|
4
|
Functional validation of transposable element derived cis-regulatory elements in Atlantic salmon. G3 (BETHESDA, MD.) 2023; 13:7031732. [PMID: 36753570 PMCID: PMC10085797 DOI: 10.1093/g3journal/jkad034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2022] [Revised: 01/27/2023] [Accepted: 01/31/2023] [Indexed: 02/10/2023]
Abstract
Transposable elements (TEs) are hypothesized to play important roles in shaping genome evolution following whole genome duplications (WGD), including rewiring of gene regulation. In a recent analysis, duplicate gene copies that had evolved higher expression in liver following the salmonid WGD ∼100 million years ago were associated with higher numbers of predicted TE-derived cis-regulatory elements (TE-CREs). Yet, the ability of these TE-CREs to recruit transcription factors (TFs) in vivo and impact gene expression remains unknown. Here, we evaluated the gene regulatory functions of 11 TEs using luciferase promoter reporter assays in Atlantic salmon (Salmo salar) primary liver cells. Canonical Tc1-Mariner elements from intronic regions showed no or small repressive effects on transcription. However, other TE-derived cis-regulatory elements upstream of transcriptional start sites increased expression significantly. Our results question the hypothesis that TEs in the Tc1-Mariner superfamily, which were extremely active following WGD in salmonids, had a major impact on regulatory rewiring of gene duplicates, but highlight the potential of other TEs in post-WGD rewiring of gene regulation in the Atlantic salmon genome.
|
5
|
Analyses of Genome Regulatory Evolution Following Whole-Genome Duplication Using the Phylogenetic EVE Model. Methods Mol Biol 2023; 2545:209-225. [PMID: 36720815 DOI: 10.1007/978-1-0716-2561-3_11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/02/2023]
Abstract
Whole-genome duplications (WGDs) are important in shaping the evolution of complex genomes, including rewiring of genome regulation. To address key questions about how WGDs impact the evolution of genome regulation, we need to understand the relative importance of selection versus drift and temporal evolutionary dynamics. One promising class of statistical models that can help address such questions is phylogenetic Ornstein-Uhlenbeck (OU) models. Here we present a computational pipeline for the comparative phylogenetic analyses of genome regulation using an OU model. We have implemented this model in R and provide a step-by-step protocol for the use of this model, including example scripts and simulated test data. We provide the nonspecialist a brief overview of how this model works and how to perform tests for signatures of selection on genome regulation as well as power simulations to aid in experimental design and interpretation of results. We believe that these resources could help polyploidy research move forward in an era of rapidly increasing functional genomics data across the tree of life.
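To illustrate the kind of process an OU model describes, the sketch below simulates a single trait (for example, an expression level) evolving along one branch under dX = alpha (theta - X) dt + sigma dW. It is not the authors' R pipeline or the EVE model itself; the function names and parameter values are hypothetical and chosen only for illustration. It assumes alpha > 0, so that the trait is pulled toward the optimum theta and its variance settles at sigma^2 / (2 alpha).

```python
import math
import random


def simulate_ou(x0, theta, alpha, sigma, t_total, n_steps, rng):
    """Simulate one Ornstein-Uhlenbeck trajectory using the exact update rule.

    Model: dX = alpha * (theta - X) dt + sigma dW, with alpha > 0.
    Returns the list of n_steps + 1 trait values, starting at x0.
    """
    dt = t_total / n_steps
    decay = math.exp(-alpha * dt)  # fraction of the distance to theta kept per step
    step_sd = math.sqrt(sigma ** 2 / (2 * alpha) * (1 - decay ** 2))
    x = x0
    path = [x]
    for _ in range(n_steps):
        x = theta + (x - theta) * decay + step_sd * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


def stationary_variance(alpha, sigma):
    """Long-run variance of the OU process around theta (requires alpha > 0)."""
    return sigma ** 2 / (2 * alpha)
```

With a large alpha (strong pull toward the optimum) the simulated values stay close to theta, which is the pattern usually read as constraint; as alpha approaches zero the process behaves more like unconstrained drift, a limit this sketch does not handle.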
|
6
|
Identification of growth regulators using cross-species network analysis in plants. PLANT PHYSIOLOGY 2022; 190:2350-2365. [PMID: 35984294 PMCID: PMC9706488 DOI: 10.1093/plphys/kiac374] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 05/24/2022] [Accepted: 07/05/2022] [Indexed: 05/11/2023]
Abstract
With the need to increase plant productivity, one of the challenges plant scientists are facing is to identify genes that play a role in beneficial plant traits. Moreover, even when such genes are found, it is generally not trivial to transfer this knowledge about gene function across species to identify functional orthologs. Here, we focused on the leaf to study plant growth. First, we built leaf growth transcriptional networks in Arabidopsis (Arabidopsis thaliana), maize (Zea mays), and aspen (Populus tremula). Next, known growth regulators, here defined as genes that when mutated or ectopically expressed alter plant growth, together with cross-species conserved networks, were used as guides to predict novel Arabidopsis growth regulators. Using an in-depth literature screening, 34 out of 100 top predicted growth regulators were confirmed to affect leaf phenotype when mutated or overexpressed and thus represent novel potential growth regulators. Globally, these growth regulators were involved in cell cycle, plant defense responses, gibberellin, auxin, and brassinosteroid signaling. Phenotypic characterization of loss-of-function lines confirmed two predicted growth regulators to be involved in leaf growth (NPF6.4 and LATE MERISTEM IDENTITY2). In conclusion, the presented network approach offers an integrative cross-species strategy to identify genes involved in plant growth and development.
|
7
|
What can cold-induced transcriptomes of Arctic Brassicaceae tell us about the evolution of cold tolerance? Mol Ecol 2022; 31:4271-4285. [PMID: 35753053 PMCID: PMC9546214 DOI: 10.1111/mec.16581] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/13/2022] [Accepted: 06/08/2022] [Indexed: 11/28/2022]
Abstract
Little is known about the evolution of cold tolerance in polar plant species and how they differ from temperate relatives. To gain insight into their biology and the evolution of cold tolerance, we compared the molecular basis of cold response in three Arctic Brassicaceae species. We conducted a comparative time series experiment to examine transcriptional responses to low temperature. RNA was sampled at 22°C, and after 3, 6, and 24 h at 2°C. We then identified sets of genes that were differentially expressed in response to cold and compared them between species, as well as to published data from the temperate Arabidopsis thaliana. Most differentially expressed genes were species‐specific, but a significant portion of the cold response was also shared among species. Among thousands of differentially expressed genes, ~200 were shared among the three Arctic species and A. thaliana, while ~100 were exclusively shared among the three Arctic species. Our results show that cold response differs markedly between Arctic Brassicaceae species, but probably builds on a conserved basis found across the family. They also confirm that highly polygenic traits such as cold tolerance may show little repeatability in their patterns of adaptation.
|
8
|
|
9
|
Populus PtERF85 Balances Xylem Cell Expansion and Secondary Cell Wall Formation in Hybrid Aspen. Cells 2021; 10:cells10081971. [PMID: 34440740 PMCID: PMC8393460 DOI: 10.3390/cells10081971] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 06/07/2021] [Revised: 07/23/2021] [Accepted: 07/27/2021] [Indexed: 02/06/2023]
Abstract
Secondary growth relies on precise and specialized transcriptional networks that determine cell division, differentiation, and maturation of xylem cells. We identified a novel role for the ethylene-induced Populus Ethylene Response Factor PtERF85 (Potri.015G023200) in balancing xylem cell expansion and secondary cell wall (SCW) formation in hybrid aspen (Populus tremula x tremuloides). Expression of PtERF85 is high in phloem and cambium cells and during the expansion of xylem cells, while it is low in maturing xylem tissue. Extending PtERF85 expression into SCW forming zones of woody tissues through ectopic expression reduced wood density and SCW thickness of xylem fibers but increased fiber diameter. Xylem transcriptomes from the transgenic trees revealed transcriptional induction of genes involved in cell expansion, translation, and growth. The expression of genes associated with plant vascular development and the biosynthesis of SCW chemical components such as xylan and lignin, was down-regulated in the transgenic trees. Our results suggest that PtERF85 activates genes related to xylem cell expansion, while preventing transcriptional activation of genes related to SCW formation. The importance of precise spatial expression of PtERF85 during wood development together with the observed phenotypes in response to ectopic PtERF85 expression suggests that PtERF85 contributes to the transition of fiber cells from elongation to secondary cell wall deposition.
|
10
|
Comparative regulomics supports pervasive selection on gene dosage following whole genome duplication. Genome Biol 2021; 22:103. [PMID: 33849620 PMCID: PMC8042706 DOI: 10.1186/s13059-021-02323-0] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 08/27/2020] [Accepted: 03/23/2021] [Indexed: 12/13/2022]
Abstract
BACKGROUND Whole genome duplication (WGD) events have played a major role in eukaryotic genome evolution, but the consequence of these extreme events in adaptive genome evolution is still not well understood. To address this knowledge gap, we used a comparative phylogenetic model and transcriptomic data from seven species to infer selection on gene expression in duplicated genes (ohnologs) following the salmonid WGD 80-100 million years ago. RESULTS We find rare cases of tissue-specific expression evolution but pervasive expression evolution affecting many tissues, reflecting strong selection on maintenance of genome stability following genome doubling. Ohnolog expression levels have evolved mostly asymmetrically, by diverting one ohnolog copy down a path towards lower expression and possible pseudogenization. Loss of expression in one ohnolog is significantly associated with transposable element insertions in promoters and likely driven by selection on gene dosage including selection on stoichiometric balance. We also find symmetric expression shifts, and these are associated with genes under strong evolutionary constraints such as ribosome subunit genes. This possibly reflects selection operating to achieve a gene dose reduction while avoiding accumulation of "toxic mutations". Mechanistically, ohnolog regulatory divergence is dictated by the number of bound transcription factors in promoters, with transposable elements being one likely source of novel binding sites driving tissue-specific gains in expression. CONCLUSIONS Our results imply pervasive adaptive expression evolution following WGD to overcome the immediate challenges posed by genome doubling and to exploit the long-term genetic opportunities for novel phenotype evolution.
|
11
|
Overexpression of vesicle-associated membrane protein PttVAP27-17 as a tool to improve biomass production and the overall saccharification yields in Populus trees. BIOTECHNOLOGY FOR BIOFUELS 2021; 14:43. [PMID: 33593413 PMCID: PMC7885582 DOI: 10.1186/s13068-021-01895-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/23/2020] [Accepted: 02/04/2021] [Indexed: 05/03/2023]
Abstract
BACKGROUND Bioconversion of wood into bioproducts and biofuels is hindered by the recalcitrance of woody raw material to bioprocesses such as enzymatic saccharification. Targeted modification of the chemical composition of the feedstock can improve saccharification but this gain is often abrogated by concomitant reduction in tree growth. RESULTS In this study, we report on transgenic hybrid aspen (Populus tremula × tremuloides) lines that showed potential to increase biomass production both in the greenhouse and after 5 years of growth in the field. The transgenic lines carried an overexpression construct for Populus tremula × tremuloides vesicle-associated membrane protein (VAMP)-associated protein PttVAP27-17 that was selected from a gene-mining program for novel regulators of wood formation. Analytical-scale enzymatic saccharification without any pretreatment revealed for all greenhouse-grown transgenic lines, compared to the wild type, a 20-44% increase in the glucose yield per dry weight after enzymatic saccharification, even though it was statistically significant only for one line. The glucose yield after enzymatic saccharification with a prior hydrothermal pretreatment step with sulfuric acid was not increased in the greenhouse-grown transgenic trees on a dry-weight basis, but increased by 26-50% when calculated on a whole biomass basis in comparison to the wild-type control. Tendencies toward increased glucose yields by up to 24% were present on a whole tree biomass basis after acidic pretreatment and enzymatic saccharification also in the transgenic trees grown for 5 years in the field when compared to the wild-type control. CONCLUSIONS The results demonstrate the usefulness of gene-mining programs to identify novel genes with the potential to improve biofuel production in tree biotechnology programs. Furthermore, multi-omic analyses, including transcriptomic, proteomic and metabolomic analyses, performed here provide a toolbox for future studies on the function of VAP27 proteins in plants.
|
12
|
Transkingdom network analysis provides insight into host-microbiome interactions in Atlantic salmon. Comput Struct Biotechnol J 2021; 19:1028-1034. [PMID: 33613868 PMCID: PMC7876536 DOI: 10.1016/j.csbj.2021.01.038] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/14/2020] [Revised: 01/25/2021] [Accepted: 01/25/2021] [Indexed: 01/15/2023]
Abstract
BACKGROUND The Atlantic salmon gut constitutes an intriguing system for studying host-microbiota interactions due to the dramatic environmental change salmon experiences during its life cycle. Yet, little is known about the role of interactions in this system and there is a general deficit in computational methods for integrative analysis of omics data from host-microbiota systems. METHODS We developed a pipeline to integrate host RNAseq data and microbial 16S rRNA amplicon sequencing data using weighted correlation network analysis. Networks are first inferred from each dataset separately, followed by module detections and finally robust identification of interactions via comparisons of representative module profiles. Through the use of module profiles, this network-based dimensionality reduction approach provides a holistic view into the discovery of potential host-microbiota symbionts. RESULTS We analyzed host gene expression from the gut epithelial tissue and microbial abundances from the salmon gut in a long-term feeding trial spanning the fresh-/salt-water transition and including two feeds resembling the fatty acid compositions available in salt- and fresh-water environments, respectively. We identified several host modules with significant correlations to both microbiota modules and variables such as feed, growth and sex. Although the strongest associations largely coincided with the fresh-/salt-water transition, there was a second layer of correlations associating smaller host modules to both variables and microbiota modules. Hence, we identify extensive reprogramming of the gut epithelial transcriptome and large scale coordinated changes in gut microbiota composition associated with water type as well as evidence of host-microbiota interactions linked to feed.
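The final step described in METHODS, comparing representative module profiles between host and microbiota, can be sketched roughly as follows. This is a simplified stand-in rather than the published pipeline: network inference and module detection are assumed to have been done already, each module is summarized by the mean of its z-scored members (weighted correlation network analysis typically uses a principal-component "eigengene" instead), and all function names and inputs are hypothetical.

```python
import math


def zscore(values):
    """Standardize one feature across samples (fails on constant features)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    sd = math.sqrt(var)
    return [(v - mean) / sd for v in values]


def module_profile(members):
    """Average the z-scored profiles of all features in one module."""
    scaled = [zscore(m) for m in members]
    n_samples = len(scaled[0])
    return [sum(row[i] for row in scaled) / len(scaled) for i in range(n_samples)]


def pearson(x, y):
    """Pearson correlation between two equally long profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlate_modules(host_modules, microbe_modules):
    """Return {(host_name, microbe_name): r} for every pair of module profiles.

    Both arguments map a module name to a list of features, each feature being
    a list of values measured in the same samples and in the same order.
    """
    result = {}
    for h_name, h_members in host_modules.items():
        h_profile = module_profile(h_members)
        for m_name, m_members in microbe_modules.items():
            result[(h_name, m_name)] = pearson(h_profile, module_profile(m_members))
    return result
```

Working with one profile per module rather than every gene-by-taxon pair is what reduces the number of comparisons; a real analysis would also need significance testing and multiple-testing correction, which are omitted here.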
|
13
|
A metabolite roadmap of the wood-forming tissue in Populus tremula. THE NEW PHYTOLOGIST 2020; 228:1559-1572. [PMID: 32648607 DOI: 10.1111/nph.16799] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/20/2020] [Accepted: 06/26/2020] [Indexed: 05/27/2023]
Abstract
Wood, or secondary xylem, is the product of xylogenesis, a developmental process that begins with the proliferation of cambial derivatives and ends with mature xylem fibers and vessels with lignified secondary cell walls. Fully mature xylem has undergone a series of cellular processes, including cell division, cell expansion, secondary wall formation, lignification and programmed cell death. A complex network of interactions between transcriptional regulators and signal transduction pathways controls wood formation. However, the role of metabolites during this developmental process has not been comprehensively characterized. To evaluate the role of metabolites during wood formation, we performed a high spatial resolution metabolomics study of the wood-forming zone of Populus tremula, including laser dissected aspen ray and fiber cells. We show that metabolites show specific patterns within the wood-forming zone, following the differentiation process from cell division to cell death. The data from profiled laser dissected aspen ray and fiber cells suggest that these two cell types host distinctly different metabolic processes. Furthermore, by integrating previously published transcriptomic and proteomic profiles generated from the same trees, we provide an integrative picture of molecular processes; for example, deamination of phenylalanine during lignification is of critical importance for nitrogen metabolism during wood formation.
|
14
|
Leaf shape in Populus tremula is a complex, omnigenic trait. Ecol Evol 2020; 10:11922-11940. [PMID: 33209260 PMCID: PMC7663049 DOI: 10.1002/ece3.6691] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 03/30/2020] [Revised: 06/26/2020] [Accepted: 07/08/2020] [Indexed: 01/10/2023]
Abstract
Leaf shape is a defining feature of how we recognize and classify plant species. Although there is extensive variation in leaf shape within many species, few studies have disentangled the underlying genetic architecture. We characterized the genetic architecture of leaf shape variation in Eurasian aspen (Populus tremula L.) by performing a genome-wide association study (GWAS) for physiognomy traits. To ascertain the roles of identified GWAS candidate genes within the leaf development transcriptional program, we generated RNA-Seq data that we used to perform gene co-expression network analyses from a developmental series, which is publicly available within the PlantGenIE resource. We additionally used existing gene expression measurements across the population to analyze GWAS candidate genes in the context of a population-wide co-expression network and to identify genes that were differentially expressed between groups of individuals with contrasting leaf shapes. These data were integrated with expression GWAS (eQTL) results to define a set of candidate genes associated with leaf shape variation. Our results identified no clear adaptive link to leaf shape variation and indicate that leaf shape traits are genetically complex, likely determined by numerous small-effect variations in gene expression. Genes associated with shape variation were peripheral within the population-wide co-expression network, were not highly connected within the leaf development co-expression network, and exhibited signatures of relaxed selection. As such, our results are consistent with the omnigenic model.
|
15
|
SalMotifDB: a tool for analyzing putative transcription factor binding sites in salmonid genomes. BMC Genomics 2019; 20:694. [PMID: 31477007 PMCID: PMC6720087 DOI: 10.1186/s12864-019-6051-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 03/08/2019] [Accepted: 08/21/2019] [Indexed: 12/11/2022]
Abstract
Background Recently developed genome resources in salmonid fish provide tools for studying the genomics underlying a wide range of properties including life history trait variation in the wild, economically important traits in aquaculture and the evolutionary consequences of whole genome duplications. Although genome assemblies now exist for a number of salmonid species, the lack of regulatory annotations is holding back our mechanistic understanding of how genetic variation in non-coding regulatory regions affects gene expression and the downstream phenotypic effects. Results We present SalMotifDB, a database and associated web and R interface for the analysis of transcription factors (TFs) and their cis-regulatory binding sites in five salmonid genomes. SalMotifDB integrates TF-binding site information for 3072 non-redundant DNA patterns (motifs) assembled from a large number of metazoan motif databases. Through motif matching and TF prediction, we have used these multi-species databases to construct putative regulatory networks in salmonid species. The utility of SalMotifDB is demonstrated by showing that key lipid metabolism regulators are predicted to regulate a set of genes affected by different lipid and fatty acid content in the feed, and by showing that our motif database explains a significant proportion of gene expression divergence in gene duplicates originating from the salmonid specific whole genome duplication. Conclusions SalMotifDB is an effective tool for analyzing transcription factors, their binding sites and the resulting gene regulatory networks in salmonid species, and will be an important tool for gaining a better mechanistic understanding of gene regulation and the associated phenotypes in salmonids. SalMotifDB is available at https://salmobase.org/apps/SalMotifDB.
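As a rough illustration of what "motif matching" means in general, the sketch below scores every window of a DNA sequence against a position weight matrix built from base counts. It does not reproduce SalMotifDB's actual procedure or any specific scanning tool; the uniform background, the pseudocount, the threshold, and the function names are all assumptions made for this example, and the sequence is assumed to contain only uppercase A, C, G, and T.

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2 odds scores.

    counts is a list with one dict per motif position, mapping each base in
    BASES to the number of times it was observed at that position.
    """
    matrix = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        matrix.append(
            {b: math.log2((column[b] + pseudocount) / total / background) for b in BASES}
        )
    return matrix


def scan(sequence, matrix, threshold):
    """Return (start, score) for every window scoring at or above threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits
```

Hits found this way in promoter regions are only putative binding sites; linking them to the transcription factors that recognize each motif is what yields a candidate regulatory network of the kind the abstract describes.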
|
16
|
Evolution of Cold Acclimation and Its Role in Niche Transition in the Temperate Grass Subfamily Pooideae. PLANT PHYSIOLOGY 2019; 180:404-419. [PMID: 30850470 PMCID: PMC6501083 DOI: 10.1104/pp.18.01448] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 11/19/2018] [Accepted: 02/25/2019] [Indexed: 05/24/2023]
Abstract
The grass subfamily Pooideae dominates the grass floras in cold temperate regions and has evolved complex physiological adaptations to cope with extreme environmental conditions like frost, winter, and seasonality. One such adaptation is cold acclimation, wherein plants increase their frost tolerance in response to gradually falling temperatures and shorter days in the autumn. However, understanding how complex traits like cold acclimation evolve remains a major challenge in evolutionary biology. Here, we investigated the evolution of cold acclimation in Pooideae and found that a phylogenetically diverse set of Pooideae species displayed cold acclimation capacity. However, comparing differential gene expression after cold treatment in transcriptomes of five phylogenetically diverse species revealed widespread species-specific responses of genes with conserved sequences. Furthermore, we studied the correlation between gene family size and number of cold-responsive genes as well as between selection pressure on coding sequences of genes and their cold responsiveness. We saw evidence of protein-coding and regulatory sequence evolution as well as the origin of novel genes and functions contributing toward evolution of a cold response in Pooideae. Our results reflect that selection pressure resulting from global cooling must have acted on already diverged lineages. Nevertheless, conservation of cold-induced gene expression of certain genes indicates that the Pooideae ancestor may have possessed some molecular machinery to mitigate cold stress. Evolution of adaptations to seasonally cold climates is regarded as particularly difficult. How Pooideae evolved to transition from tropical to temperate biomes sheds light on how complex traits evolve in the light of climate changes.
|
17
|
From proteins to polysaccharides: lifestyle and genetic evolution of Coprothermobacter proteolyticus. THE ISME JOURNAL 2019; 13:603-617. [PMID: 30315317 PMCID: PMC6461833 DOI: 10.1038/s41396-018-0290-y] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 03/13/2018] [Revised: 07/11/2018] [Accepted: 09/19/2018] [Indexed: 11/29/2022]
Abstract
Microbial communities that degrade lignocellulosic biomass are typified by high levels of species- and strain-level complexity, as well as synergistic interactions between both cellulolytic and non-cellulolytic microorganisms. Coprothermobacter proteolyticus frequently dominates thermophilic, lignocellulose-degrading communities with wide geographical distribution, which is in contrast to reports that it ferments proteinaceous substrates and is incapable of polysaccharide hydrolysis. Here we deconvolute a highly efficient cellulose-degrading consortium (SEM1b) that is co-dominated by Clostridium (Ruminiclostridium) thermocellum and multiple heterogenic strains affiliated to C. proteolyticus. Metagenomic analysis of SEM1b recovered metagenome-assembled genomes (MAGs) for each constituent population, whereas in parallel two novel strains of C. proteolyticus were successfully isolated and sequenced. Annotation of all C. proteolyticus genotypes (two strains and one MAG) revealed their genetic acquisition of carbohydrate-active enzymes (CAZymes), presumably derived from horizontal gene transfer (HGT) events involving polysaccharide-degrading Firmicutes or Thermotogae-affiliated populations that are historically co-located. HGT material included a saccharolytic operon, from which a CAZyme was biochemically characterized and demonstrated hydrolysis of multiple hemicellulose polysaccharides. Finally, temporal genome-resolved metatranscriptomic analysis of SEM1b revealed expression of C. proteolyticus CAZymes at different SEM1b life stages as well as co-expression of CAZymes from multiple SEM1b populations, inferring deeper microbial interactions that are dedicated toward community degradation of cellulose and hemicellulose. We show that C. proteolyticus, a ubiquitous population, consists of closely related strains that have adapted via HGT to presumably degrade both oligo- and longer polysaccharides present in decaying plants and microbial cell walls, thus explaining its dominance in thermophilic anaerobic digesters on a global scale.
|
18
|
The Grayling Genome Reveals Selection on Gene Expression Regulation after Whole-Genome Duplication. Genome Biol Evol 2018; 10:2785-2800. [PMID: 30239729 PMCID: PMC6200313 DOI: 10.1093/gbe/evy201] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Accepted: 09/12/2018] [Indexed: 02/06/2023] Open
Abstract
Whole-genome duplication (WGD) has been a major evolutionary driver of increased genomic complexity in vertebrates. One such event occurred in the salmonid family ∼80 Ma (Ss4R) giving rise to a plethora of structural and regulatory duplicate-driven divergence, making salmonids an exemplary system to investigate the evolutionary consequences of WGD. Here, we present a draft genome assembly of European grayling (Thymallus thymallus) and use this in a comparative framework to study evolution of gene regulation following WGD. Among the Ss4R duplicates identified in European grayling and Atlantic salmon (Salmo salar), one-third reflect nonneutral tissue expression evolution, with strong purifying selection, maintained over ∼50 Myr. Of these, the majority reflect conserved tissue regulation under strong selective constraints related to brain and neural-related functions, as well as higher-order protein–protein interactions. A small subset of the duplicates have evolved tissue regulatory expression divergence in a common ancestor, which have been subsequently conserved in both lineages, suggestive of adaptive divergence following WGD. These candidates for adaptive tissue expression divergence have elevated rates of protein coding- and promoter-sequence evolution and are enriched for immune- and lipid metabolism ontology terms. Lastly, lineage-specific duplicate divergence points toward underlying differences in adaptive pressures on expression regulation in the nonanadromous grayling versus the anadromous Atlantic salmon. Our findings enhance our understanding of the role of WGD in genome evolution and highlight cases of regulatory divergence of Ss4R duplicates, possibly related to a niche shift in early salmonid evolution.
|
19
|
Ethylene signaling induces gelatinous layers with typical features of tension wood in hybrid aspen. THE NEW PHYTOLOGIST 2018. [PMID: 29528503 DOI: 10.1111/nph.15078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/07/2023]
Abstract
The phytohormone ethylene impacts secondary stem growth in plants by stimulating cambial activity, xylem development and fiber over vessel formation. We report the effect of ethylene on secondary cell wall formation and the molecular connection between ethylene signaling and wood formation. We applied exogenous ethylene or its precursor 1-aminocyclopropane-1-carboxylic acid (ACC) to wild-type and ethylene-insensitive hybrid aspen trees (Populus tremula × tremuloides) and studied secondary cell wall anatomy, chemistry and ultrastructure. We furthermore analyzed the transcriptome (RNA Seq) after ACC application to wild-type and ethylene-insensitive trees. We demonstrate that ACC and ethylene induce gelatinous layers (G-layers) and alter the fiber cell wall cellulose microfibril angle. G-layers are tertiary wall layers rich in cellulose, typically found in tension wood of aspen trees. A vast majority of transcripts affected by ACC are downstream of ethylene perception and include a large number of transcription factors (TFs). Motif-analyses reveal potential connections between ethylene TFs (Ethylene Response Factors (ERFs), ETHYLENE INSENSITIVE 3/ETHYLENE INSENSITIVE3-LIKE1 (EIN3/EIL1)) and wood formation. G-layer formation upon ethylene application suggests that the increase in ethylene biosynthesis observed during tension wood formation is important for its formation. Ethylene-regulated TFs of the ERF and EIN3/EIL1 type could transmit the ethylene signal.
|
20
|
Abstract
In temperate and boreal ecosystems, seasonal cycles of growth and dormancy allow perennial plants to adapt to winter conditions. We show, in hybrid aspen trees, that photoperiodic regulation of dormancy is mechanistically distinct from autumnal growth cessation. Dormancy sets in when symplastic intercellular communication through plasmodesmata is blocked by a process dependent on the phytohormone abscisic acid. The communication blockage prevents growth-promoting signals from accessing the meristem. Thus, precocious growth is disallowed during dormancy. The dormant period, which supports robust survival of the aspen tree in winter, is due to loss of access to growth-promoting signals.
|
21
|
Life-stage-associated remodelling of lipid metabolism regulation in Atlantic salmon. Mol Ecol 2018; 27:1200-1213. [PMID: 29431879 DOI: 10.1111/mec.14533] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 07/07/2017] [Revised: 12/07/2017] [Accepted: 12/13/2017] [Indexed: 01/03/2023]
Abstract
Atlantic salmon migrates from rivers to sea to feed, grow and develop gonads before returning to spawn in freshwater. The transition to marine habitats is associated with dramatic changes in the environment, including water salinity, exposure to pathogens and shift in dietary lipid availability. Many changes in physiology and metabolism occur across this life-stage transition, but little is known about the molecular nature of these changes. Here, we use a long-term feeding experiment to study transcriptional regulation of lipid metabolism in Atlantic salmon gut and liver in both fresh- and saltwater. We find that lipid metabolism becomes significantly less plastic to differences in dietary lipid composition when salmon transitions to saltwater and experiences increased dietary lipid availability. Expression of genes in liver relating to lipogenesis and lipid transport decreases overall and becomes less responsive to diet, while genes for lipid uptake in gut become more highly expressed. Finally, analyses of evolutionary consequences of the salmonid-specific whole-genome duplication on lipid metabolism reveal several pathways with significantly different (p < .05) duplicate retention or duplicate regulatory conservation. We also find a limited number of cases where the whole-genome duplication has resulted in an increased gene dosage. In conclusion, we find variable and pathway-specific effects of the salmonid genome duplication on lipid metabolism genes. A clear life-stage-associated shift in lipid metabolism regulation is evident, and we hypothesize this to be, at least partly, driven by nondietary factors such as the preparatory remodelling of gene regulation and physiology prior to sea migration.
|
22
|
A multi-omics approach reveals function of Secretory Carrier-Associated Membrane Proteins in wood formation of Populus trees. BMC Genomics 2018; 19:11. [PMID: 29298676 PMCID: PMC5753437 DOI: 10.1186/s12864-017-4411-1] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 06/30/2017] [Accepted: 12/21/2017] [Indexed: 01/03/2023] Open
Abstract
Background Secretory Carrier-Associated Membrane Proteins (SCAMPs) are highly conserved 32–38 kDa proteins that are involved in membrane trafficking. A systems approach was taken to elucidate function of SCAMPs in wood formation of Populus trees. Phenotypic and multi-omics analyses were performed in woody tissues of transgenic Populus trees carrying an RNAi construct for Populus tremula x tremuloides SCAMP3 (PttSCAMP3; Potri.019G104000). Results The woody tissues of the transgenic trees displayed increased amounts of both polysaccharides and lignin oligomers, indicating increased deposition of both the carbohydrate and lignin components of the secondary cell walls. This coincided with a tendency towards increased wood density as well as significantly increased thickness of the suberized cork in the transgenic lines. Multivariate OnPLS (orthogonal projections to latent structures) modeling of five different omics datasets (the transcriptome, proteome, GC-MS metabolome, LC-MS metabolome and pyrolysis-GC/MS metabolome) collected from the secondary xylem tissues of the stem revealed systemic variation in the different variables in the transgenic lines, including changes that correlated with the changes in the secondary cell wall composition. The OnPLS model also identified a rather large number of proteins that were more abundant in the transgenic lines than in the wild type. Several of these were related to secretion and/or endocytosis as well as both primary and secondary cell wall biosynthesis. Conclusions Populus SCAMP proteins were shown to influence accumulation of secondary cell wall components, including polysaccharides and phenolic compounds, in the woody tissues of Populus tree stems. Our multi-omics analyses combined with the OnPLS modelling suggest that this function is mediated by changes in membrane trafficking to fine-tune the abundance of cell wall precursors and/or proteins involved in cell wall biosynthesis and transport. 
The data provides a multi-level source of information for future studies on the function of the SCAMP proteins in plant stem tissues.
|
23
|
NorWood: a gene expression resource for evo-devo studies of conifer wood development. THE NEW PHYTOLOGIST 2017; 216:482-494. [PMID: 28186632 PMCID: PMC6079643 DOI: 10.1111/nph.14458] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 10/21/2016] [Accepted: 12/22/2016] [Indexed: 05/04/2023]
Abstract
The secondary xylem of conifers is composed mainly of tracheids that differ anatomically and chemically from angiosperm xylem cells. There is currently no high-spatial-resolution data available profiling gene expression during wood formation for any coniferous species, which limits insight into tracheid development. RNA-sequencing data from replicated, high-spatial-resolution section series throughout the cambial and woody tissues of Picea abies were used to generate the NorWood.conGenIE.org web resource, which facilitates exploration of the associated gene expression profiles and co-expression networks. Integration within PlantGenIE.org enabled a comparative regulomics analysis, revealing divergent co-expression networks between P. abies and the two angiosperm species Arabidopsis thaliana and Populus tremula for the secondary cell wall (SCW) master regulator NAC Class IIB transcription factors. The SCW cellulose synthase genes (CesAs) were located in the neighbourhoods of the NAC factors in A. thaliana and P. tremula, but not in P. abies. The NorWood co-expression network enabled identification of potential SCW CesA regulators in P. abies. The NorWood web resource represents a powerful community tool for generating evo-devo insights into the divergence of wood formation between angiosperms and gymnosperms and for advancing understanding of the regulation of wood development in P. abies.
|
24
|
AspWood: High-Spatial-Resolution Transcriptome Profiles Reveal Uncharacterized Modularity of Wood Formation in Populus tremula. THE PLANT CELL 2017; 29:1585-1604. [PMID: 28655750 PMCID: PMC5559752 DOI: 10.1105/tpc.17.00153] [Citation(s) in RCA: 142] [Impact Index Per Article: 20.3] [Received: 02/27/2017] [Revised: 06/12/2017] [Accepted: 06/24/2017] [Indexed: 05/17/2023]
Abstract
Trees represent the largest terrestrial carbon sink and a renewable source of ligno-cellulose. There is significant scope for yield and quality improvement in these largely undomesticated species, and efforts to engineer elite varieties will benefit from improved understanding of the transcriptional network underlying cambial growth and wood formation. We generated high-spatial-resolution RNA sequencing data spanning the secondary phloem, vascular cambium, and wood-forming tissues of Populus tremula. The transcriptome comprised 28,294 expressed, annotated genes, 78 novel protein-coding genes, and 567 putative long intergenic noncoding RNAs. Most paralogs originating from the Salicaceae whole-genome duplication had diverged expression, with the exception of those highly expressed during secondary cell wall deposition. Coexpression network analyses revealed that regulation of the transcriptome underlying cambial growth and wood formation comprises numerous modules forming a continuum of active processes across the tissues. A comparative analysis revealed that a majority of these modules are conserved in Picea abies. The high spatial resolution of our data enabled identification of novel roles for characterized genes involved in xylan and cellulose biosynthesis, regulators of xylem vessel and fiber differentiation and lignification. An associated web resource (AspWood, http://aspwood.popgenie.org) provides interactive tools for exploring the expression profiles and coexpression network.
|
25
|
Lineage-specific rediploidization is a mechanism to explain time-lags between genome duplication and evolutionary diversification. Genome Biol 2017; 18:111. [PMID: 28615063 PMCID: PMC5470254 DOI: 10.1186/s13059-017-1241-z] [Citation(s) in RCA: 99] [Impact Index Per Article: 14.1] [Received: 12/22/2016] [Accepted: 05/19/2017] [Indexed: 12/26/2022] Open
Abstract
BACKGROUND The functional divergence of duplicate genes (ohnologues) retained from whole genome duplication (WGD) is thought to promote evolutionary diversification. However, species radiation and phenotypic diversification are often temporally separated from WGD. Salmonid fish, whose ancestor underwent WGD by autotetraploidization ~95 million years ago, fit such a 'time-lag' model of post-WGD radiation, which occurred alongside a major delay in the rediploidization process. Here we propose a model, 'lineage-specific ohnologue resolution' (LORe), to address the consequences of delayed rediploidization. Under LORe, speciation precedes rediploidization, allowing independent ohnologue divergence in sister lineages sharing an ancestral WGD event. RESULTS Using cross-species sequence capture, phylogenomics and genome-wide analyses of ohnologue expression divergence, we demonstrate the major impact of LORe on salmonid evolution. One-quarter of each salmonid genome, harbouring at least 4550 ohnologues, has evolved under LORe, with rediploidization and functional divergence occurring on multiple independent occasions >50 million years post-WGD. We demonstrate the existence and regulatory divergence of many LORe ohnologues with functions in lineage-specific physiological adaptations that potentially facilitated salmonid species radiation. We show that LORe ohnologues are enriched for different functions than 'older' ohnologues that began diverging in the salmonid ancestor. CONCLUSIONS LORe has unappreciated significance as a nested component of post-WGD divergence that impacts the functional properties of genes, whilst providing ohnologues available solely for lineage-specific adaptation. Under LORe, which is predicted following many WGD events, the functional outcomes of WGD need not appear 'explosively', but can arise gradually over tens of millions of years, promoting lineage-specific diversification regimes under prevailing ecological pressures.
|
26
|
Gene co-expression network connectivity is an important determinant of selective constraint. PLoS Genet 2017; 13:e1006402. [PMID: 28406900 PMCID: PMC5407845 DOI: 10.1371/journal.pgen.1006402] [Citation(s) in RCA: 78] [Impact Index Per Article: 11.1] [Received: 10/05/2016] [Revised: 04/27/2017] [Accepted: 03/31/2017] [Indexed: 12/12/2022] Open
Abstract
While several studies have investigated general properties of the genetic architecture of natural variation in gene expression, few of these have considered natural, outbreeding populations. In parallel, systems biology has established that a general feature of biological networks is that they are scale-free, rendering them buffered against random mutations. To date, few studies have attempted to examine the relationship between the selective processes acting to maintain natural variation of gene expression and the associated co-expression network structure. Here we utilised RNA-Sequencing to assay gene expression in winter buds undergoing bud flush in a natural population of Populus tremula, an outbreeding forest tree species. We performed expression Quantitative Trait Locus (eQTL) mapping and identified 164,290 significant eQTLs associating 6,241 unique genes (eGenes) with 147,419 unique SNPs (eSNPs). We found approximately four times as many local as distant eQTLs, with local eQTLs having significantly higher effect sizes. eQTLs were primarily located in regulatory regions of genes (UTRs or flanking regions), regardless of whether they were local or distant. We used the gene expression data to infer a co-expression network and investigated the relationship between network topology, the genetic architecture of gene expression and signatures of selection. Within the co-expression network, eGenes were underrepresented in network module cores (hubs) and overrepresented in the periphery of the network, with a negative correlation between eQTL effect size and network connectivity. We additionally found that module core genes have experienced stronger selective constraint on coding and non-coding sequence, with connectivity associated with signatures of selection. 
Our integrated genetics and genomics results suggest that purifying selection is the primary mechanism underlying the genetic architecture of natural variation in gene expression assayed in flushing leaf buds of P. tremula and that connectivity within the co-expression network is linked to the strength of purifying selection.
|
27
|
Towards integration of population and comparative genomics in forest trees. THE NEW PHYTOLOGIST 2016; 212:338-44. [PMID: 27575589 DOI: 10.1111/nph.14153] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 03/31/2016] [Accepted: 06/27/2016] [Indexed: 05/08/2023]
Abstract
The past decade saw the initiation of an ongoing revolution in sequencing technologies that is transforming all fields of biology. This has been driven by the advent and widespread availability of high-throughput, massively parallel short-read sequencing (MPS) platforms. These technologies have enabled previously unimaginable studies, including draft assemblies of the massive genomes of coniferous species and population-scale resequencing. Transcriptomics studies have likewise been transformed, with RNA-sequencing enabling studies in nonmodel organisms, the discovery of previously unannotated genes (novel transcripts), entirely new classes of RNAs and previously unknown regulatory mechanisms. Here we touch upon current developments in the areas of genome assembly, comparative regulomics and population genetics as they relate to studies of forest tree species.
|
28
|
Quantitative proteomics reveals protein profiles underlying major transitions in aspen wood development. BMC Genomics 2016; 17:119. [PMID: 26887814 PMCID: PMC4758094 DOI: 10.1186/s12864-016-2458-z] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 09/22/2015] [Accepted: 02/09/2016] [Indexed: 01/08/2023] Open
Abstract
Background Wood development is of outstanding interest both to basic research and industry due to the associated cellulose and lignin biomass production. Efforts to elucidate wood formation (which is essential for numerous aspects of both pure and applied plant science) have been made using transcriptomic analyses and/or low-resolution sampling. However, transcriptomic data do not correlate perfectly with levels of expressed proteins due to effects of post-translational modifications and variations in turnover rates. In addition, high-resolution analysis is needed to characterize key transitions. In order to identify protein profiles across the developmental region of wood formation, an in-depth and tissue specific sampling was performed. Results We examined protein profiles, using an ultra-performance liquid chromatography/quadrupole time of flight mass spectrometry system, in high-resolution tangential sections spanning all wood development zones in Populus tremula from undifferentiated cambium to mature phloem and xylem, including cell expansion and cell death zones. In total, we analyzed 482 sections, 20–160 μm thick, from four 47-year-old trees growing wild in Sweden. We obtained high quality expression profiles for 3,082 proteins exhibiting consistency across the replicates, considering that the trees were growing in an uncontrolled environment. A combination of Principal Component Analysis (PCA), Orthogonal Projections to Latent Structures (OPLS) modeling and an enhanced stepwise linear modeling approach identified several major transitions in global protein expression profiles, pinpointing (for example) locations of the cambial division leading to phloem and xylem cells, and secondary cell wall formation zones. We also identified key proteins and associated pathways underlying these developmental landmarks. 
For example, many of the lignocellulosic related proteins were upregulated in the expansion to the early developmental xylem zone, and for laccases with a rapid decrease in early xylem zones. We observed upregulation of two forms of xylem cysteine protease (Potri.002G005700.1 and Potri.005G256000.2; Pt-XCP2.1) in early xylem and their downregulation in late maturing xylem. Our data also show that Pt-KOR1.3 (Potri.003G151700.2) exhibits an expression pattern that supports the hypothesis put forward in previous studies that this is a key xyloglucanase involved in cellulose biosynthesis in primary cell walls and reduction of cellulose crystallinity in secondary walls. Conclusion Our novel multivariate approach highlights important processes and provides confirmatory insights into the molecular foundations of wood development.
|
29
|
The Plant Genome Integrative Explorer Resource: PlantGenIE.org. THE NEW PHYTOLOGIST 2015; 208:1149-56. [PMID: 26192091 DOI: 10.1111/nph.13557] [Citation(s) in RCA: 162] [Impact Index Per Article: 18.0] [Received: 03/19/2015] [Accepted: 06/08/2015] [Indexed: 05/18/2023]
Abstract
Accessing and exploring large-scale genomics data sets remains a significant challenge to researchers without specialist bioinformatics training. We present the integrated PlantGenIE.org platform for exploration of Populus, conifer and Arabidopsis genomics data, which includes expression networks and associated visualization tools. Standard features of a model organism database are provided, including genome browsers, gene list annotation, Blast homology searches and gene information pages. Community annotation updating is supported via integration of WebApollo. We have produced an RNA-sequencing (RNA-Seq) expression atlas for Populus tremula and have integrated these data within the expression tools. An updated version of the ComPlEx resource for performing comparative plant expression analyses of gene coexpression network conservation between species has also been integrated. The PlantGenIE.org platform provides intuitive access to large-scale and genome-wide genomics data from model forest tree species, facilitating both community contributions to annotation improvement and tools supporting use of the included data resources to inform biological insight.
|
30
|
Serendipitous Meta-Transcriptomics: The Fungal Community of Norway Spruce (Picea abies). PLoS One 2015; 10:e0139080. [PMID: 26413905 PMCID: PMC4586145 DOI: 10.1371/journal.pone.0139080] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 08/06/2015] [Accepted: 09/09/2015] [Indexed: 11/18/2022] Open
Abstract
After performing de novo transcript assembly of >1 billion RNA-Sequencing reads obtained from 22 samples of different Norway spruce (Picea abies) tissues that were not surface sterilized, we found that assembled sequences captured a mix of plant, lichen, and fungal transcripts. The latter were likely expressed by endophytic and epiphytic symbionts, indicating that these organisms were present, alive, and metabolically active. Here, we show that these serendipitously sequenced transcripts need not be considered merely as contamination, as is common, but that they provide insight into the plant’s phyllosphere. Notably, we could classify these transcripts as originating predominantly from Dothideomycetes and Leotiomycetes species, with functional annotation of gene families indicating active growth and metabolism, with particular regards to glucose intake and processing, as well as gene regulation.
|
31
|
Synergy: a web resource for exploring gene regulation in Synechocystis sp. PCC6803. PLoS One 2014; 9:e113496. [PMID: 25420108 PMCID: PMC4242644 DOI: 10.1371/journal.pone.0113496] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 07/25/2014] [Accepted: 10/24/2014] [Indexed: 12/22/2022] Open
Abstract
Despite being a highly studied model organism, most genes of the cyanobacterium Synechocystis sp. PCC 6803 encode proteins with completely unknown function. To facilitate studies of gene regulation in Synechocystis, we have developed Synergy (http://synergy.plantgenie.org), a web application integrating co-expression networks and regulatory motif analysis. Co-expression networks were inferred from publicly available microarray experiments, while regulatory motifs were identified using a phylogenetic footprinting approach. Automatically discovered motifs were shown to be enriched in the network neighborhoods of regulatory proteins much more often than in the neighborhoods of non-regulatory genes, showing that the data provide a sound starting point for studying gene regulation in Synechocystis. Concordantly, we provide several case studies demonstrating that Synergy can be used to find biologically relevant regulatory mechanisms in Synechocystis. Synergy can be used to interactively perform analyses such as gene/motif search, network visualization and motif/function enrichment. Considering the importance of Synechocystis for photosynthesis and biofuel research, we believe that Synergy will become a valuable resource to the research community.
|
32
|
Populus tremula (European aspen) shows no evidence of sexual dimorphism. BMC PLANT BIOLOGY 2014; 14:276. [PMID: 25318822 PMCID: PMC4203875 DOI: 10.1186/s12870-014-0276-5] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 06/28/2014] [Accepted: 10/06/2014] [Indexed: 05/23/2023]
Abstract
BACKGROUND Evolutionary theory suggests that males and females may evolve sexually dimorphic phenotypic and biochemical traits concordant with each sex having different optimal strategies of resource investment to maximise reproductive success and fitness. Such sexual dimorphism would result in sex biased gene expression patterns in non-floral organs for autosomal genes associated with the control and development of such phenotypic traits. RESULTS We examined morphological, biochemical and herbivory traits to test for sexually dimorphic resource allocation strategies within collections of sexually mature and immature Populus tremula (European aspen) trees. In addition we profiled gene expression in mature leaves of sexually mature wild trees using whole-genome oligonucleotide microarrays and RNA-Sequencing. CONCLUSIONS We found no evidence of sexual dimorphism or differential resource investment strategies between males and females in either sexually immature or mature trees. Similarly, single-gene differential expression and machine learning approaches revealed no evidence of large-scale sex biased gene expression. However, two significantly differentially expressed genes were identified from the RNA-Seq data, one of which is a robust diagnostic marker of sex in P. tremula.
|
33
|
Abstract
Allohexaploid bread wheat (Triticum aestivum L.) provides approximately 20% of calories consumed by humans. Lack of genome sequence for the three homeologous and highly similar bread wheat genomes (A, B, and D) has impeded expression analysis of the grain transcriptome. We used previously unknown genome information to analyze the cell type-specific expression of homeologous genes in the developing wheat grain and identified distinct co-expression clusters reflecting the spatiotemporal progression during endosperm development. We observed no global but cell type- and stage-dependent genome dominance, organization of the wheat genome into transcriptionally active chromosomal regions, and asymmetric expression in gene families related to baking quality. Our findings give insight into the transcriptional dynamics and genome interplay among individual grain cell types in a polyploid cereal genome.
|
34
|
ComPlEx: conservation and divergence of co-expression networks in A. thaliana, Populus and O. sativa. BMC Genomics 2014; 15:106. [PMID: 24498971 PMCID: PMC3925997 DOI: 10.1186/1471-2164-15-106] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.0] [Received: 11/04/2013] [Accepted: 01/29/2014] [Indexed: 01/16/2023] Open
Abstract
Background Divergence in gene regulation has emerged as a key mechanism underlying species differentiation. Comparative analysis of co-expression networks across species can reveal conservation and divergence in the regulation of genes. Results We inferred co-expression networks of A. thaliana, Populus spp. and O. sativa using state-of-the-art methods based on mutual information and context likelihood of relatedness, and conducted a comprehensive comparison of these networks across a range of co-expression thresholds. In addition to quantifying gene-gene link and network neighbourhood conservation, we also applied recent advancements in network analysis to do cross-species comparisons of network properties such as scale free characteristics and gene centrality as well as network motifs. We found that in all species the networks emerged as scale free only above a certain co-expression threshold, and that the high-centrality genes upholding this organization tended to be conserved. Network motifs, in particular the feed-forward loop, were found to be significantly enriched in specific functional subnetworks but were much less conserved across species than gene centrality. Although individual gene-gene co-expression had massively diverged, up to ~80% of the genes still had a significantly conserved network neighbourhood. For genes with multiple predicted orthologs, about half had one ortholog with conserved regulation and another ortholog with diverged or non-conserved regulation. Furthermore, the most sequence similar ortholog was not the one with the most conserved gene regulation in over half of the cases. Conclusions We have provided a comprehensive analysis of gene regulation evolution in plants and built a web tool for Comparative analysis of Plant co-Expression networks (ComPlEx, http://complex.plantgenie.org/). The tool can be particularly useful for identifying the ortholog with the most conserved regulation among several sequence-similar alternatives and can thus be of practical importance in e.g. finding candidate genes for perturbation experiments.
|
35
|
OnPLS integration of transcriptomic, proteomic and metabolomic data shows multi-level oxidative stress responses in the cambium of transgenic hipI- superoxide dismutase Populus plants. BMC Genomics 2013; 14:893. [PMID: 24341908 PMCID: PMC3878592 DOI: 10.1186/1471-2164-14-893] [Citation(s) in RCA: 58] [Impact Index Per Article: 5.3] [Received: 07/15/2013] [Accepted: 11/27/2013] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Reactive oxygen species (ROS) are involved in the regulation of diverse physiological processes in plants, including various biotic and abiotic stress responses. Thus, oxidative stress tolerance mechanisms in plants are complex, and diverse responses at multiple levels need to be characterized in order to understand them. Here we present system responses to oxidative stress in Populus by integrating data from analyses of the cambial region of wild-type controls and plants expressing high-isoelectric-point superoxide dismutase (hipI-SOD) transcripts in antisense orientation showing a higher production of superoxide. The cambium, a thin cell layer, generates cells that differentiate to form either phloem or xylem and is hypothesized to be a major reason for phenotypic perturbations in the transgenic plants. Data from multiple platforms including transcriptomics (microarray analysis), proteomics (UPLC/QTOF-MS), and metabolomics (GC-TOF/MS, UPLC/MS, and UHPLC-LTQ/MS) were integrated using the most recent development of orthogonal projections to latent structures called OnPLS. OnPLS is a symmetrical multi-block method that does not depend on the order of analysis when more than two blocks are analysed. Significantly affected genes, proteins and metabolites were then visualized in painted pathway diagrams. RESULTS The main categories that appear to be significantly influenced in the transgenic plants were pathways related to redox regulation, carbon metabolism and protein degradation, e.g. the glycolysis and pentose phosphate pathways (PPP). The results provide system-level information on ROS metabolism and responses to oxidative stress, and indicate that some initial responses to oxidative stress may share common pathways. CONCLUSION The proposed data evaluation strategy shows an efficient way of compiling complex, multi-platform datasets to obtain significant biological information.
|
36
|
Characterization of cytokinin signaling and homeostasis gene families in two hardwood tree species: Populus trichocarpa and Prunus persica. BMC Genomics 2013; 14:885. [PMID: 24341635 PMCID: PMC3866579 DOI: 10.1186/1471-2164-14-885] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Received: 07/24/2013] [Accepted: 11/27/2013] [Indexed: 01/01/2023] Open
Abstract
Background Through the diversity of cytokinin-regulated processes, this phytohormone has a profound impact on plant growth and development. Cytokinin signaling is involved in the control of apical and lateral meristem activity, branching pattern of the shoot, and leaf senescence. These processes influence several traits, including the stem diameter, shoot architecture, and perennial life cycle, which define the development of woody plants. To facilitate research about the role of cytokinin in regulation of woody plant development, we have identified genes associated with cytokinin signaling and homeostasis pathways from two hardwood tree species. Results Taking advantage of the sequenced black cottonwood (Populus trichocarpa) and peach (Prunus persica) genomes, we have compiled a comprehensive list of genes involved in these pathways. We identified genes belonging to the six families of cytokinin oxidases (CKXs), isopentenyl transferases (IPTs), LONELY GUY genes (LOGs), two-component receptors, histidine containing phosphotransmitters (HPts), and response regulators (RRs). Altogether, 85 Populus and 45 Prunus genes were identified and compared to their Arabidopsis orthologs through phylogenetic analyses. Conclusions In general, when compared to Arabidopsis, differences in gene family structure were often seen in only one of the two tree species. However, one class of genes associated with cytokinin signal transduction, the CKI1-like family of two-component histidine kinases, was larger in both Populus and Prunus than in Arabidopsis.
|
37
|
Co-expression analysis, proteomic and metabolomic study on the impact of a Deg/HtrA protease triple mutant in Synechocystis sp. PCC 6803 exposed to temperature and high light stress. J Proteomics 2013; 78:294-311. [DOI: 10.1016/j.jprot.2012.09.036] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Received: 08/03/2012] [Revised: 09/14/2012] [Accepted: 09/30/2012] [Indexed: 11/26/2022]
|
38
|
Classification of microarrays; synergistic effects between normalization, gene selection and machine learning. BMC Bioinformatics 2011; 12:390. [PMID: 21982277 PMCID: PMC3229535 DOI: 10.1186/1471-2105-12-390] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Received: 12/20/2010] [Accepted: 10/07/2011] [Indexed: 11/29/2022] Open
Abstract
Background Machine learning is a powerful approach for describing and predicting classes in microarray data. Although several comparative studies have investigated the relative performance of various machine learning methods, these often do not account for the fact that performance (e.g. error rate) is a result of a series of analysis steps of which the most important are data normalization, gene selection and machine learning. Results In this study, we used seven previously published cancer-related microarray data sets to compare the effects on classification performance of five normalization methods, three gene selection methods with 21 different numbers of selected genes and eight machine learning methods. Performance in terms of error rate was rigorously estimated by repeatedly employing a double cross validation approach. Since performance varies greatly between data sets, we devised an analysis method that first compares methods within individual data sets and then visualizes the comparisons across data sets. We discovered both well performing individual methods and synergies between different methods. Conclusion Support Vector Machines with a radial basis kernel, linear kernel or polynomial kernel of degree 2 all performed consistently well across data sets. We show that there is a synergistic relationship between these methods and gene selection based on the T-test and the selection of a relatively high number of genes. Also, we find that these methods benefit significantly from using normalized data, although it is hard to draw general conclusions about the relative performance of different normalization procedures.
|
39
|
A systems biology model of the regulatory network in Populus leaves reveals interacting regulators and conserved regulation. BMC Plant Biology 2011; 11:13. [PMID: 21232107 PMCID: PMC3030533 DOI: 10.1186/1471-2229-11-13] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Received: 10/08/2010] [Accepted: 01/13/2011] [Indexed: 05/23/2023]
Abstract
BACKGROUND Green plant leaves have always fascinated biologists as hosts for photosynthesis and providers of basic energy to many food webs. Today, comprehensive databases of gene expression data enable us to apply increasingly advanced computational methods for reverse-engineering the regulatory network of leaves, and to begin to understand the gene interactions underlying complex emergent properties related to stress-response and development. These new systems biology methods are now also being applied to organisms such as Populus, a woody perennial tree, in order to understand the specific characteristics of these species. RESULTS We present a systems biology model of the regulatory network of Populus leaves. The network is reverse-engineered from promoter information and expression profiles of leaf-specific genes measured over a large set of conditions related to stress and development. The network model incorporates interactions between regulators, such as synergistic and competitive relationships, by evaluating increasingly complex regulatory mechanisms, and is therefore able to identify new regulators of leaf development not found by traditional genomics methods based on pair-wise expression similarity. The approach is shown to explain available gene function information and to provide robust prediction of expression levels in new data. We also use the predictive capability of the model to identify condition-specific regulation as well as conserved regulation between Populus and Arabidopsis. CONCLUSIONS We outline a computationally inferred model of the regulatory network of Populus leaves, and show how treating genes as interacting, rather than individual, entities identifies new regulators compared to traditional genomics analysis. 
Although systems biology models should be used with care considering the complexity of regulatory programs and the limitations of current genomics data, methods describing interactions can provide hypotheses about the underlying cause of emergent properties and are needed if we are to identify target genes other than those constituting the "low hanging fruit" of genomic analysis.
|
40
|
Challenges in microarray class discovery: a comprehensive examination of normalization, gene selection and clustering. BMC Bioinformatics 2010; 11:503. [PMID: 20937082 PMCID: PMC3098084 DOI: 10.1186/1471-2105-11-503] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.4] [Received: 04/12/2010] [Accepted: 10/11/2010] [Indexed: 08/30/2023] Open
Abstract
BACKGROUND Cluster analysis, and in particular hierarchical clustering, is widely used to extract information from gene expression data. The aim is to discover new classes, or sub-classes, of either individuals or genes. Performing a cluster analysis commonly involves decisions on how to handle missing values, standardize the data and select genes. In addition, pre-processing, involving various types of filtration and normalization procedures, can have an effect on the ability to discover biologically relevant classes. Here we consider cluster analysis in a broad sense and perform a comprehensive evaluation that covers several aspects of cluster analyses, including normalization. RESULT We evaluated 2780 cluster analysis methods on seven publicly available 2-channel microarray data sets with common reference designs. Each cluster analysis method differed in data normalization (5 normalizations were considered), missing value imputation (2), standardization of data (2), gene selection (19) or clustering method (11). The cluster analyses are evaluated using known classes, such as cancer types, and the adjusted Rand index. The performances of the different analyses vary between the data sets and it is difficult to give general recommendations. However, normalization, gene selection and clustering method are all variables that have a significant impact on the performance. In particular, gene selection is important and it is generally necessary to include a relatively large number of genes in order to get good performance. Selecting genes with high standard deviation or using principal component analysis are shown to be the preferred gene selection methods. Hierarchical clustering using Ward's method, k-means clustering and Mclust are the clustering methods considered in this paper that achieve the highest adjusted Rand index. 
Normalization can have a significant positive impact on the ability to cluster individuals, and there are indications that background correction is preferable, in particular if the gene selection is successful. However, this is an area that needs to be studied further in order to draw any general conclusions. CONCLUSIONS The choice of cluster analysis, and in particular gene selection, has a large impact on the ability to cluster individuals correctly based on expression profiles. Normalization has a positive effect, but the relative performance of different normalizations is an area that needs more research. In summary, although clustering, gene selection and normalization are considered standard methods in bioinformatics, our comprehensive analysis shows that selecting the right methods, and the right combinations of methods, is far from trivial and that much is still unexplored in what is considered to be the most basic analysis of genomic data.
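Illustrative note (not part of the abstract): the adjusted Rand index used above to score a clustering against known classes can be sketched as follows. This is a minimal stand-alone version under the standard pair-counting definition; the function name and the toy labels are assumptions made for illustration, not taken from the paper.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Pair-counting adjusted Rand index between two labelings (hypothetical helper)."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to disagree on

    # Contingency table cells and its row/column sums.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    # Number of sample pairs placed together in both, in the true, and in the predicted labeling.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings put everything in one cluster
    return (sum_cells - expected) / (max_index - expected)


# Toy example: known classes vs. a clustering that merges two of them.
known_classes = [0, 0, 1, 2]
clustering = [0, 0, 1, 1]
ari = adjusted_rand_index(known_classes, clustering)
```

The index is 1 for identical partitions (regardless of how clusters are named) and close to 0 for random agreement, which is why it suits comparisons across many clustering configurations.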
|
41
|
Local descriptors of protein structure: a systematic analysis of the sequence-structure relationship in proteins using short- and long-range interactions. Proteins 2009; 75:870-84. [PMID: 19025980 DOI: 10.1002/prot.22296] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 11/10/2022]
Abstract
Local protein structure representations that incorporate long-range contacts between residues are often considered in protein structure comparison but have found relatively little use in structure prediction where assembly from single backbone fragments dominates. Here, we introduce the concept of local descriptors of protein structure to characterize local neighborhoods of amino acids including short- and long-range interactions. We build a library of recurring local descriptors and show that this library is general enough to allow assembly of unseen protein structures. The library could on average re-assemble 83% of 119 unseen structures, and showed little or no performance decrease between homologous targets and targets with folds not represented among domains used to build it. We then systematically evaluate the descriptor library to establish the level of the sequence signal in sets of protein fragments of similar geometrical conformation. In particular, we test whether that signal is strong enough to facilitate correct assignment and alignment of these local geometries to new sequences. We use the signal to assign descriptors to a test set of 479 sequences with less than 40% sequence identity to any domain used to build the library, and show that on average more than 50% of the backbone fragments constituting descriptors can be correctly aligned. We also use the assigned descriptors to infer SCOP folds, and show that correct predictions can be made in many of the 151 cases where PSI-BLAST was unable to detect significant sequence similarity to proteins in the library. Although the combinatorial problem of simultaneously aligning several fragments to sequence is a major bottleneck compared with single fragment methods, the advantage of the current approach is that correct alignments imply correct long range distance constraints. 
The lack of these constraints is most likely the major reason why structure prediction methods fail to consistently produce adequate models when good templates are unavailable or undetectable. Thus, we believe that the current study offers new and valuable insight into the prediction of sequence-structure relationships in proteins.
|
42
|
A comprehensive analysis of the structure-function relationship in proteins based on local structure similarity. PLoS One 2009; 4:e6266. [PMID: 19603073 PMCID: PMC2705683 DOI: 10.1371/journal.pone.0006266] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.9] [Received: 10/07/2008] [Accepted: 06/10/2009] [Indexed: 12/22/2022] Open
Abstract
Background Sequence similarity to characterized proteins provides testable functional hypotheses for less than 50% of the proteins identified by genome sequencing projects. With structural genomics it is believed that structural similarities may give functional hypotheses for many of the remaining proteins. Methodology/Principal Findings We provide a systematic analysis of the structure-function relationship in proteins using the novel concept of local descriptors of protein structure. A local descriptor is a small substructure of a protein which includes both short- and long-range interactions. We employ a library of commonly reoccurring local descriptors general enough to assemble most existing protein structures. We then model the relationship between these local shapes and Gene Ontology using rule-based learning. Our IF-THEN rule model offers legible, high resolution descriptions that combine local substructures and is able to discriminate functions even for functionally versatile folds such as the frequently occurring TIM barrel and Rossmann fold. By evaluating the predictive performance of the model, we provide a comprehensive quantification of the structure-function relationship based only on local structure similarity. Our findings are, among others, that conserved structure is a stronger prerequisite for enzymatic activity than for binding specificity, and that structure-based predictions complement sequence-based predictions. The model is capable of generating correct hypotheses, as confirmed by a literature study, even when no significant sequence similarity to characterized proteins exists. Conclusions/Significance Our approach offers a new and complete description and quantification of the structure-function relationship in proteins. 
By demonstrating how our predictions offer higher sensitivity than using global structure, and complement the use of sequence, we show that the presented ideas could advance the development of meta-servers in function prediction.
|
43
|
Using multi-data hidden Markov models trained on local neighborhoods of protein structure to predict residue-residue contacts. Bioinformatics 2009; 25:1264-70. [PMID: 19289446 DOI: 10.1093/bioinformatics/btp149] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.3] [Indexed: 11/13/2022]
Abstract
MOTIVATION Correct prediction of residue-residue contacts in proteins that lack good templates with known structure would take ab initio protein structure prediction a large step forward. The lack of correct contacts, and in particular long-range contacts, is considered the main reason why these methods often fail. RESULTS We propose a novel hidden Markov model (HMM)-based method for predicting residue-residue contacts from protein sequences using as training data homologous sequences, predicted secondary structure and a library of local neighborhoods (local descriptors of protein structure). The library consists of recurring structural entities incorporating short-, medium- and long-range interactions and is general enough to reassemble the cores of nearly all proteins in the PDB. The method is tested on an external test set of 606 domains with no significant sequence similarity to the training set as well as 151 domains with SCOP folds not present in the training set. Considering the top 0.2 x L predictions (L = sequence length), our HMMs obtained an accuracy of 22.8% for long-range interactions in new fold targets, and an average accuracy of 28.6% for long-, medium- and short-range contacts. This is a significant performance increase over currently available methods when comparing against results published in the literature. AVAILABILITY http://predictioncenter.org/Services/FragHMMent/.
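Illustrative note (not part of the abstract): the "top 0.2 x L" evaluation mentioned above can be sketched as below. The sketch assumes that accuracy means the fraction of the highest-scoring predicted pairs that are true contacts, and that 0.2 x L is truncated to an integer; both choices, along with the function name and toy data, are assumptions rather than details given in the abstract.

```python
def top_fraction_accuracy(scored_pairs, true_contacts, seq_len, fraction=0.2):
    """Fraction of the top (fraction * seq_len) scored residue pairs that are true contacts.

    scored_pairs: iterable of ((i, j), score); true_contacts: iterable of (i, j).
    Pairs are treated as unordered. Hypothetical helper for illustration only.
    """
    k = max(1, int(fraction * seq_len))  # assumed rounding: truncate, but keep at least one
    truth = {tuple(sorted(pair)) for pair in true_contacts}
    ranked = sorted(scored_pairs, key=lambda item: item[1], reverse=True)[:k]
    if not ranked:
        return 0.0
    hits = sum(1 for pair, _score in ranked if tuple(sorted(pair)) in truth)
    return hits / len(ranked)


# Toy example: a 10-residue sequence, so the top 2 predictions are evaluated.
predictions = [((1, 8), 0.9), ((2, 9), 0.8), ((3, 7), 0.1)]
contacts = {(8, 1), (3, 7)}
accuracy = top_fraction_accuracy(predictions, contacts, seq_len=10)
```

In this toy case only one of the two top-ranked pairs is a real contact, so the low-scoring true contact (3, 7) does not count toward the result.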
|
44
|
Gene expression trends and protein features effectively complement each other in gene function prediction. Bioinformatics 2008; 25:322-30. [PMID: 19050035 DOI: 10.1093/bioinformatics/btn625] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/13/2022]
Abstract
MOTIVATION Genome-scale 'omics' data constitute a potentially rich source of information about biological systems and their function. There is a plethora of tools and methods available to mine omics data. However, the diversity and complexity of different omics data types is a stumbling block for multi-data integration, hence there is a dire need for additional methods to exploit potential synergy from integrated orthogonal data. Rough Sets provide an efficient means to use complex information in classification approaches. Here, we set out to explore the possibilities of Rough Sets to incorporate diverse information sources in a functional classification of unknown genes. RESULTS We explored the use of Rough Sets for a novel data integration strategy where gene expression data, protein features and Gene Ontology (GO) annotations were combined to describe general and biologically relevant patterns represented by If-Then rules. The descriptive rules were used to predict the function of unknown genes in Arabidopsis thaliana and Schizosaccharomyces pombe. The If-Then rule models showed success rates of up to 0.89 (discriminative and predictive power for both modeled organisms), whereas models built solely from one data type (protein features or gene expression data) yielded success rates varying from 0.68 to 0.78. Our models were applied to generate classifications for many unknown genes, of which a sizeable number were confirmed either by PubMed literature reports or electronically inferred annotations. Finally, we studied cell cycle protein-protein interactions derived from both tandem affinity purification experiments and in silico experiments in the BioGRID interactome database and found strong experimental evidence for the predictions generated by our models. The results show that our approach can be used to build very robust models that create synergy from integrating gene expression data and protein features. 
AVAILABILITY The Rough Set-based method is implemented in the Rosetta toolkit kernel version 1.0.1 available at: http://rosetta.lcb.uu.se/
|
45
|
Interaction model based on local protein substructures generalizes to the entire structural enzyme-ligand space. J Chem Inf Model 2008; 48:2278-88. [PMID: 18937438 DOI: 10.1021/ci800200e] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.8] [Indexed: 11/28/2022]
Abstract
Chemogenomics is a new strategy in in silico drug discovery, where the ultimate goal is to understand molecular recognition for all molecules interacting with all proteins in the proteome. To study such cross interactions, methods that can generalize over proteins that vary greatly in sequence, structure, and function are needed. We present a general quantitative approach to protein-ligand binding affinity prediction that spans the entire structural enzyme-ligand space. The model was trained on a data set composed of all available enzymes cocrystallized with druglike ligands, taken from four publicly available interaction databases, for which a crystal structure is available. Each enzyme was characterized by a set of local descriptors of protein structure that describe the binding site of the cocrystallized ligand. The ligands in the training set were described by traditional QSAR descriptors. To evaluate the model, a comprehensive test set consisting of enzyme structures and ligands was manually curated. The test set contained enzyme-ligand complexes for which no crystal structures were available, and thus the binding modes were unknown. The test set enzymes were therefore characterized by matching their entire structures to the local descriptor library constructed from the training set. Both the training and the test set contained enzyme-ligand complexes from all major enzyme classes, and the enzymes spanned a large range of sequences and folds. The experimental binding affinities (pKi) ranged from 0.5 to 11.9 (0.7-11.0 in the test set). The induced model predicted the binding affinities of the external test set enzyme-ligand complexes with an r² of 0.53 and an RMSEP of 1.5. This demonstrates that the use of local descriptors makes it possible to create rough predictive models that can generalize over a wide range of protein targets.
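Illustrative note (not part of the abstract): the two figures of merit reported above for an external test set can be computed as sketched below. The sketch assumes that the reported r-squared is the squared Pearson correlation between observed and predicted affinities and that RMSEP is the root mean squared error of prediction; the abstract does not spell out either definition, and the function names and toy values are hypothetical.

```python
from math import sqrt


def rmsep(observed, predicted):
    """Root mean squared error of prediction over paired values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))


def squared_pearson(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov * cov / (var_o * var_p)


# Toy pKi values: predictions are perfectly correlated but offset by one unit.
observed_pki = [1.0, 2.0, 3.0, 4.0]
predicted_pki = [2.0, 3.0, 4.0, 5.0]
error = rmsep(observed_pki, predicted_pki)
r_squared = squared_pearson(observed_pki, predicted_pki)
```

The toy case shows why both numbers are usually reported together: a systematic offset leaves the correlation perfect while the prediction error stays at one pKi unit.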
|
46
|
Abstract
MOTIVATION Copy number profiling methods aim at assigning DNA copy numbers to chromosomal regions using measurements from microarray-based comparative genomic hybridizations. Among the proposed methods to this end, Hidden Markov Model (HMM)-based approaches seem promising since DNA copy number transitions are naturally captured in the model. Current discrete-index HMM-based approaches do not, however, take into account heterogeneous information regarding the genomic overlap between clones. Moreover, the majority of existing methods are restricted to chromosome-wise analysis. RESULTS We introduce a novel Segmental Maximum A Posteriori approach, SMAP, for DNA copy number profiling. Our method is based on discrete-index Hidden Markov Modeling and incorporates genomic distance and overlap between clones. We exploit a priori information through user-controllable parameterization that enables the identification of copy number deviations of various lengths and amplitudes. The model parameters may be inferred at a genome-wide scale to avoid overfitting of model parameters often resulting from chromosome-wise model inference. We report superior performances of SMAP on synthetic data when compared with two recent methods. When applied on our new experimental data, SMAP readily recognizes already known genetic aberrations including both large-scale regions with aberrant DNA copy number and changes affecting only single features on the array. We highlight the differences between the prediction of SMAP and the compared methods and show that SMAP accurately determines copy number changes and benefits from overlap consideration.
|
47
|
A novel approach to fold recognition using sequence-derived properties from sets of structurally similar local fragments of proteins. Bioinformatics 2003; 19 Suppl 2:ii81-91. [PMID: 14534176 DOI: 10.1093/bioinformatics/btg1064] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.2] [Indexed: 11/13/2022] Open
Abstract
Comparative modeling methods can consistently produce reliable structural models for protein sequences with more than 25% sequence identity to proteins with known structure. However, there is a good chance that also sequences with lower sequence identity have their structural components represented in structural databases. To this end, we present a novel fragment-based method using sets of structurally similar local fragments of proteins. The approach differs from other fragment-based methods that use only single backbone fragments. Instead, we use a library of groups containing sets of sequence fragments with geometrically similar local structures and extract sequence related properties to assign these specific geometrical conformations to target sequences. We test the ability of the approach to recognize correct SCOP folds for 273 sequences from the 49 most popular folds. 49% of these sequences have the correct fold as their top prediction, while 82% have the correct fold in one of the top five predictions. Moreover, the approach shows no performance reduction on a subset of sequence targets with less than 10% sequence identity to any protein used to build the library.
|
48
|
Differential gene expression in the olfactory bulb following exposure to the olfactory toxicant 2,6-dichlorophenyl methylsulphone and its 2,5-dichlorinated isomer in mice. Neurotoxicology 2007; 28:1120-8. [PMID: 17655932 DOI: 10.1016/j.neuro.2007.05.010] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/03/2007] [Revised: 04/30/2007] [Accepted: 05/29/2007] [Indexed: 11/24/2022]
Abstract
2,6-Dichlorophenyl methylsulphone and a number of structurally related chemicals are CYP-activated toxicants in the olfactory mucosa in mice and rats. This toxicity involves both the olfactory neuroepithelium and its subepithelial nerves. In addition, 2,6-dichlorophenyl methylsulphone induces glial fibrillary acidic protein expression (Gfap, a biomarker for gliosis) in the olfactory bulb, as well as long-lasting learning deficits and changes in spontaneous behavior in mice and rats. So far the 2,5-dichlorinated isomer has not been reported to cause toxicity in the olfactory system, although it gives rise to transient changes in spontaneous behavior. In the present study we used 15k cDNA gene arrays and real-time RT-PCR to determine 2,6-dichlorophenyl methylsulphone-induced effects on gene expression in the olfactory bulb in mice. Seven days following a single ip dose of 2,6-dichlorophenyl methylsulphone, 56 genes were found to be differentially expressed in the olfactory bulb. Forty-one of these genes clustered into specific processes regulating, for instance, cell differentiation, cell migration and apoptosis. The genes selected for real-time RT-PCR were chosen to cover the range of B-values in the cDNA array analysis. Altered expression of Gfap, mt-Rnr2, Ncor1 and Olfml3 was confirmed. The expression of these genes was also measured in mice dosed with 2,5-dichlorophenyl methylsulphone, and mt-Rnr2 and Olfml3 were found to be altered by this isomer as well. Combined with previous data, the results support the possibility that the persistent neurotoxicity induced by 2,6-dichlorophenyl methylsulphone in mice represents both an indirect and a direct effect on the brain. The 2,5-dichlorinated isomer, negative with regard to CYP-catalyzed toxicity in the olfactory mucosa, may prove useful to resolve this issue.
|
49
|
Generalized modeling of enzyme-ligand interactions using proteochemometrics and local protein substructures. Proteins 2006; 65:568-79. [PMID: 16948162 DOI: 10.1002/prot.21163] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.9] [Indexed: 11/12/2022]
Abstract
Modeling and understanding protein-ligand interactions is one of the most important goals in computational drug discovery. To this end, proteochemometrics uses structural and chemical descriptors from several proteins and several ligands to induce interaction models. Here, we present a new and generalized approach in which proteins varying greatly in terms of sequence and structure are represented by a library of local substructures. Using linear regression and rule-based learning, we combine such local substructures with chemical descriptors from the ligands to model binding affinity for a training set of hydrolase and lyase enzymes. We evaluate the predictive performance of these models using cross validation and sets of unseen ligands with unknown three-dimensional structure. The models are shown to generalize by outperforming models using descriptors from only proteins or only ligands, or models using global structure similarities rather than local similarities. Thus, we demonstrate that this approach is capable of describing dependencies between local structural properties and ligands in otherwise dissimilar protein structures. These dependencies are often, but not always, associated with local substructures that are in contact with the ligands. Finally, we show that strongly bound enzyme-ligand complexes require the presence of particular local substructures, while weakly bound complexes may be described by the absence of certain properties. The results demonstrate that the alignment-independent approach using local substructures is capable of describing protein-ligand interaction for largely different proteins and hence opens the way for proteochemometric analysis of the interaction space of entire proteomes. Current approaches are limited to families of closely related proteins.
|
50
|
Markers of adenocarcinoma characteristic of the site of origin: development of a diagnostic algorithm. Clin Cancer Res 2005; 11:3766-72. [PMID: 15897574 DOI: 10.1158/1078-0432.ccr-04-2236] [Citation(s) in RCA: 248] [Impact Index Per Article: 13.1] [Indexed: 12/29/2022]
Abstract
PURPOSE: Patients with metastatic adenocarcinoma of unknown origin are a common clinical problem. Knowledge of the primary site is important for their management, but histologically, such tumors appear similar. Better diagnostic markers are needed to enable the assignment of metastases to likely sites of origin on pathologic samples.
EXPERIMENTAL DESIGN: Expression profiling of 27 candidate markers was done using tissue microarrays and immunohistochemistry. In the first (training) round, we studied 352 primary adenocarcinomas, from seven main sites (breast, colon, lung, ovary, pancreas, prostate and stomach) and their differential diagnoses. Data were analyzed in Microsoft Access and the Rosetta system, and used to develop a classification scheme. In the second (validation) round, we studied 100 primary adenocarcinomas and 30 paired metastases.
RESULTS: In the first round, we generated expression profiles for all 27 candidate markers in each of the seven main primary sites. Data analysis led to a simplified diagnostic panel and decision tree containing 10 markers only: CA125, CDX2, cytokeratins 7 and 20, estrogen receptor, gross cystic disease fluid protein 15, lysozyme, mesothelin, prostate-specific antigen, and thyroid transcription factor 1. Applying the panel and tree to the original data provided correct classification in 88%. The 10 markers and diagnostic algorithm were then tested in a second, independent, set of primary and metastatic tumors and again 88% were correctly classified.
CONCLUSIONS: This classification scheme should enable better prediction on biopsy material of the primary site in patients with metastatic adenocarcinoma of unknown origin, leading to improved management and therapy.
|