101
Junker A, Rohn H, Schreiber F. Visual analysis of transcriptome data in the context of anatomical structures and biological networks. FRONTIERS IN PLANT SCIENCE 2012; 3:252. [PMID: 23162564 PMCID: PMC3498740 DOI: 10.3389/fpls.2012.00252] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 08/01/2012] [Accepted: 10/22/2012] [Indexed: 05/12/2023]
Abstract
The complexity and temporal as well as spatial resolution of transcriptome datasets are constantly increasing due to extensive technological developments. Here we present methods for advanced visualization and intuitive exploration of transcriptomics data as necessary prerequisites in order to facilitate the gain of biological knowledge. Color-coding of structural images based on the expression level enables a fast visual data analysis in the background of the examined biological system. The network-based exploration of these visualizations allows for comparative analysis of genes with specific transcript patterns and supports the extraction of functional relationships even from large datasets. In order to illustrate the presented methods, the tool HIVE was applied for visualization and exploration of database-retrieved expression data for master regulators of Arabidopsis thaliana flower and seed development in the context of corresponding tissue-specific regulatory networks.
Affiliation(s)
- Astrid Junker
- Leibniz Institute of Plant Genetics and Crop Plant Research Gatersleben, Gatersleben, Germany
- Hendrik Rohn
- Leibniz Institute of Plant Genetics and Crop Plant Research Gatersleben, Gatersleben, Germany
- Falk Schreiber
- Leibniz Institute of Plant Genetics and Crop Plant Research Gatersleben, Gatersleben, Germany
- Institute of Computer Science, Martin Luther University Halle-Wittenberg, Halle, Germany
- Clayton School of Information Technology, Monash University, Clayton, VIC, Australia
102
Aluru M, Zola J, Nettleton D, Aluru S. Reverse engineering and analysis of large genome-scale gene networks. Nucleic Acids Res 2012; 41:e24. [PMID: 23042249 PMCID: PMC3592423 DOI: 10.1093/nar/gks904] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.6] [Indexed: 11/12/2022]
Abstract
Reverse engineering the whole-genome networks of complex multicellular organisms remains a challenge. While simpler models easily scale to large numbers of genes and gene expression datasets, more accurate models are compute-intensive, limiting their scale of applicability. To enable fast and accurate reconstruction of large networks, we developed Tool for Inferring Network of Genes (TINGe), a parallel mutual information (MI)-based program. The novel features of our approach include: (i) B-spline-based formulation for linear-time computation of MI, (ii) a novel algorithm for direct permutation testing and (iii) development of parallel algorithms to reduce run-time and facilitate construction of large networks. We assess the quality of our method by comparison with ARACNe (Algorithm for the Reconstruction of Accurate Cellular Networks) and GeneNet and demonstrate its unique capability by reverse engineering the whole-genome network of Arabidopsis thaliana from 3137 Affymetrix ATH1 GeneChips in just 9 min on a 1024-core cluster. We further report on the development of a new software tool, Gene Network Analyzer (GeNA), for extracting context-specific subnetworks from a given set of seed genes. Using TINGe and GeNA, we performed analysis of 241 Arabidopsis AraCyc 8.0 pathways, and the results are made available through the web.
Affiliation(s)
- Maneesha Aluru
- Department of Genetics, Iowa State University, Ames, IA 50011, USA.
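The MI-plus-permutation-testing idea in the abstract above can be illustrated with a minimal sketch. It is not TINGe: it swaps in a plain equal-width histogram estimator for the B-spline formulation and runs serially, and the function names (`discretize`, `mutual_information`, `permutation_p_value`) and the default of four bins are assumptions made for illustration.

```python
import math
import random
from collections import Counter

def discretize(values, bins=4):
    """Map each value to an equal-width bin index in [0, bins - 1]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant profile
    return [min(int((v - lo) / width), bins - 1) for v in values]

def mutual_information(x, y, bins=4):
    """Plug-in (histogram) MI estimate in nats between two expression profiles."""
    bx, by = discretize(x, bins), discretize(y, bins)
    n = len(bx)
    px, py, pxy = Counter(bx), Counter(by), Counter(zip(bx, by))
    # sum over occupied cells of p(a,b) * log(p(a,b) / (p(a) * p(b)))
    return sum(
        (c / n) * math.log(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )

def permutation_p_value(x, y, n_perm=200, bins=4, seed=0):
    """Share of shuffled profiles whose MI reaches the observed MI."""
    rng = random.Random(seed)
    observed = mutual_information(x, y, bins)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mutual_information(x, shuffled, bins) >= observed:
            hits += 1
    # add-one correction keeps the estimate strictly above zero
    return (hits + 1) / (n_perm + 1)
```

An edge between two genes would be kept only when the permutation p-value falls below a chosen threshold; the threshold itself is a free parameter not taken from the abstract.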
103
Conserved non-coding regulatory signatures in Arabidopsis co-expressed gene modules. PLoS One 2012; 7:e45041. [PMID: 23024789 PMCID: PMC3443200 DOI: 10.1371/journal.pone.0045041] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 06/21/2012] [Accepted: 08/11/2012] [Indexed: 11/24/2022]
Abstract
Complex traits and other polygenic processes require coordinated gene expression. Co-expression networks model mRNA co-expression: the product of gene regulatory networks. To identify regulatory mechanisms underlying coordinated gene expression in a tissue-enriched context, ten Arabidopsis thaliana co-expression networks were constructed after manually sorting 4,566 RNA profiling datasets into aerial, flower, leaf, root, rosette, seedling, seed, shoot, whole plant, and global (all samples combined) groups. Collectively, the ten networks contained 30% of the measurable genes of Arabidopsis and were circumscribed into 5,491 modules. Modules were scrutinized for cis regulatory mechanisms putatively encoded in conserved non-coding sequences (CNSs) previously identified as remnants of a whole genome duplication event. We determined the non-random association of 1,361 unique CNSs to 1,904 co-expression network gene modules. Furthermore, the CNS elements were placed in the context of known gene regulatory networks (GRNs) by connecting 250 CNS motifs with known GRN cis elements. Our results provide support for a regulatory role of some CNS elements and suggest the functional consequences of CNS activation of co-expression in specific gene sets dispersed throughout the genome.
104
Han X, Chen C, Hyun TK, Kumar R, Kim JY. Metabolic module mining based on Independent Component Analysis in Arabidopsis thaliana. Mol Cells 2012; 34:295-304. [PMID: 22960738 PMCID: PMC3887838 DOI: 10.1007/s10059-012-0117-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 04/18/2012] [Revised: 07/07/2012] [Accepted: 07/09/2012] [Indexed: 01/02/2023]
Abstract
Independent Component Analysis (ICA) has been introduced as one of the useful tools for gene-functional discovery in animals. However, this approach has been poorly utilized in the plant sciences. In the present study, we have exploited ICA combined with pathway enrichment analysis to address the statistical challenges associated with genome-wide analysis in plant systems. To generate an Arabidopsis metabolic platform, we collected 4,373 Affymetrix ATH1 microarray datasets. Of the 3,232 metabolic genes and transcription factors, 99.47% were identified in at least one component, indicating the coverage of most of the metabolic pathways by the components. During the metabolic pathway enrichment analysis, we found components that indicate an independent regulation between the isoprenoid biosynthesis pathways. We also utilized this analysis tool to investigate some transcription factors involved in secondary cell wall biogenesis. This approach has identified remarkably more transcription factors compared to previously reported analysis tools. A website providing user-friendly searching and downloading of the entire dataset analyzed by ICA is available at http://kimjy.gnu.ac.kr/ICA.files/slide0002.htm. ICA combined with pathway enrichment analysis might provide a powerful approach for the extraction of the components responsible for a biological process of interest in plant systems.
Affiliation(s)
- Xiao Han
- Division of Applied Life Science (Brain Korea 21-World Class University Program), Plant Molecular Biology and Biotechnology Research Center, Gyeongsang National University, Jinju 660-701, Korea
- Cong Chen
- Institute of Mitochondrial Biology and Medicine, The Key Laboratory of Biomedical Information Engineering of Ministry of Education, Xi’an Jiaotong University School of Life Science and Technology, Xi’an, China
- Tae Kyung Hyun
- Division of Applied Life Science (Brain Korea 21-World Class University Program), Plant Molecular Biology and Biotechnology Research Center, Gyeongsang National University, Jinju 660-701, Korea
- Ritesh Kumar
- Division of Applied Life Science (Brain Korea 21-World Class University Program), Plant Molecular Biology and Biotechnology Research Center, Gyeongsang National University, Jinju 660-701, Korea
- Jae-Yean Kim
- Division of Applied Life Science (Brain Korea 21-World Class University Program), Plant Molecular Biology and Biotechnology Research Center, Gyeongsang National University, Jinju 660-701, Korea
105
Ingkasuwan P, Netrphan S, Prasitwattanaseree S, Tanticharoen M, Bhumiratana S, Meechai A, Chaijaruwanich J, Takahashi H, Cheevadhanarak S. Inferring transcriptional gene regulation network of starch metabolism in Arabidopsis thaliana leaves using graphical Gaussian model. BMC SYSTEMS BIOLOGY 2012; 6:100. [PMID: 22898356 PMCID: PMC3490714 DOI: 10.1186/1752-0509-6-100] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Received: 02/23/2012] [Accepted: 06/20/2012] [Indexed: 01/22/2023]
Abstract
BACKGROUND Starch serves as a temporal storage of carbohydrates in plant leaves during day/night cycles. To study transcriptional regulatory modules of this dynamic metabolic process, we conducted gene regulation network analysis based on small-sample inference of graphical Gaussian model (GGM). RESULTS Time-series significant analysis was applied for Arabidopsis leaf transcriptome data to obtain a set of genes that are highly regulated under a diurnal cycle. A total of 1,480 diurnally regulated genes included 21 starch metabolic enzymes, 6 clock-associated genes, and 106 transcription factors (TF). A starch-clock-TF gene regulation network comprising 117 nodes and 266 edges was constructed by GGM from these 133 significant genes that are potentially related to the diurnal control of starch metabolism. From this network, we found that β-amylase 3 (b-amy3: At4g17090), which participates in starch degradation in chloroplast, is the most frequently connected gene (a hub gene). The robustness of gene-to-gene regulatory network was further analyzed by TF binding site prediction and by evaluating global co-expression of TFs and target starch metabolic enzymes. As a result, two TFs, indeterminate domain 5 (AtIDD5: At2g02070) and constans-like (COL: At2g21320), were identified as positive regulators of starch synthase 4 (SS4: At4g18240). The inference model of AtIDD5-dependent positive regulation of SS4 gene expression was experimentally supported by decreased SS4 mRNA accumulation in Atidd5 mutant plants during the light period of both short and long day conditions. COL was also shown to positively control SS4 mRNA accumulation. Furthermore, the knockout of AtIDD5 and COL led to deformation of chloroplast and its contained starch granules. This deformity also affected the number of starch granules per chloroplast, which increased significantly in both knockout mutant lines. 
CONCLUSIONS In this study, we utilized a systematic approach of microarray analysis to discover the transcriptional regulatory network of starch metabolism in Arabidopsis leaves. With this inference method, the starch regulatory network of Arabidopsis was found to be strongly associated with clock genes and TFs, of which AtIDD5 and COL were evidenced to control SS4 gene expression and starch granule formation in chloroplasts.
Affiliation(s)
- Papapit Ingkasuwan
- School of Bioresources and Technology, King Mongkut's University of Technology Thonburi, Bangkok, 10140, Thailand
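In a graphical Gaussian model, an edge between two genes corresponds to a nonzero partial correlation, which can be read off the inverse of the covariance (or correlation) matrix. The sketch below shows only that core relation. It uses a plain matrix inverse, which assumes more samples than genes; the small-sample inference mentioned in the abstract would instead need a regularized (for example shrinkage) covariance estimate, which is not reproduced here. The helper names `invert` and `partial_correlations` are illustrative.

```python
import math

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    # augment with the identity matrix: [A | I]
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def partial_correlations(cov):
    """Partial correlation of every gene pair given all remaining genes."""
    omega = invert(cov)  # precision matrix
    n = len(cov)
    return [[1.0 if i == j else
             -omega[i][j] / math.sqrt(omega[i][i] * omega[j][j])
             for j in range(n)] for i in range(n)]

# Toy chain x -> y -> z: x and z correlate only through y.
chain = [[1.0, 0.8, 0.64],
         [0.8, 1.0, 0.8],
         [0.64, 0.8, 1.0]]
pcor = partial_correlations(chain)
```

In the toy chain, the marginal correlation between the first and third variable is 0.64, yet their partial correlation vanishes once the middle variable is conditioned on, so a GGM would draw no direct edge between them.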
106
Heyndrickx KS, Vandepoele K. Systematic identification of functional plant modules through the integration of complementary data sources. PLANT PHYSIOLOGY 2012; 159:884-901. [PMID: 22589469 PMCID: PMC3387714 DOI: 10.1104/pp.112.196725] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.2] [Indexed: 05/02/2023]
Abstract
A major challenge is to unravel how genes interact and are regulated to exert specific biological functions. The integration of genome-wide functional genomics data, followed by the construction of gene networks, provides a powerful approach to identify functional gene modules. Large-scale expression data, functional gene annotations, experimental protein-protein interactions, and transcription factor-target interactions were integrated to delineate modules in Arabidopsis (Arabidopsis thaliana). The different experimental input data sets showed little overlap, demonstrating the advantage of combining multiple data types to study gene function and regulation. In the set of 1,563 modules covering 13,142 genes, most modules displayed strong coexpression, but functional and cis-regulatory coherence was less prevalent. Highly connected hub genes showed a significant enrichment toward embryo lethality and evidence for cross talk between different biological processes. Comparative analysis revealed that 58% of the modules showed conserved coexpression across multiple plants. Using module-based functional predictions, 5,562 genes were annotated, and an evaluation experiment disclosed that, based on 197 recently experimentally characterized genes, 38.1% of these functions could be inferred through the module context. Examples of confirmed genes of unknown function related to cell wall biogenesis, xylem and phloem pattern formation, cell cycle, hormone stimulus, and circadian rhythm highlight the potential to identify new gene functions. The module-based predictions offer new biological hypotheses for functionally unknown genes in Arabidopsis (1,701 genes) and six other plant species (43,621 genes). Furthermore, the inferred modules provide new insights into the conservation of coexpression and coregulation as well as a starting point for comparative functional annotation.
107
Zhang L, Yu S, Zuo K, Luo L, Tang K. Identification of gene modules associated with drought response in rice by network-based analysis. PLoS One 2012; 7:e33748. [PMID: 22662107 PMCID: PMC3360736 DOI: 10.1371/journal.pone.0033748] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.7] [Received: 09/20/2011] [Accepted: 02/17/2012] [Indexed: 12/11/2022]
Abstract
Understanding the molecular mechanisms that underlie plant responses to drought stress is challenging due to the complex interplay of numerous different genes. Here, we used network-based gene clustering to uncover the relationships between drought-responsive genes from large microarray datasets. We identified 2,607 rice genes that showed significant changes in gene expression under drought stress; 1,392 genes were highly intercorrelated to form 15 gene modules. These drought-responsive gene modules are biologically plausible, with enrichments for genes in common functional categories, stress response changes, tissue-specific expression and transcription factor binding sites. We observed that a gene module (referred to as module 4) consisting of 134 genes was significantly associated with drought response in both drought-tolerant and drought-sensitive rice varieties. This module is enriched for genes involved in controlling the response of the plant to water and embryonic development, including a heat shock transcription factor as the key regulator in the expression of ABRE-containing genes. These results suggest that module 4 is highly conserved in the ABA-mediated drought response pathway in different rice varieties. Moreover, our study showed that many hub genes clustered in rice chromosomes had significant associations with QTLs for drought stress tolerance. The relationship between hub gene clusters and drought tolerance QTLs may provide a key to understand the genetic basis of drought tolerance in rice.
Affiliation(s)
- Lida Zhang
- Plant Biotechnology Research Center, School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, China
- Shunwu Yu
- Shanghai Agrobiological Gene Center, Shanghai Academy of Agricultural Sciences, Shanghai, China
- Kaijing Zuo
- Plant Biotechnology Research Center, School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, China
- Lijun Luo
- Shanghai Agrobiological Gene Center, Shanghai Academy of Agricultural Sciences, Shanghai, China
- Kexuan Tang
- Plant Biotechnology Research Center, School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, China
108
Feng Y, Hurst J, Almeida-De-Macedo M, Chen X, Li L, Ransom N, Wurtele ES. Massive human co-expression network and its medical applications. Chem Biodivers 2012; 9:868-87. [PMID: 22589089 PMCID: PMC3711686 DOI: 10.1002/cbdv.201100355] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Indexed: 11/10/2022]
Abstract
Network-based analysis is indispensable in analyzing high-throughput biological data. Based on the assumption that the variation of gene interactions under given biological conditions could be better interpreted in the context of a large-scale and wide variety of developmental, tissue, and disease conditions, we leverage the large quantity of publicly available transcriptomic data (>40,000 HG U133A Affymetrix microarray chips) stored in ArrayExpress (http://www.ebi.ac.uk/arrayexpress/) using MetaOmGraph (http://metnet.vrac.iastate.edu/MetNet_MetaOmGraph.htm). From these data, 18,637 chips encompassing over 500 experiments containing high-quality data (18637 Hu-dataset) were used to create a globally stable gene co-expression network (18637 Hu-co-expression-network). Regulons, groups of highly and consistently co-expressed genes, were obtained by partitioning the 18637 Hu-co-expression-network using a Markov clustering algorithm (MCL). The regulons were demonstrated to be statistically significant using a gene ontology (GO) term overrepresentation test combined with evaluation of the effects of gene permutations. The regulons include ca. 12% of human genes, interconnected by 31,471 correlations. All network data and metadata are publicly available (http://metnet.vrac.iastate.edu/MetNet_MetaOmGraph.htm). Text mining of these metadata, GO term overrepresentation analysis, and statistical analysis of transcriptomic experiments across multiple environmental, tissue, and disease conditions have revealed novel fingerprints distinguishing central nervous system (CNS)-related conditions. This study demonstrates the value of mega-scale network-based analysis for biologists to further refine transcriptomic data, derived from a particular condition, to study the global relationships between genes and diseases, and to develop hypotheses that can inform future research.
Affiliation(s)
- Yaping Feng
- Department of Genetics, Development, and Cell Biology, Program of Bioinformatics and Computational Biology, Iowa State University, Ames, IA 50011, USA, phone: +01-515-294 8989; fax: +01-515-294 1337
- Jonathan Hurst
- Department of Genetics, Development, and Cell Biology, Program of Bioinformatics and Computational Biology, Iowa State University, Ames, IA 50011, USA, phone: +01-515-294 8989; fax: +01-515-294 1337
- Marcia Almeida-De-Macedo
- Department of Genetics, Development, and Cell Biology, Program of Bioinformatics and Computational Biology, Iowa State University, Ames, IA 50011, USA, phone: +01-515-294 8989; fax: +01-515-294 1337
- Xi Chen
- Department of Genetics, Development, and Cell Biology, Program of Bioinformatics and Computational Biology, Iowa State University, Ames, IA 50011, USA, phone: +01-515-294 8989; fax: +01-515-294 1337
- Ling Li
- Department of Genetics, Development, and Cell Biology, Program of Bioinformatics and Computational Biology, Iowa State University, Ames, IA 50011, USA, phone: +01-515-294 8989; fax: +01-515-294 1337
- Nick Ransom
- Department of Genetics, Development, and Cell Biology, Program of Bioinformatics and Computational Biology, Iowa State University, Ames, IA 50011, USA, phone: +01-515-294 8989; fax: +01-515-294 1337
- Eve Syrkin Wurtele
- Department of Genetics, Development, and Cell Biology, Program of Bioinformatics and Computational Biology, Iowa State University, Ames, IA 50011, USA, phone: +01-515-294 8989; fax: +01-515-294 1337
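The Markov clustering step named in the abstract above alternates two operations on a column-stochastic matrix: expansion (matrix squaring, which spreads random-walk flow) and inflation (element-wise powering followed by renormalization, which strengthens strong flows). The following is a minimal dense-matrix sketch of that loop, not the MCL program itself; the inflation value of 2.0, the fixed iteration count, and the `1e-6` cutoff are illustrative assumptions.

```python
def normalize_columns(m):
    """Rescale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Dense square matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mcl(adjacency, inflation=2.0, iterations=50):
    """Cluster an undirected graph given as a square 0/1 adjacency matrix."""
    n = len(adjacency)
    # self-loops are added so that every column has a nonzero sum
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        m = matmul(m, m)  # expansion
        m = normalize_columns([[v ** inflation for v in row] for row in m])  # inflation
    clusters = set()
    for row in m:
        # each row that still carries flow lists the members of one cluster
        members = frozenset(j for j, v in enumerate(row) if v > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)

# Two disconnected triangles: nodes 0-2 and nodes 3-5.
graph = [[0, 1, 1, 0, 0, 0],
         [1, 0, 1, 0, 0, 0],
         [1, 1, 0, 0, 0, 0],
         [0, 0, 0, 0, 1, 1],
         [0, 0, 0, 1, 0, 1],
         [0, 0, 0, 1, 1, 0]]
modules = mcl(graph)
```

For a co-expression network the adjacency entries would typically be thresholded or weighted correlations, and a sparse representation would be needed at the scale of thousands of genes.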
109
Fukushima A, Nishizawa T, Hayakumo M, Hikosaka S, Saito K, Goto E, Kusano M. Exploring tomato gene functions based on coexpression modules using graph clustering and differential coexpression approaches. PLANT PHYSIOLOGY 2012; 158:1487-502. [PMID: 22307966 PMCID: PMC3343727 DOI: 10.1104/pp.111.188367] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Received: 10/03/2011] [Accepted: 01/31/2012] [Indexed: 05/20/2023]
Abstract
Gene-to-gene coexpression analysis provides fundamental information and is a promising approach for predicting unknown gene functions in plants. We investigated various associations in the gene expression of tomato (Solanum lycopersicum) to predict unknown gene functions in an unbiased manner. We obtained more than 300 microarrays from publicly available databases and our own hybridizations, and here, we present tomato coexpression networks and coexpression modules. The topological characteristics of the networks were highly heterogeneous. We extracted 465 total coexpression modules from the data set by graph clustering, which allows users to divide a graph effectively into a set of clusters. Of these, 88% were assigned systematically by Gene Ontology terms. Our approaches revealed functional modules in the tomato transcriptome data; the predominant functions of coexpression modules were biologically relevant. We also investigated differential coexpression among data sets consisting of leaf, fruit, and root samples to gain further insights into the tomato transcriptome. We now demonstrate that (1) duplicated genes, as well as metabolic genes, exhibit a small but significant number of differential coexpressions, and (2) a reversal of gene coexpression occurred in two metabolic pathways involved in lycopene and flavonoid biosynthesis. Independent experimental verification of the findings for six selected genes was done using quantitative real-time polymerase chain reaction. Our findings suggest that differential coexpression may assist in the investigation of key regulatory steps in metabolic pathways. The approaches and results reported here will be useful to prioritize candidate genes for further functional genomics studies of tomato metabolism.
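Differential coexpression, as described in the abstract above, asks whether the correlation between two genes differs between sample groups (for example leaf versus fruit). The abstract does not say which statistic the authors used; the sketch below assumes one common choice, Fisher's z-transformation for comparing two independent Pearson correlations. All function names are illustrative.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fisher_z_difference(r1, n1, r2, n2):
    """z statistic for the difference between two independent correlations.

    r1, r2 must lie strictly inside (-1, 1); n1, n2 are sample counts above 3.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (z1 - z2) / se

def two_sided_p(z):
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# A coexpression reversal: r = 0.8 in one tissue, r = -0.8 in another.
z_reversal = fisher_z_difference(0.8, 30, -0.8, 30)
p_reversal = two_sided_p(z_reversal)
```

When many gene pairs are screened this way, the resulting p-values would need a multiple-testing correction before any pair is called differentially coexpressed.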
110
Van Hemert JL, Dickerson JA. Discriminating response groups in metabolic and regulatory pathway networks. Bioinformatics 2012; 28:947-54. [DOI: 10.1093/bioinformatics/bts039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022]
111
Allen JD, Xie Y, Chen M, Girard L, Xiao G. Comparing statistical methods for constructing large scale gene networks. PLoS One 2012; 7:e29348. [PMID: 22272232 PMCID: PMC3260142 DOI: 10.1371/journal.pone.0029348] [Citation(s) in RCA: 147] [Impact Index Per Article: 12.3] [Received: 04/13/2011] [Accepted: 11/25/2011] [Indexed: 12/14/2022]
Abstract
The gene regulatory network (GRN) reveals the regulatory relationships among genes and can provide a systematic understanding of molecular mechanisms underlying biological processes. The importance of computer simulations in understanding cellular processes is now widely accepted; a variety of algorithms have been developed to study these biological networks. The goal of this study is to provide a comprehensive evaluation and a practical guide to aid in choosing statistical methods for constructing large scale GRNs. Using both simulation studies and a real application in E. coli data, we compare different methods in terms of sensitivity and specificity in identifying the true connections and the hub genes, the ease of use, and computational speed. Our results show that these algorithms performed reasonably well, and each method has its own advantages: (1) GeneNet, WGCNA (Weighted Correlation Network Analysis), and ARACNE (Algorithm for the Reconstruction of Accurate Cellular Networks) performed well in constructing the global network structure; (2) GeneNet and SPACE (Sparse PArtial Correlation Estimation) performed well in identifying a few connections with high specificity.
Affiliation(s)
- Jeffrey D Allen
- University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
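The comparison criteria in the abstract above, sensitivity and specificity in identifying true connections, can be computed by treating every unordered gene pair as one binary decision. The sketch below is a generic illustration under that assumption and is not the authors' evaluation code; `sensitivity_specificity` and the toy edge lists are hypothetical.

```python
from itertools import combinations

def edge_set(edges):
    """Represent undirected edges so that (a, b) and (b, a) compare equal."""
    return {frozenset(e) for e in edges}

def sensitivity_specificity(nodes, true_edges, predicted_edges):
    """Score a predicted undirected network against a reference network."""
    truth, pred = edge_set(true_edges), edge_set(predicted_edges)
    tp = fp = tn = fn = 0
    for pair in combinations(nodes, 2):
        key = frozenset(pair)
        if key in truth and key in pred:
            tp += 1
        elif key in truth:
            fn += 1
        elif key in pred:
            fp += 1
        else:
            tn += 1
    # assumes the reference has at least one edge and at least one non-edge
    return tp / (tp + fn), tn / (tn + fp)

genes = ["a", "b", "c", "d"]
reference = [("a", "b"), ("b", "c")]
inferred = [("b", "a"), ("c", "d")]
sens, spec = sensitivity_specificity(genes, reference, inferred)
```

Because real gene networks are sparse, true negatives dominate the pair count, so specificity tends to sit close to 1 and small differences in it can correspond to many false edges.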
112
Tohge T, Fernie AR. Co-expression and co-responses: within and beyond transcription. FRONTIERS IN PLANT SCIENCE 2012; 3:248. [PMID: 23162560 PMCID: PMC3492870 DOI: 10.3389/fpls.2012.00248] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 07/25/2012] [Accepted: 10/20/2012] [Indexed: 05/04/2023]
Abstract
Whole genome sequencing, the relative ease of transcript profiling by the use of microarrays and latterly RNA sequencing approaches have facilitated the capture of vast amounts of transcript data. However, despite the enormous progress made in gene annotation a substantial proportion of genes remain to be annotated at the functional level. Considerable progress has, however, been made by searching for transcriptional coordination between genes of known function and non-annotated genes on the premise that such co-expressed genes tend to be functionally related. Here we review progress made following this approach as well as its expansion to include phenotypic information from other levels of cellular organization such as proteomic and metabolomic data as well as physiological and developmental phenotypes.
Affiliation(s)
- Takayuki Tohge
- *Correspondence: Takayuki Tohge, Max-Planck-Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476 Potsdam-Golm, Germany.
113
Ruprecht C, Persson S. Co-expression of cell-wall related genes: new tools and insights. FRONTIERS IN PLANT SCIENCE 2012; 3:83. [PMID: 22645599 PMCID: PMC3355730 DOI: 10.3389/fpls.2012.00083] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 02/29/2012] [Accepted: 04/13/2012] [Indexed: 05/02/2023]
Abstract
Global transcript analyses based on publicly available microarray datasets have revealed that genes with similar function tend to be transcriptionally coordinated. Indeed, many genes involved in the formation of cellulose, hemicelluloses, and lignin have been identified using co-expression approaches in Arabidopsis. To facilitate these transcript analyses, several web-based tools have been developed that allow researchers to investigate co-expression relationships of their gene(s) of interest. In addition, several tools now also provide the possibility of comparative transcriptional analyses across species, which potentially increases the predictive power. In this short review, we describe recent developments and updates of plant-related co-expression tools, and summarize studies that have successfully used expression profiling in cell wall research. Finally, we illustrate the value of comparative co-expression relationships across species using genes involved in lignin biosynthesis.
Affiliation(s)
- Staffan Persson
- *Correspondence: Staffan Persson, Max Planck Institute of Molecular Plant Physiology, Am Muehlenberg 1, 14476 Potsdam-Golm, Germany.
114
Ruan J, Perez J, Hernandez B, Lei C, Sunter G, Sponsel VM. Systematic identification of functional modules and cis-regulatory elements in Arabidopsis thaliana. BMC Bioinformatics 2011; 12 Suppl 12:S2. [PMID: 22168340 PMCID: PMC3247083 DOI: 10.1186/1471-2105-12-s12-s2] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 01/05/2023]
Abstract
Background Several large-scale gene co-expression networks have been constructed successfully for predicting gene functional modules and cis-regulatory elements in Arabidopsis (Arabidopsis thaliana). However, these networks are usually constructed and analyzed in an ad hoc manner. In this study, we propose a completely parameter-free and systematic method for constructing gene co-expression networks and predicting functional modules as well as cis-regulatory elements. Results Our novel method consists of an automated network construction algorithm, a parameter-free procedure to predict functional modules, and a strategy for finding known cis-regulatory elements that is suitable for consensus scanning without prior knowledge of the allowed extent of degeneracy of the motif. We apply the method to study a large collection of gene expression microarray data in Arabidopsis. We estimate that our co-expression network has ~94% accuracy, and has topological properties similar to other biological networks, such as being scale-free and having a high clustering coefficient. Remarkably, among the ~300 predicted modules whose sizes are at least 20, 88% have at least one significantly enriched function, including a few extremely significant ones (ribosome, p < 1E-300, photosynthetic membrane, p < 1.3E-137, proteasome complex, p < 5.9E-126). In addition, we are able to predict cis-regulatory elements for 66.7% of the modules, and the association between the enriched cis-regulatory elements and the enriched functional terms can often be confirmed by the literature. Overall, our results are much more significant than those reported by several previous studies on similar data sets. Finally, we utilize the co-expression network to dissect the promoters of 19 Arabidopsis genes involved in the metabolism and signaling of the important plant hormone gibberellin, and achieve promising results that reveal interesting insight into the biosynthesis and signaling of gibberellin.
Conclusions The results show that our method is highly effective in finding functional modules from real microarray data. Our application on Arabidopsis leads to the discovery of the largest number of annotated Arabidopsis functional modules in the literature. Given the high statistical significance of functional enrichment and the agreement between cis-regulatory and functional annotations, we believe our Arabidopsis gene modules can be used to predict the functions of unknown genes in Arabidopsis, and to understand the regulatory mechanisms of many genes.
Affiliation(s)
- Jianhua Ruan
- Department of Computer Science, The University of Texas at San Antonio, One UTSA Circle, San Antonio, Texas 78249, USA.
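Functional enrichment p-values such as those quoted in the abstract above are commonly obtained from an over-representation test. The abstract does not name the test, so the sketch below assumes the usual hypergeometric (one-sided Fisher) formulation: the probability of seeing at least `hits` annotated genes in a module of `module_size` genes drawn without replacement from `population` genes, of which `annotated` carry the term. The function name and the example numbers are illustrative.

```python
from math import comb

def hypergeom_enrichment_p(population, annotated, module_size, hits):
    """Upper-tail hypergeometric probability P(X >= hits)."""
    upper = min(annotated, module_size)
    total = comb(population, module_size)
    # exact integer arithmetic; converted to float only in the final division
    tail = sum(
        comb(annotated, k) * comb(population - annotated, module_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total

# Toy numbers: 10 genes in total, 5 annotated, a module of 2 genes, both annotated.
p_toy = hypergeom_enrichment_p(10, 5, 2, 2)
```

Values below the smallest representable double underflow to 0.0 in the final division, so extremely small p-values would need to be kept as exact fractions or reported on a log scale.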
115
Cramer GR, Urano K, Delrot S, Pezzotti M, Shinozaki K. Effects of abiotic stress on plants: a systems biology perspective. BMC PLANT BIOLOGY 2011; 11:163. [PMID: 22094046 PMCID: PMC3252258 DOI: 10.1186/1471-2229-11-163] [Citation(s) in RCA: 539] [Impact Index Per Article: 41.5] [Received: 09/05/2011] [Accepted: 11/17/2011] [Indexed: 05/18/2023]
Abstract
The natural environment for plants is composed of a complex set of abiotic stresses and biotic stresses. Plant responses to these stresses are equally complex. Systems biology approaches facilitate a multi-targeted approach by allowing one to identify regulatory hubs in complex networks. Systems biology takes the molecular parts (transcripts, proteins and metabolites) of an organism and attempts to fit them into functional networks or models designed to describe and predict the dynamic activities of that organism in different environments. In this review, research progress in plant responses to abiotic stresses is summarized from the physiological level to the molecular level. New insights obtained from the integration of omics datasets are highlighted. Gaps in our knowledge are identified, providing additional focus areas for crop improvement research in the future.
Affiliation(s)
- Grant R Cramer
- Department of Biochemistry and Molecular Biology, Mail Stop 330, University of Nevada, Reno, Nevada 89557, USA
| | - Kaoru Urano
- Gene Discovery Research Group, RIKEN Plant Science Center, 3-1-1 Koyadai, Tsukuba 305-0074, Japan
| | - Serge Delrot
- Univ. Bordeaux, ISVV, Ecophysiologie et Génomique Fonctionnelle de la Vigne, UMR 1287, F-33882 Villenave d'Ornon, France
| | - Mario Pezzotti
- Dipartimento di Biotecnologie, Università di Verona, Strada le Grazie 15, 37134 Verona, Italy
| | - Kazuo Shinozaki
- Gene Discovery Research Group, RIKEN Plant Science Center, 3-1-1 Koyadai, Tsukuba 305-0074, Japan
|
116
|
Inequalities and duality in gene coexpression networks of HIV-1 infection revealed by the combination of the double-connectivity approach and the Gini's method. J Biomed Biotechnol 2011; 2011:926407. [PMID: 21976970 PMCID: PMC3184446 DOI: 10.1155/2011/926407] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2011] [Accepted: 07/13/2011] [Indexed: 11/17/2022] Open
Abstract
Symbiosis (Sym) and pathogenesis (Pat) form a duality problem in microbial infection, including HIV/AIDS. Statistical analysis of inequalities and duality in gene coexpression networks (GCNs) of HIV-1 infection may yield novel insights into AIDS. In this study, we focused on analysis of GCNs of uninfected subjects and HIV-1-infected patients at three different stages of viral infection, based on data deposited in the GEO database of NCBI. The inequalities and duality in these GCNs were analyzed by combining the double-connectivity (DC) approach with Gini's method. DC analysis reveals significant differences between positive and negative connectivity in HIV-1 stage-specific GCNs. The inequality measures of negative connectivity and edge weight change more significantly than those of positive connectivity and edge weight in GCNs from the HIV-1 uninfected to the AIDS stages. With the permutation test method, we identified a set of genes with significant changes in the inequality and duality measure of edge weight. Functional analysis shows that these genes are highly enriched for the immune system, which plays an essential role in the Sym-Pat duality (SPD) of microbial infections. Understanding the SPD problems of HIV-1 infection may provide novel intervention strategies for AIDS.
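As a rough illustration of the kind of inequality measure this abstract refers to, the sketch below computes the standard Gini coefficient over a list of per-gene connectivity values. It assumes the textbook definition of the coefficient; the abstract does not give the authors' exact formulation, and the function name `gini` and the sample values are hypothetical.

```python
def gini(values):
    """Standard Gini coefficient of non-negative values (0 = perfectly equal).

    Assumption: the textbook formula is used here; the paper's own
    variant of "Gini's method" is not specified in the abstract.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # G = 2 * sum_i(i * x_i) / (n * sum(x)) - (n + 1) / n, with i = 1..n
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


# Hypothetical connectivity values for four genes.
equal_connectivity = [1.0, 1.0, 1.0, 1.0]
skewed_connectivity = [0.0, 0.0, 0.0, 4.0]
```

Because the coefficient is defined for non-negative quantities, negative connectivity would need to be passed in as magnitudes, computed separately from positive connectivity.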
|
117
|
Childs KL, Davidson RM, Buell CR. Gene coexpression network analysis as a source of functional annotation for rice genes. PLoS One 2011; 6:e22196. [PMID: 21799793 PMCID: PMC3142134 DOI: 10.1371/journal.pone.0022196] [Citation(s) in RCA: 102] [Impact Index Per Article: 7.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/24/2011] [Accepted: 06/20/2011] [Indexed: 11/26/2022] Open
Abstract
With the existence of large publicly available plant gene expression data sets, many groups have undertaken data analyses to construct gene coexpression networks and functionally annotate genes. Often, a large compendium of unrelated or condition-independent expression data is used to construct gene networks. Condition-dependent expression experiments consisting of well-defined conditions/treatments have also been used to create coexpression networks to help examine particular biological processes. Gene networks derived from either condition-dependent or condition-independent data can be difficult to interpret if a large number of genes and connections are present. However, algorithms exist to identify modules of highly connected and biologically relevant genes within coexpression networks. In this study, we have used publicly available rice (Oryza sativa) gene expression data to create gene coexpression networks using both condition-dependent and condition-independent data and have identified gene modules within these networks using the Weighted Gene Coexpression Network Analysis method. We compared the number of genes assigned to modules and the biological interpretability of gene coexpression modules to assess the utility of condition-dependent and condition-independent gene coexpression networks. For the purpose of providing functional annotation to rice genes, we found gene modules identified by coexpression analysis of condition-dependent gene expression experiments to be more useful than gene modules identified by analysis of a condition-independent data set. We have incorporated our results into the MSU Rice Genome Annotation Project database as additional expression-based annotation for 13,537 genes, 2,980 of which lack a functional annotation description. These results provide two new types of functional annotation for our database.
Genes in modules are now associated with groups of genes that constitute a collective functional annotation of those modules. Additionally, the expression patterns of genes across the treatments/conditions of an expression experiment comprise a second form of useful annotation.
Affiliation(s)
- Kevin L Childs
- Department of Plant Biology, Michigan State University, East Lansing, Michigan, United States of America.
|
118
|
Hsu JT, Peng CH, Hsieh WP, Lan CY, Tang CY. A novel method to identify cooperative functional modules: study of module coordination in the Saccharomyces cerevisiae cell cycle. BMC Bioinformatics 2011; 12:281. [PMID: 21749690 PMCID: PMC3143111 DOI: 10.1186/1471-2105-12-281] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/18/2011] [Accepted: 07/12/2011] [Indexed: 12/23/2022] Open
Abstract
BACKGROUND Identifying key components in biological processes and their associations is critical for deciphering cellular functions. Recently, numerous gene expression and molecular interaction experiments have been reported in Saccharomyces cerevisiae, and these have enabled systematic studies. Although a number of approaches have been used to predict gene functions and interactions, tools that analyze the essential coordination of functional components in cellular processes still need to be developed. RESULTS In this work, we present a new approach to study the cooperation of functional modules (sets of functionally related genes) in a specific cellular process. A cooperative module pair is defined as two modules that significantly cooperate with certain functional genes in a cellular process. This method identifies cooperative module pairs that significantly influence a cellular process and the correlated genes and interactions that are essential to that process. Using the yeast cell cycle as an example, we identified 101 cooperative module associations among 82 modules, and importantly, we established a cell cycle-specific cooperative module network. Most of the identified module pairs cover cooperative pathways and components essential to the cell cycle. We found that 14, 36, 18, 15, and 20 cooperative module pairs significantly cooperate with genes regulated in early G1, late G1, S, G2, and M phase, respectively. Fifty-nine module pairs that correlate with Cdc28 and other essential regulators were also identified. These results are consistent with previous studies and demonstrate that our methodology is effective for studying cooperative mechanisms in the cell cycle. CONCLUSIONS In this work, we propose a new approach to identifying condition-related cooperative interactions, and importantly, we establish a cell cycle-specific cooperation module network. 
These results provide a global view of the cell cycle and the method can be used to discover the dynamic coordination properties of functional components in other cellular processes.
Affiliation(s)
- Jeh-Ting Hsu
- Department of Computer Science, National Tsing Hua University, Hsinchu 30013, Taiwan
|
119
|
Ficklin SP, Feltus FA. Gene coexpression network alignment and conservation of gene modules between two grass species: maize and rice. PLANT PHYSIOLOGY 2011; 156:1244-56. [PMID: 21606319 PMCID: PMC3135956 DOI: 10.1104/pp.111.173047] [Citation(s) in RCA: 68] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/23/2011] [Accepted: 05/20/2011] [Indexed: 05/17/2023]
Abstract
One major objective for plant biology is the discovery of molecular subsystems underlying complex traits. The use of genetic and genomic resources combined in a systems genetics approach offers a means for approaching this goal. This study describes a maize (Zea mays) gene coexpression network built from publicly available expression arrays. The maize network consisted of 2,071 loci that were divided into 34 distinct modules that contained 1,928 enriched functional annotation terms and 35 cofunctional gene clusters. Of note, 391 maize genes of unknown function were found to be coexpressed within modules along with genes of known function. A global network alignment was made between this maize network and a previously described rice (Oryza sativa) coexpression network. The IsoRankN tool was used, which incorporates both gene homology and network topology for the alignment. A total of 1,173 aligned loci were detected between the two grass networks, which condensed into 154 conserved subgraphs that preserved 4,758 coexpression edges in rice and 6,105 coexpression edges in maize. This study provides an early view into maize coexpression space and provides an initial network-based framework for the translation of functional genomic and genetic information between these two vital agricultural species.
|
120
|
Li W, Liu CC, Zhang T, Li H, Waterman MS, Zhou XJ. Integrative analysis of many weighted co-expression networks using tensor computation. PLoS Comput Biol 2011; 7:e1001106. [PMID: 21698123 PMCID: PMC3116899 DOI: 10.1371/journal.pcbi.1001106] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2010] [Accepted: 02/08/2011] [Indexed: 11/18/2022] Open
Abstract
The rapid accumulation of biological networks poses new challenges and calls for powerful integrative analysis tools. Most existing methods capable of simultaneously analyzing a large number of networks were primarily designed for unweighted networks, and cannot easily be extended to weighted networks. However, it is known that transforming weighted into unweighted networks by dichotomizing the edges of weighted networks with a threshold generally leads to information loss. We have developed a novel, tensor-based computational framework for mining recurrent heavy subgraphs in a large set of massive weighted networks. Specifically, we formulate the recurrent heavy subgraph identification problem as a heavy 3D subtensor discovery problem with sparse constraints. We describe an effective approach to solving this problem by designing a multi-stage, convex relaxation protocol, and a non-uniform edge sampling technique. We applied our method to 130 co-expression networks, and identified 11,394 recurrent heavy subgraphs, grouped into 2,810 families. We demonstrated that the identified subgraphs represent meaningful biological modules by validating against a large set of compiled biological knowledge bases. We also showed that the likelihood for a heavy subgraph to be meaningful increases significantly with its recurrence in multiple networks, highlighting the importance of the integrative approach to biological network analysis. Moreover, our approach based on weighted graphs detects many patterns that would be overlooked using unweighted graphs. In addition, we identified a large number of modules that occur predominately under specific phenotypes. This analysis resulted in a genome-wide mapping of gene network modules onto the phenome. Finally, by comparing module activities across many datasets, we discovered high-order dynamic cooperativeness in protein complex networks and transcriptional regulatory networks.
Affiliation(s)
- Wenyuan Li
- Molecular and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, California, United States of America
| | - Chun-Chi Liu
- Molecular and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, California, United States of America
| | - Tong Zhang
- Department of Statistics, Rutgers University, New Brunswick, New Jersey, United States of America
| | - Haifeng Li
- Molecular and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, California, United States of America
| | - Michael S. Waterman
- Molecular and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, California, United States of America
| | - Xianghong Jasmine Zhou
- Molecular and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, California, United States of America
|
121
|
Xiong J, Yuan D, Fillingham JS, Garg J, Lu X, Chang Y, Liu Y, Fu C, Pearlman RE, Miao W. Gene network landscape of the ciliate Tetrahymena thermophila. PLoS One 2011; 6:e20124. [PMID: 21637855 PMCID: PMC3102692 DOI: 10.1371/journal.pone.0020124] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/20/2011] [Accepted: 04/13/2011] [Indexed: 01/03/2023] Open
Abstract
Background Genome-wide expression data from gene microarrays can be used to infer gene networks. At the cellular level, a gene network provides a picture of the modules in which genes are densely connected, and of the hub genes, which are highly connected with other genes. A gene network is useful for identifying genes that are involved in the same pathway, are part of a protein complex, or are co-regulated. In this study, we used different methods to infer gene networks in the ciliate Tetrahymena thermophila, and describe some important properties of this network, such as modules and hubs. Methodology/Principal Findings Using 67 single-channel microarrays, we constructed the Tetrahymena gene network (TGN) using three methods: the Pearson correlation coefficient (PCC), the Spearman correlation coefficient (SCC) and the context likelihood of relatedness (CLR) algorithm. The accuracy and coverage of the three networks were evaluated using four conserved protein complexes in yeast. The CLR network with a Z-score threshold of 3.49 was determined to be the most robust. The TGN was partitioned, and 55 modules were found. In addition, analysis of the arbitrarily determined 1200 hubs showed that these hubs could be sorted into six groups according to their expression profiles. We also investigated human disease orthologs in Tetrahymena that are missing in yeast and provide evidence indicating that some of these are involved in the same process in Tetrahymena as in human. Conclusions/Significance This study constructed a Tetrahymena gene network, provided new insights into the properties of this biological network, and presents an important resource for studying Tetrahymena genes at the pathway level.
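The Pearson-correlation variant of network construction mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the gene names, expression values, the 0.9 cutoff, and the helper names `pearson`, `coexpression_edges` and `degrees` are all hypothetical, and the CLR step the study ultimately preferred is not shown.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant profile has no defined correlation
    return cov / sqrt(vx * vy)


def coexpression_edges(profiles, threshold=0.9):
    """Link every gene pair whose |PCC| reaches the (assumed) threshold."""
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


def degrees(edges):
    """Number of edges per gene; high-degree genes are hub candidates."""
    counts = Counter()
    for g1, g2, _ in edges:
        counts[g1] += 1
        counts[g2] += 1
    return counts


# Hypothetical expression profiles across four arrays.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 3.0, 2.0, 3.0],
}
edges = coexpression_edges(profiles, threshold=0.9)
hub_counts = degrees(edges)
```

Using the absolute value of the correlation keeps strongly anti-correlated pairs in the network; whether that is desirable depends on the analysis, and a signed network would test `r >= threshold` instead.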
Affiliation(s)
- Jie Xiong
- Key Laboratory of Aquatic Biodiversity and Conservation, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China
- Graduate School of Chinese Academy of Sciences, Beijing, China
| | - Dongxia Yuan
- Key Laboratory of Aquatic Biodiversity and Conservation, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China
- State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China
| | | | - Jyoti Garg
- Department of Biology and Center for Research in Mass Spectrometry, York University, Toronto, Ontario, Canada
| | - Xingyi Lu
- Key Laboratory of Aquatic Biodiversity and Conservation, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China
- Graduate School of Chinese Academy of Sciences, Beijing, China
| | - Yue Chang
- Key Laboratory of Aquatic Biodiversity and Conservation, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China
- Graduate School of Chinese Academy of Sciences, Beijing, China
| | - Yifan Liu
- Pathology Department, University of Michigan Medical School, Ann Arbor, Michigan, United States of America
| | - Chengjie Fu
- Key Laboratory of Aquatic Biodiversity and Conservation, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China
| | - Ronald E. Pearlman
- Department of Biology and Center for Research in Mass Spectrometry, York University, Toronto, Ontario, Canada
| | - Wei Miao
- Key Laboratory of Aquatic Biodiversity and Conservation, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China
|
122
|
Lysenko A, Defoin-Platel M, Hassani-Pak K, Taubert J, Hodgman C, Rawlings CJ, Saqi M. Assessing the functional coherence of modules found in multiple-evidence networks from Arabidopsis. BMC Bioinformatics 2011; 12:203. [PMID: 21612636 PMCID: PMC3118170 DOI: 10.1186/1471-2105-12-203] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/20/2010] [Accepted: 05/25/2011] [Indexed: 12/18/2022] Open
Abstract
Background Combining multiple evidence-types from different information sources has the potential to reveal new relationships in biological systems. The integrated information can be represented as a relationship network, and clustering the network can suggest possible functional modules. The value of such modules for gaining insight into the underlying biological processes depends on their functional coherence. The challenges that we wish to address are to define and quantify the functional coherence of modules in relationship networks, so that they can be used to infer function of as yet unannotated proteins, to discover previously unknown roles of proteins in diseases as well as for better understanding of the regulation and interrelationship between different elements of complex biological systems. Results We have defined the functional coherence of modules with respect to the Gene Ontology (GO) by considering two complementary aspects: (i) the fragmentation of the GO functional categories into the different modules and (ii) the most representative functions of the modules. We have proposed a set of metrics to evaluate these two aspects and demonstrated their utility in Arabidopsis thaliana. We selected 2355 proteins for which experimentally established protein-protein interaction (PPI) data were available. From these we have constructed five relationship networks, four based on single types of data: PPI, co-expression, co-occurrence of protein names in scientific literature abstracts and sequence similarity and a fifth one combining these four evidence types. The ability of these networks to suggest biologically meaningful grouping of proteins was explored by applying Markov clustering and then by measuring the functional coherence of the clusters. Conclusions Relationship networks integrating multiple evidence-types are biologically informative and allow more proteins to be assigned to a putative functional module. 
Using additional evidence types concentrates the functional annotations in a smaller number of modules without unduly compromising their consistency. These results indicate that integration of more data sources improves the ability to uncover functional association between proteins, both by allowing more proteins to be linked and producing a network where modular structure more closely reflects the hierarchy in the gene ontology.
Affiliation(s)
- Artem Lysenko
- Centre for Mathematical and Computational Biology, Rothamsted Research, Harpenden, Herts, AL5 2JQ, UK.
|
123
|
Lorenz WW, Alba R, Yu YS, Bordeaux JM, Simões M, Dean JFD. Microarray analysis and scale-free gene networks identify candidate regulators in drought-stressed roots of loblolly pine (P. taeda L.). BMC Genomics 2011; 12:264. [PMID: 21609476 PMCID: PMC3123330 DOI: 10.1186/1471-2164-12-264] [Citation(s) in RCA: 74] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2010] [Accepted: 05/24/2011] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND Global transcriptional analysis of loblolly pine (Pinus taeda L.) is challenging due to limited molecular tools. PtGen2, a 26,496 feature cDNA microarray, was fabricated and used to assess drought-induced gene expression in loblolly pine propagule roots. Statistical analysis of differential expression and weighted gene correlation network analysis were used to identify drought-responsive genes and further characterize the molecular basis of drought tolerance in loblolly pine. RESULTS Microarrays were used to interrogate root cDNA populations obtained from 12 genotype × treatment combinations (four genotypes, three watering regimes). Comparison of drought-stressed roots with roots from the control treatment identified 2445 genes displaying at least a 1.5-fold expression difference (false discovery rate = 0.01). Genes commonly associated with drought response in pine and other plant species, as well as a number of abiotic and biotic stress-related genes, were up-regulated in drought-stressed roots. Only 76 genes were identified as differentially expressed in drought-recovered roots, indicating that the transcript population can return to the pre-drought state within 48 hours. Gene correlation analysis predicts a scale-free network topology and identifies eleven co-expression modules that ranged in size from 34 to 938 members. Network topological parameters identified a number of central nodes (hubs) including those with significant homology (E-values ≤ 2 × 10⁻³⁰) to 9-cis-epoxycarotenoid dioxygenase, zeatin O-glucosyltransferase, and ABA-responsive protein. Identified hubs also include genes that have been associated previously with osmotic stress, phytohormones, enzymes that detoxify reactive oxygen species, and several genes of unknown function. CONCLUSION PtGen2 was used to evaluate transcriptome responses in loblolly pine and was leveraged to identify 2445 differentially expressed genes responding to severe drought stress in roots.
Many of the genes identified are known to be up-regulated in response to osmotic stress in pine and other plant species and encode proteins involved in both signal transduction and stress tolerance. Gene expression levels returned to control values within a 48-hour recovery period in all but 76 transcripts. Correlation network analysis indicates a scale-free network topology for the pine root transcriptome and identifies central nodes that may serve as drivers of drought-responsive transcriptome dynamics in the roots of loblolly pine.
Affiliation(s)
- W Walter Lorenz
- Warnell School of Forestry and Natural Resources, The University of Georgia, Athens, GA 30602, USA
| | - Rob Alba
- Monsanto Company, Mailstop C1N, 800 N. Lindbergh Blvd., St. Louis, MO 63167, USA
| | - Yuan-Sheng Yu
- Warnell School of Forestry and Natural Resources, The University of Georgia, Athens, GA 30602, USA
| | - John M Bordeaux
- Warnell School of Forestry and Natural Resources, The University of Georgia, Athens, GA 30602, USA
| | - Marta Simões
- Instituto de Biologia Experimental e Tecnológica (IBET)/Instituto de Tecnologia Química e Biológica-Universidade Nova de Lisboa (ITQB-UNL), Av. República (EAN) 2784-505 Oeiras, Portugal
| | - Jeffrey FD Dean
- Warnell School of Forestry and Natural Resources, The University of Georgia, Athens, GA 30602, USA
- Department of Biochemistry & Molecular Biology, The University of Georgia, Life Sciences Building, Athens, GA 30602, USA
|
124
|
Eguíluz VM, Pérez T, Borge-Holthoefer J, Arenas A. Structural and functional networks in complex systems with delay. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2011; 83:056113. [PMID: 21728611 DOI: 10.1103/physreve.83.056113] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/12/2010] [Revised: 02/16/2011] [Indexed: 05/31/2023]
Abstract
Functional networks of complex systems are obtained from the analysis of the temporal activity of their components, and are often used to infer their unknown underlying connectivity. We obtain the equations relating topology and function in a system of diffusively delay-coupled elements in complex networks. We solve exactly the resulting equations in motifs (directed structures of three nodes) and in directed networks. The mean-field solution for directed uncorrelated networks shows that the clusterization of the activity is dominated by the in-degree of the nodes, and that the locking frequency decreases with increasing average degree. We find that the exponent of a power law degree distribution of the structural topology γ is related to the exponent of the associated functional network as α = (2 − γ)⁻¹ for γ < 2.
Affiliation(s)
- Víctor M Eguíluz
- Instituto de Física Interdisciplinar y Sistemas Complejos IFISC (CSIC-UIB), E-07122 Palma de Mallorca, Spain.
|
125
|
Zheng X, Liu T, Yang Z, Wang J. Large cliques in Arabidopsis gene coexpression network and motif discovery. JOURNAL OF PLANT PHYSIOLOGY 2011; 168:611-618. [PMID: 21044807 DOI: 10.1016/j.jplph.2010.09.010] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/29/2010] [Revised: 08/31/2010] [Accepted: 09/06/2010] [Indexed: 05/30/2023]
Abstract
Identification of cis-regulatory elements in Arabidopsis is a key step to understanding its transcriptional regulation scheme. In this study, the Arabidopsis gene coexpression network was constructed using the ATTED-II data, and thereafter a subgraph-induced approach and clique-finding algorithm were used to extract gene coexpression groups from the gene coexpression network. A total of 23 large coexpression gene groups were obtained, with each consisting of more than 100 highly correlated genes. Four classical tools were used to predict motifs in the promoter regions of coexpressed genes. Consequently, we detected a large number of candidate biologically relevant regulatory elements, and many of them are consistent with known cis-regulatory elements from AGRIS and AthaMap. Experiments on coexpressed groups, including E2Fa target genes, showed that our method had a high probability of returning the real binding motif. Our study provides the basis for future cis-regulatory module analysis and creates a starting point to unravel regulatory networks of Arabidopsis thaliana.
Affiliation(s)
- Xiaoqi Zheng
- Department of Mathematics, Shanghai Normal University, Shanghai 200234, China
|
126
|
Mutwil M, Klie S, Tohge T, Giorgi FM, Wilkins O, Campbell MM, Fernie AR, Usadel B, Nikoloski Z, Persson S. PlaNet: combined sequence and expression comparisons across plant networks derived from seven species. THE PLANT CELL 2011; 23:895-910. [PMID: 21441431 PMCID: PMC3082271 DOI: 10.1105/tpc.111.083667] [Citation(s) in RCA: 144] [Impact Index Per Article: 11.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/26/2011] [Revised: 01/26/2011] [Accepted: 03/07/2011] [Indexed: 05/17/2023]
Abstract
The model organism Arabidopsis thaliana is readily used in basic research due to resource availability and relative speed of data acquisition. A major goal is to transfer acquired knowledge from Arabidopsis to crop species. However, the identification of functional equivalents of well-characterized Arabidopsis genes in other plants is a nontrivial task. It is well documented that transcriptionally coordinated genes tend to be functionally related and that such relationships may be conserved across different species and even kingdoms. To exploit such relationships, we constructed whole-genome coexpression networks for Arabidopsis and six important plant crop species. The interactive networks, clustered using the HCCA algorithm, are provided under the banner PlaNet (http://aranet.mpimp-golm.mpg.de). We implemented a comparative network algorithm that estimates similarities between network structures. Thus, the platform can be used to swiftly infer similar coexpressed network vicinities within and across species and can predict the identity of functional homologs. We exemplify this using the PSA-D and chalcone synthase-related gene networks. Finally, we assessed how ontology terms are transcriptionally connected in the seven species and provide the corresponding MapMan term coexpression networks. The data support the contention that this platform will considerably improve transfer of knowledge generated in Arabidopsis to valuable crop species.
Affiliation(s)
- Marek Mutwil
  - Max-Planck-Institute for Molecular Plant Physiology, 14476 Potsdam, Germany
- Sebastian Klie
  - Max-Planck-Institute for Molecular Plant Physiology, 14476 Potsdam, Germany
- Takayuki Tohge
  - Max-Planck-Institute for Molecular Plant Physiology, 14476 Potsdam, Germany
- Federico M. Giorgi
  - Max-Planck-Institute for Molecular Plant Physiology, 14476 Potsdam, Germany
- Olivia Wilkins
  - Centre for the Analysis of Genome Evolution and Function, Department of Cell and Systems Biology, University of Toronto, Toronto, Ontario M5S 3B2, Canada
- Malcolm M. Campbell
  - Centre for the Analysis of Genome Evolution and Function, Department of Cell and Systems Biology, University of Toronto, Toronto, Ontario M5S 3B2, Canada
  - Department of Biology, University of Toronto Scarborough, Toronto, Ontario M1C 1A4, Canada
- Alisdair R. Fernie
  - Max-Planck-Institute for Molecular Plant Physiology, 14476 Potsdam, Germany
- Björn Usadel
  - Max-Planck-Institute for Molecular Plant Physiology, 14476 Potsdam, Germany
- Zoran Nikoloski
  - Max-Planck-Institute for Molecular Plant Physiology, 14476 Potsdam, Germany
- Staffan Persson
  - Max-Planck-Institute for Molecular Plant Physiology, 14476 Potsdam, Germany
|
127
|
Fukushima A, Kusano M, Redestig H, Arita M, Saito K. Metabolomic correlation-network modules in Arabidopsis based on a graph-clustering approach. BMC SYSTEMS BIOLOGY 2011; 5:1. [PMID: 21194489 PMCID: PMC3030539 DOI: 10.1186/1752-0509-5-1] [Citation(s) in RCA: 93] [Impact Index Per Article: 7.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/25/2010] [Accepted: 01/01/2011] [Indexed: 02/07/2023]
Abstract
BACKGROUND Deciphering the metabolome is essential for a better understanding of the cellular metabolism as a system. Typical metabolomics data show a few but significant correlations among metabolite levels when data sampling is repeated across individuals grown under strictly controlled conditions. Although several studies have assessed topologies in metabolomic correlation networks, it remains unclear whether highly connected metabolites in these networks have specific functions in known tissue- and/or genotype-dependent biochemical pathways. RESULTS In our study of metabolite profiles we subjected root tissues to gas chromatography-time-of-flight/mass spectrometry (GC-TOF/MS) and used published information on the aerial parts of 3 Arabidopsis genotypes, Col-0 wild-type, methionine over-accumulation 1 (mto1), and transparent testa4 (tt4) to compare systematically the metabolomic correlations in samples of roots and aerial parts. We then applied graph clustering to the constructed correlation networks to extract densely connected metabolites and evaluated the clusters by biochemical-pathway enrichment analysis. We found that the number of significant correlations varied by tissue and genotype and that the obtained clusters were significantly enriched for metabolites included in biochemical pathways. CONCLUSIONS We demonstrate that the graph-clustering approach identifies tissue- and/or genotype-dependent metabolomic clusters related to the biochemical pathway. Metabolomic correlations complement information about changes in mean metabolite levels and may help to elucidate the organization of metabolically functional modules.
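The workflow summarized in this abstract (build a correlation network from metabolite profiles, then extract densely connected groups) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the graph-clustering algorithm, so connected components are used here as a deliberately crude stand-in, and the metabolite names, profile values, and the 0.9 threshold are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_network(profiles, threshold):
    """Return edges (a, b) for metabolite pairs with |r| >= threshold."""
    edges = set()
    for a, b in combinations(sorted(profiles), 2):
        if abs(pearson(profiles[a], profiles[b])) >= threshold:
            edges.add((a, b))
    return edges


def connected_components(nodes, edges):
    """Group nodes into connected components (simplified 'clusters')."""
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(adjacency[current] - component)
        seen |= component
        components.append(sorted(component))
    return components


# Hypothetical metabolite levels across five replicate plants.
profiles = {
    "met_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "met_B": [2.1, 3.9, 6.2, 8.0, 9.9],
    "met_C": [5.0, 1.0, 4.0, 2.0, 3.0],
    "met_D": [4.9, 1.2, 4.1, 1.8, 3.1],
}

edges = correlation_network(profiles, threshold=0.9)
clusters = connected_components(profiles, edges)
print(clusters)
```

In a real analysis the threshold would be replaced by a significance test on each correlation, and the components by a proper graph-clustering method, followed by pathway enrichment on each cluster.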
|
128
|
Ferrier T, Matus JT, Jin J, Riechmann JL. Arabidopsis paves the way: genomic and network analyses in crops. Curr Opin Biotechnol 2010; 22:260-70. [PMID: 21167706 DOI: 10.1016/j.copbio.2010.11.010] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/09/2010] [Revised: 11/19/2010] [Accepted: 11/22/2010] [Indexed: 01/08/2023]
Abstract
Arabidopsis genomic and network analyses have facilitated crop research towards the understanding of many biological processes of fundamental importance for agriculture. Genes that were identified through genomic analyses in Arabidopsis have been used to manipulate crop traits such as pathogen resistance, yield, water-use efficiency, and drought tolerance, with the effects being tested in field conditions. The integration of diverse Arabidopsis genome-wide datasets in probabilistic functional networks has been demonstrated as a feasible strategy to associate novel genes with traits of interest, and novel genomic methods continue to be developed. The combination of genome-wide location studies, using ChIP-Seq, with gene expression profiling data is affording a genome-wide view of regulatory networks previously delineated through genetic and molecular analyses, leading to the identification of novel components and of new connections within these networks.
Affiliation(s)
- Thilia Ferrier
- Center for Research in Agricultural Genomics CSIC-IRTA-UAB, Barcelona 08034, Spain
|
129
|
Ficklin SP, Luo F, Feltus FA. The association of multiple interacting genes with specific phenotypes in rice using gene coexpression networks. PLANT PHYSIOLOGY 2010; 154:13-24. [PMID: 20668062 PMCID: PMC2938148 DOI: 10.1104/pp.110.159459] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/17/2010] [Accepted: 07/21/2010] [Indexed: 05/18/2023]
Abstract
Discovering gene sets underlying the expression of a given phenotype is of great importance, as many phenotypes are the result of complex gene-gene interactions. Gene coexpression networks, built using a set of microarray samples as input, can help elucidate tightly coexpressed gene sets (modules) that are mixed with genes of known and unknown function. Functional enrichment analysis of modules further subdivides the coexpressed gene set into cofunctional gene clusters that may coexist in the module with other functionally related gene clusters. In this study, 45 coexpressed gene modules and 76 cofunctional gene clusters were discovered for rice (Oryza sativa) using a global, knowledge-independent paradigm and the combination of two network construction methodologies. Some clusters were enriched for previously characterized mutant phenotypes, providing evidence for specific gene sets (and their annotated molecular functions) that underlie specific phenotypes.
|
130
|
svdPPCS: an effective singular value decomposition-based method for conserved and divergent co-expression gene module identification. BMC Bioinformatics 2010; 11:338. [PMID: 20565989 PMCID: PMC2905369 DOI: 10.1186/1471-2105-11-338] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/01/2010] [Accepted: 06/22/2010] [Indexed: 12/25/2022]
Abstract
BACKGROUND Comparative analysis of gene expression profiling of multiple biological categories, such as different species of organisms or different kinds of tissue, promises to enhance the fundamental understanding of the universality as well as the specialization of mechanisms and related biological themes. Grouping genes with a similar expression pattern or exhibiting co-expression together is a starting point in understanding and analyzing gene expression data. In recent literature, gene module level analysis is advocated in order to understand biological network design and system behaviors in disease and life processes; however, practical difficulties often lie in the implementation of existing methods. RESULTS Using the singular value decomposition (SVD) technique, we developed a new computational tool, named svdPPCS (SVD-based Pattern Pairing and Chart Splitting), to identify conserved and divergent co-expression modules of two sets of microarray experiments. In the proposed method, gene modules are identified by splitting the two-way chart coordinated with a pair of left singular vectors factorized from the gene expression matrices of the two biological categories. Importantly, the cutoffs are determined by a data-driven algorithm using the well-defined statistic, SVD-p. The implementation was illustrated on two time series microarray data sets generated from the samples of accessory gland (ACG) and Malpighian tubule (MT) tissues of the line W118 of Drosophila melanogaster. Two conserved modules and six divergent modules, each of which has a unique characteristic profile across tissue kinds and aging processes, were identified. The number of genes contained in these modules ranged from five to a few hundred. Three to over a hundred GO terms were over-represented in individual modules with FDR < 0.1. One divergent module suggested the tissue-specific relationship between the expressions of mitochondrion-related genes and the aging process. This finding, together with others, may be of biological significance. The validity of the proposed SVD-based method was further verified by a simulation study, as well as the comparisons with regression analysis and cubic spline regression analysis plus PAM based clustering. CONCLUSIONS svdPPCS is a novel computational tool for the comparative analysis of transcriptional profiling. It especially fits the comparison of time series data of related organisms or different tissues of the same organism under equivalent or similar experimental conditions. The general scheme can be directly extended to the comparisons of multiple data sets. It can also be applied to the integration of data sets from different platforms and of different sources.
|
131
|
Rosa BA, Oh S, Montgomery BL, Chen J, Qin W. Computing gene expression data with a knowledge-based gene clustering approach. INTERNATIONAL JOURNAL OF BIOCHEMISTRY AND MOLECULAR BIOLOGY 2010; 1:51-68. [PMID: 21968910 PMCID: PMC3180043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Subscribe] [Scholar Register] [Received: 04/25/2010] [Accepted: 06/11/2010] [Indexed: 05/31/2023]
Abstract
Computational analysis methods for gene expression data gathered in microarray experiments can be used to identify the functions of previously unstudied genes. While obtaining the expression data is not a difficult task, interpreting and extracting the information from the datasets is challenging. In this study, a knowledge-based approach, which identifies and retains important functional genes before filtering on variability and fold-change differences, was used to study light regulation. Two clustering methods were used to cluster the filtered datasets, and clusters containing a key light regulatory gene were located. The genes common to both of these clusters were identified, and the genes in the common cluster were ranked based on their coexpression with the key gene. This process was repeated for 11 key genes in 3 treatment combinations. The initial filtering method reduced the dataset size from 22,814 probes to an average of 1134 genes, and the resulting common cluster lists contained an average of only 14 genes. These common cluster lists achieved higher gene enrichment scores than the two individual clustering methods. In addition, the filtering method increased the proportion of light-responsive genes in the dataset from 1.8% to 15.2%, and the cluster lists increased this proportion to 18.4%. The relatively short length of these common cluster lists compared to gene groups generated through typical clustering methods or coexpression networks narrows the search for novel functional genes while increasing the likelihood that they are biologically relevant.
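The pipeline described in this abstract (protect known genes during filtering, intersect the clusters that contain a key gene, rank the survivors by coexpression) can be sketched roughly as follows. This is an illustrative outline only, under several assumptions: the range-based filter stands in for the paper's variability and fold-change criteria, the two cluster assignments are supplied by hand rather than computed, Pearson correlation is assumed as the coexpression measure, and all gene names and values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def filter_genes(expression, protected, min_range):
    """Keep genes whose expression range is large enough, plus any
    'protected' genes retained regardless of their variability."""
    return {
        gene: values
        for gene, values in expression.items()
        if gene in protected or max(values) - min(values) >= min_range
    }


def common_cluster(key_gene, clustering_a, clustering_b):
    """Genes sharing a cluster with key_gene under BOTH clusterings."""
    in_a = {g for g, c in clustering_a.items() if c == clustering_a[key_gene]}
    in_b = {g for g, c in clustering_b.items() if c == clustering_b[key_gene]}
    return (in_a & in_b) - {key_gene}


def rank_by_coexpression(key_gene, genes, expression):
    """Sort genes by descending correlation with the key gene."""
    scored = [(g, pearson(expression[key_gene], expression[g])) for g in genes]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical expression profiles over four conditions.
expression = {
    "KEY": [1.0, 2.0, 3.0, 4.0],
    "g1": [2.0, 4.0, 6.0, 8.0],
    "g2": [1.0, 3.0, 2.0, 4.0],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [1.0, 1.0, 2.0, 1.0],  # low variability, removed by the filter
}
filtered = filter_genes(expression, protected={"KEY"}, min_range=2.0)

# Hypothetical cluster labels from two different clustering methods.
clustering_a = {"KEY": 0, "g1": 0, "g2": 0, "g3": 1}
clustering_b = {"KEY": 1, "g1": 1, "g2": 1, "g3": 0}

shared = common_cluster("KEY", clustering_a, clustering_b)
ranked = rank_by_coexpression("KEY", shared, filtered)
print(ranked)
```

Intersecting the two clusterings is what shortens the candidate list: a gene must be grouped with the key gene by both methods to survive.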
|