1
Zhang W, Dai X, Xu S, Zhao PX. 2D association and integrative omics analysis in rice provides systems biology view in trait analysis. Commun Biol 2018; 1:153. [PMID: 30272029 PMCID: PMC6160469 DOI: 10.1038/s42003-018-0159-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 02/19/2018] [Accepted: 08/30/2018] [Indexed: 12/28/2022]
Abstract
The interactions among genes and between genes and environment contribute significantly to the phenotypic variation of complex traits and may be possible explanations for missing heritability. However, to our knowledge no existing tool can address the two kinds of interactions. Here we propose a novel linear mixed model that considers not only the additive effects of biological markers but also the interaction effects of marker pairs. Interaction effect is demonstrated as a 2D association. Based on this linear mixed model, we developed a pipeline, namely PATOWAS. PATOWAS can be used to study transcriptome-wide and metabolome-wide associations in addition to genome-wide associations. Our case analysis with real rice recombinant inbred lines (RILs) at three omics levels demonstrates that 2D association mapping and integrative omics are able to provide a systems biology view into the analyzed traits, leading toward an answer about how genes, transcripts, proteins, and metabolites work together to produce an observable phenotype. Wenchao Zhang et al. developed a tool for analyzing traits using data generated from genome-wide, transcriptome-wide, and metabolome-wide association studies. They test their approach in rice, providing a systems biology view of identified traits.
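As a rough illustration of the kind of model the abstract describes, a linear mixed model with additive marker effects plus pairwise marker interaction effects can be written as below. This is a generic textbook form and not necessarily the exact PATOWAS specification; the symbols ($X$, $Z_k$, $a_k$, $(aa)_{kl}$, and the variance components) are assumptions introduced here for illustration.

```latex
% y: phenotype vector; X\beta: fixed effects; Z_k: coded values of marker k
% (Z_k \circ Z_l): element-wise product of two marker columns (the "2D" term)
y = X\beta + \sum_{k=1}^{m} Z_k a_k + \sum_{k<l} (Z_k \circ Z_l)\,(aa)_{kl} + \varepsilon,
\qquad
a_k \sim N(0, \sigma_a^2),\quad
(aa)_{kl} \sim N(0, \sigma_{aa}^2),\quad
\varepsilon \sim N(0, \sigma^2 I)
```

In this form each marker pair $(k, l)$ contributes one interaction effect, which is why the result can be displayed as a two-dimensional association map over marker positions.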
Affiliation(s)
- Wenchao Zhang
  - Computational Biology and Bioinformatics Lab, Noble Research Institute, Ardmore, OK, 73401, USA
- Xinbin Dai
  - Computational Biology and Bioinformatics Lab, Noble Research Institute, Ardmore, OK, 73401, USA
- Shizhong Xu
  - Department of Botany and Plant Sciences, University of California, Riverside, CA, 92521, USA
- Patrick X Zhao
  - Computational Biology and Bioinformatics Lab, Noble Research Institute, Ardmore, OK, 73401, USA
2
Franks AM, Markowetz F, Airoldi EM. Refining cellular pathway models using an ensemble of heterogeneous data sources. Ann Appl Stat 2018; 12:1361-1384. [PMID: 36506698 PMCID: PMC9733905 DOI: 10.1214/16-aoas915] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/15/2022]
Abstract
Improving current models and hypotheses of cellular pathways is one of the major challenges of systems biology and functional genomics. There is a need for methods to build on established expert knowledge and reconcile it with results of new high-throughput studies. Moreover, the available sources of data are heterogeneous, and the data need to be integrated in different ways depending on which part of the pathway they are most informative for. In this paper, we introduce a compartment specific strategy to integrate edge, node and path data for refining a given network hypothesis. To carry out inference, we use a local-move Gibbs sampler for updating the pathway hypothesis from a compendium of heterogeneous data sources, and a new network regression idea for integrating protein attributes. We demonstrate the utility of this approach in a case study of the pheromone response MAPK pathway in the yeast S. cerevisiae.
Affiliation(s)
- Alexander M Franks
  - Department of Statistics and Applied Probability, University of California, Santa Barbara, South Hall, Santa Barbara, California 93106, USA
- Florian Markowetz
  - Cancer Research UK, Cambridge Institute, Li Ka Shing Centre, University of Cambridge, Robinson Way, Cambridge, CB2 0RE, United Kingdom
- Edoardo M Airoldi
  - Fox School of Business, Department of Statistical Science, Temple University, Center for Data Science, 1810 Liacouras Walk, Philadelphia, Pennsylvania 19122, USA
3
Li J, Zhao PX. Mining Functional Modules in Heterogeneous Biological Networks Using Multiplex PageRank Approach. Frontiers in Plant Science 2016; 7:903. [PMID: 27446133 PMCID: PMC4916224 DOI: 10.3389/fpls.2016.00903] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 01/20/2016] [Accepted: 06/08/2016] [Indexed: 06/06/2023]
Abstract
Identification of functional modules/sub-networks in large-scale biological networks is one of the important research challenges in current bioinformatics and systems biology. Approaches have been developed to identify functional modules in single-class biological networks; however, methods for systematically and interactively mining multiple classes of heterogeneous biological networks are lacking. In this paper, we present a novel algorithm (called mPageRank) that utilizes the Multiplex PageRank approach to mine functional modules from two classes of biological networks. We demonstrate the capabilities of our approach by successfully mining functional biological modules through integrating expression-based gene-gene association networks and protein-protein interaction networks. We first compared the performance of our method with that of other methods using simulated data. We then applied our method to identify the cell division cycle related functional module and plant signaling defense-related functional module in the model plant Arabidopsis thaliana. Our results demonstrated that the mPageRank method is effective for mining sub-networks in both expression-based gene-gene association networks and protein-protein interaction networks, and has the potential to be adapted for the discovery of functional modules/sub-networks in other heterogeneous biological networks. The mPageRank executable program, source code, the datasets and results of the presented two case studies are publicly and freely available at http://plantgrn.noble.org/MPageRank/.
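The idea of ranking nodes across two network layers can be sketched as follows. This is a minimal illustration of one published Multiplex PageRank formulation, in which PageRank scores from one layer bias the random walk on the other; it is not the mPageRank program itself. The function name `biased_pagerank`, the parameters `beta` and `gamma`, and the toy adjacency matrices are all assumptions made for this example.

```python
import numpy as np

def biased_pagerank(B, x, beta=0.0, gamma=0.0, alpha=0.85, tol=1e-12, max_iter=1000):
    """PageRank on layer B, biased by scores x from another layer (hypothetical helper).

    B[i, j] = 1 if node j links to node i. beta biases the walk toward nodes
    with high x; gamma biases the teleport step toward nodes with high x.
    beta = gamma = 0 gives ordinary PageRank on B.
    """
    B = np.asarray(B, dtype=float)
    x = np.asarray(x, dtype=float)
    M = (x ** beta)[:, None] * B           # M[i, j] = x_i^beta * B[i, j]
    G = M.sum(axis=0)                      # normaliser for each source node j
    if np.any(G == 0):
        raise ValueError("this sketch assumes every node has at least one link")
    P = M / G                              # column-stochastic transition matrix
    v = x ** gamma
    v = v / v.sum()                        # teleport distribution
    X = np.full(B.shape[0], 1.0 / B.shape[0])
    for _ in range(max_iter):
        X_new = alpha * (P @ X) + (1.0 - alpha) * v
        if np.abs(X_new - X).sum() < tol:
            return X_new
        X = X_new
    return X

# Toy two-layer network on four nodes (illustrative data only).
A = np.array([[0, 1, 1, 1],                # layer A: star centred on node 0
              [1, 0, 0, 0],
              [1, 0, 0, 0],
              [1, 0, 0, 0]])
B = np.array([[0, 1, 0, 1],                # layer B: ring 0-1-2-3-0
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]])

x_A = biased_pagerank(A, np.ones(4))                  # ordinary PageRank on A
plain_B = biased_pagerank(B, np.ones(4))              # ordinary PageRank on B
additive = biased_pagerank(B, x_A, beta=0, gamma=1)   # A biases teleportation
combined = biased_pagerank(B, x_A, beta=1, gamma=1)   # A biases walk and teleport
```

On the ring alone every node is equivalent, so any difference between `plain_B` and the biased scores comes from the information carried over from layer A.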
4
Zhang W, Dai X, Wang Q, Xu S, Zhao PX. PEPIS: A Pipeline for Estimating Epistatic Effects in Quantitative Trait Locus Mapping and Genome-Wide Association Studies. PLoS Comput Biol 2016; 12:e1004925. [PMID: 27224861 PMCID: PMC4880203 DOI: 10.1371/journal.pcbi.1004925] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 12/14/2015] [Accepted: 04/18/2016] [Indexed: 11/19/2022]
Abstract
The term epistasis refers to interactions between multiple genetic loci. Genetic epistasis is important in regulating biological function and is considered to explain part of the 'missing heritability,' which involves marginal genetic effects that cannot be accounted for in genome-wide association studies. Thus, the study of epistasis is of great interest to geneticists. However, estimating epistatic effects for quantitative traits is challenging due to the large number of interaction effects that must be estimated, thus significantly increasing computing demands. Here, we present a new web server-based tool, the Pipeline for estimating EPIStatic genetic effects (PEPIS), for analyzing polygenic epistatic effects. The PEPIS software package is based on a new linear mixed model that has been used to predict the performance of hybrid rice. The PEPIS includes two main sub-pipelines: the first for kinship matrix calculation, and the second for polygenic component analyses and genome scanning for main and epistatic effects. To accommodate the demand for high-performance computation, the PEPIS utilizes C/C++ for mathematical matrix computing. In addition, the modules for kinship matrix calculations and main and epistatic-effect genome scanning employ parallel computing technology that effectively utilizes multiple computer nodes across our networked cluster, thus significantly improving the computational speed. For example, when analyzing the same immortalized F2 rice population genotypic data examined in a previous study, the PEPIS returned results identical to those of the original prototype R code at each analysis step, but the computational time was reduced from more than one month to about five minutes. These advances will help overcome the bottleneck frequently encountered in genome-wide epistatic genetic effect analysis and enable accommodation of the high computational demand. The PEPIS is publicly available at http://bioinfo.noble.org/PolyGenic_QTL/.
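The kinship matrix step mentioned in the abstract can be illustrated with a small sketch. It assumes markers coded as -1/+1 and uses a simple average-over-markers normalisation; the function names and the normalisation are assumptions for illustration and may differ from what PEPIS actually computes. The point of the sketch is that the additive-by-additive kinship over all marker pairs can be obtained from element-wise products of n x n matrices, without building the n x m(m-1)/2 interaction design matrix.

```python
import numpy as np

def additive_kinship(Z):
    """Additive kinship K_a = Z Z^T / m for an n x m marker matrix Z."""
    m = Z.shape[1]
    return Z @ Z.T / m

def epistatic_kinship(Z):
    """Additive-by-additive kinship averaged over all marker pairs k < l.

    Uses the identity
        sum_{k<l} (z_ik z_il)(z_jk z_jl)
            = ((Z Z^T)_ij ** 2 - ((Z*Z)(Z*Z)^T)_ij) / 2
    so the pairwise interaction columns are never formed explicitly.
    Requires at least two markers.
    """
    m = Z.shape[1]
    G = Z @ Z.T
    Z2 = Z * Z
    n_pairs = m * (m - 1) / 2
    return (G * G - Z2 @ Z2.T) / (2 * n_pairs)

def epistatic_kinship_bruteforce(Z):
    """Reference version that builds every pairwise product column."""
    m = Z.shape[1]
    cols = [Z[:, k] * Z[:, l] for k in range(m) for l in range(k + 1, m)]
    W = np.column_stack(cols)
    return W @ W.T / W.shape[1]

# Simulated genotypes for 6 lines and 5 markers (illustrative data only).
rng = np.random.default_rng(0)
Z = rng.choice([-1.0, 1.0], size=(6, 5))
K_a = additive_kinship(Z)
K_aa = epistatic_kinship(Z)
```

The brute-force version scales with the number of marker pairs, whereas the shortcut needs only two n x n products, which hints at why this step benefits from optimised matrix code.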
Affiliation(s)
- Wenchao Zhang
  - Plant Biology Division, Samuel Roberts Noble Foundation, Ardmore, Oklahoma, United States of America
- Xinbin Dai
  - Plant Biology Division, Samuel Roberts Noble Foundation, Ardmore, Oklahoma, United States of America
- Qishan Wang
  - School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, People’s Republic of China
- Shizhong Xu
  - Department of Botany and Plant Sciences, University of California, Riverside, Riverside, California, United States of America
- Patrick X. Zhao
  - Plant Biology Division, Samuel Roberts Noble Foundation, Ardmore, Oklahoma, United States of America
5
Dai X, Li J, Liu T, Zhao PX. HRGRN: A Graph Search-Empowered Integrative Database of Arabidopsis Signaling Transduction, Metabolism and Gene Regulation Networks. Plant & Cell Physiology 2016; 57:e12. [PMID: 26657893 PMCID: PMC4722177 DOI: 10.1093/pcp/pcv200] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 09/01/2015] [Accepted: 12/07/2015] [Indexed: 05/10/2023]
Abstract
The biological networks controlling plant signal transduction, metabolism and gene regulation are composed of not only tens of thousands of genes, compounds, proteins and RNAs but also the complicated interactions and co-ordination among them. These networks play critical roles in many fundamental mechanisms, such as plant growth, development and environmental response. Although much is known about these complex interactions, the knowledge and data are currently scattered throughout the published literature, publicly available high-throughput data sets and third-party databases. Many 'unknown' yet important interactions among genes need to be mined and established through extensive computational analysis. However, exploring these complex biological interactions at the network level from existing heterogeneous resources remains challenging and time-consuming for biologists. Here, we introduce HRGRN, a graph search-empowered integrative database of Arabidopsis signal transduction, metabolism and gene regulatory networks. HRGRN utilizes Neo4j, which is a highly scalable graph database management system, to host large-scale biological interactions among genes, proteins, compounds and small RNAs that were either validated experimentally or predicted computationally. The associated biological pathway information was also specially marked for the interactions that are involved in the pathway to facilitate the investigation of cross-talk between pathways. Furthermore, HRGRN integrates a series of graph path search algorithms to discover novel relationships among genes, compounds, RNAs and even pathways from heterogeneous biological interaction data that could be missed by traditional SQL database search methods. Users can also build subnetworks based on known interactions. The outcomes are visualized with rich text, figures and interactive network graphs on web pages. The HRGRN database is freely available at http://plantgrn.noble.org/hrgrn/.
Affiliation(s)
- Xinbin Dai
  - Plant Biology Division, The Samuel Roberts Noble Foundation, 2510 Sam Noble Parkway, Ardmore, OK 73401, USA
- Jun Li
  - Plant Biology Division, The Samuel Roberts Noble Foundation, 2510 Sam Noble Parkway, Ardmore, OK 73401, USA
- Tingsong Liu
  - Plant Biology Division, The Samuel Roberts Noble Foundation, 2510 Sam Noble Parkway, Ardmore, OK 73401, USA
- Patrick Xuechun Zhao
  - Plant Biology Division, The Samuel Roberts Noble Foundation, 2510 Sam Noble Parkway, Ardmore, OK 73401, USA
6
Li J, Dai X, Zhuang Z, Zhao PX. LegumeIP 2.0--a platform for the study of gene function and genome evolution in legumes. Nucleic Acids Res 2015; 44:D1189-94. [PMID: 26578557 PMCID: PMC4702875 DOI: 10.1093/nar/gkv1237] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 09/15/2015] [Accepted: 11/02/2015] [Indexed: 01/04/2023]
Abstract
The LegumeIP 2.0 database hosts large-scale genomics and transcriptomics data and provides integrative bioinformatics tools for the study of gene function and evolution in legumes. Our recent updates in LegumeIP 2.0 include gene and protein sequences, gene models and annotations, syntenic regions, protein families and phylogenetic trees for six legume species: Medicago truncatula, Glycine max (soybean), Lotus japonicus, Phaseolus vulgaris (common bean), Cicer arietinum (chickpea) and Cajanus cajan (pigeon pea) and two outgroup reference species: Arabidopsis thaliana and Populus trichocarpa. Moreover, LegumeIP 2.0 features the following new data resources and bioinformatics tools: (i) an integrative gene expression atlas for four model legumes that includes 550 array hybridizations from M. truncatula, 962 gene expression profiles of G. max, 276 array hybridizations from L. japonicus and 56 RNA-Seq-based gene expression profiles for C. arietinum. These datasets were manually curated and hierarchically organized based on Experimental Ontology and Plant Ontology so that users can browse, search, and retrieve data for their selected experiments. (ii) New functions/analytical tools to query, mine and visualize large-scale gene sequences, annotations and transcriptome profiles. Users may select a subset of expression experiments and visualize and compare expression profiles for multiple genes. The LegumeIP 2.0 database is freely available to the public at http://plantgrn.noble.org/LegumeIP/.
Affiliation(s)
- Jun Li
  - Bioinformatics Lab, Plant Biology Division, Samuel Roberts Noble Foundation, 2510 Sam Noble Parkway, Ardmore, OK 73401, USA
- Xinbin Dai
  - Bioinformatics Lab, Plant Biology Division, Samuel Roberts Noble Foundation, 2510 Sam Noble Parkway, Ardmore, OK 73401, USA
- Zhaohong Zhuang
  - Bioinformatics Lab, Plant Biology Division, Samuel Roberts Noble Foundation, 2510 Sam Noble Parkway, Ardmore, OK 73401, USA
- Patrick X Zhao
  - Bioinformatics Lab, Plant Biology Division, Samuel Roberts Noble Foundation, 2510 Sam Noble Parkway, Ardmore, OK 73401, USA
7
Gehan MA, Greenham K, Mockler TC, McClung CR. Transcriptional networks-crops, clocks, and abiotic stress. Current Opinion in Plant Biology 2015; 24:39-46. [PMID: 25646668 DOI: 10.1016/j.pbi.2015.01.004] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.6] [Received: 11/12/2014] [Revised: 01/07/2015] [Accepted: 01/08/2015] [Indexed: 05/20/2023]
Abstract
Several factors affect the yield potential and geographical range of crops including the circadian clock, water availability, and seasonal temperature changes. In order to sustain and increase plant productivity on marginal land in the face of both biotic and abiotic stresses, we need to more efficiently generate stress-resistant crops through marker-assisted breeding, genetic modification, and new genome-editing technologies. To leverage these strategies for producing the next generation of crops, future transcriptomic data acquisition should be pursued with an appropriate temporal design and analyzed with a network-centric approach. The following review focuses on recent developments in abiotic stress transcriptional networks in economically important crops and will highlight the utility of correlation-based network analysis and applications.
Affiliation(s)
- Malia A Gehan
  - Donald Danforth Plant Science Center, St. Louis, MO 63132, United States
- Kathleen Greenham
  - Department of Biological Sciences, Dartmouth College, Hanover, NH 03755, United States
- Todd C Mockler
  - Donald Danforth Plant Science Center, St. Louis, MO 63132, United States
- C Robertson McClung
  - Department of Biological Sciences, Dartmouth College, Hanover, NH 03755, United States
8
DeGNServer: deciphering genome-scale gene networks through high performance reverse engineering analysis. BioMed Research International 2013; 2013:856325. [PMID: 24328032 PMCID: PMC3847961 DOI: 10.1155/2013/856325] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 08/22/2013] [Accepted: 10/01/2013] [Indexed: 12/23/2022]
Abstract
Analysis of genome-scale gene networks (GNs) using large-scale gene expression data provides unprecedented opportunities to uncover gene interactions and regulatory networks involved in various biological processes and developmental programs, leading to accelerated discovery of novel knowledge of various biological processes, pathways and systems. The widely used context likelihood of relatedness (CLR) method, which scores the similarity of gene pairs using mutual information (MI), is one of the more accurate methods currently available for inferring GNs. However, the MI-based reverse engineering method can achieve satisfactory performance only when the sample size exceeds one hundred. This in turn limits its application to GN construction from expression data sets with small sample sizes. We developed a high performance web server, DeGNServer, to reverse engineer and decipher genome-scale networks. It extends the CLR method by integrating different correlation methods that are suitable for analyzing data sets ranging from moderate to large scale, such as expression profiles with tens to hundreds of microarray hybridizations, and implements all analysis algorithms using parallel computing techniques to infer gene-gene associations at extraordinary speed. In addition, we integrated the SNBuilder and GeNa algorithms for subnetwork extraction and functional module discovery. DeGNServer is publicly and freely available online.
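The CLR idea described above can be sketched in a few lines: each pairwise similarity is converted to a z-score against the background of all similarities involving each of the two genes, and the two z-scores are combined. This is a minimal sketch of the general CLR transformation, not DeGNServer's implementation. Absolute Pearson correlation is used here as a stand-in similarity (the abstract says correlation measures can replace MI for smaller sample sizes); the function name `clr_scores` and the simulated data are assumptions for illustration.

```python
import numpy as np

def clr_scores(S):
    """Apply a CLR-style background correction to a symmetric similarity matrix S."""
    S = np.asarray(S, dtype=float).copy()
    np.fill_diagonal(S, np.nan)                    # ignore self-similarity
    mu = np.nanmean(S, axis=1, keepdims=True)      # per-gene background mean
    sd = np.nanstd(S, axis=1, keepdims=True)       # per-gene background spread
    sd[sd == 0] = 1.0                              # guard against constant rows
    Z = (S - mu) / sd                              # Z[i, j]: s_ij against gene i's background
    np.fill_diagonal(Z, 0.0)
    Z = np.maximum(Z, 0.0)                         # keep only above-background similarity
    return np.sqrt(Z ** 2 + Z.T ** 2)              # combine both genes' z-scores

# Simulated expression matrix: 5 genes x 50 samples (illustrative data only).
rng = np.random.default_rng(0)
expr = rng.normal(size=(5, 50))
expr[1] = expr[0] + 0.1 * rng.normal(size=50)      # gene 1 closely tracks gene 0

S = np.abs(np.corrcoef(expr))                      # stand-in similarity measure
scores = clr_scores(S)
```

Thresholding `scores` would then give candidate edges; the choice of threshold is outside the scope of this sketch.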