1. Badsha MB, Fu AQ. Learning Causal Biological Networks With the Principle of Mendelian Randomization. Front Genet 2019; 10:460. [PMID: 31164902 PMCID: PMC6536645 DOI: 10.3389/fgene.2019.00460] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 02/13/2019] [Accepted: 04/30/2019] [Indexed: 01/09/2023]
Abstract
Although large amounts of genomic data are available, it remains a challenge to reliably infer causal (i.e., regulatory) relationships among molecular phenotypes (such as gene expression), especially when multiple phenotypes are involved. We extend the interpretation of the Principle of Mendelian randomization (PMR) and present MRPC, a novel machine learning algorithm that incorporates the PMR in the PC algorithm, a classical algorithm for learning causal graphs in computer science. MRPC learns a causal biological network efficiently and robustly from integrating individual-level genotype and molecular phenotype data, in which directed edges indicate causal directions. We demonstrate through simulation that MRPC outperforms several popular general-purpose network inference methods and PMR-based methods. We apply MRPC to distinguish direct and indirect targets among multiple genes associated with expression quantitative trait loci. Our method is implemented in the R package MRPC, available on CRAN (https://cran.r-project.org/web/packages/MRPC/index.html).
Affiliation(s)
- Md. Bahadur Badsha
  - Department of Statistical Science, Center for Modeling Complex Interactions, Institute for Bioinformatics and Evolutionary Studies, University of Idaho, Moscow, ID, United States
- Audrey Qiuyan Fu
  - Department of Statistical Science, Center for Modeling Complex Interactions, Institute for Bioinformatics and Evolutionary Studies, University of Idaho, Moscow, ID, United States
2. Kumar J, Gupta DS, Gupta S, Dubey S, Gupta P, Kumar S. Quantitative trait loci from identification to exploitation for crop improvement. Plant Cell Reports 2017; 36:1187-1213. [PMID: 28352970 DOI: 10.1007/s00299-017-2127-y] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.7] [Received: 12/09/2016] [Accepted: 03/09/2017] [Indexed: 05/24/2023]
Abstract
Advances in genetics and genomics since the discovery of Mendel's laws of inheritance have led to the mapping of genes controlling qualitative and quantitative traits in crop plant species. Mapping of genomic regions controlling the variation of quantitatively inherited traits has become routine after the advent of different types of molecular markers. Recently, next-generation sequencing methods have accelerated research on QTL analysis. These efforts have led to the identification of molecular markers more closely linked with genes/QTLs, and even of markers within the gene/QTL controlling the trait of interest. Efforts have also been made towards cloning genes/QTLs or identifying potential candidate genes responsible for a trait. Further, new concepts such as the crop QTLome and QTL prioritization have accelerated the precise application of QTLs for genetic improvement of complex traits. In past years, efforts have also been made to exploit a number of QTLs for improving grain yield or other agronomic traits in various crops through marker-assisted selection, leading to cultivation of these improved varieties in farmers' fields. In the present article, we review QTLs from their identification to their exploitation in plant breeding programs, and also review how improved cultivars developed through introgression of QTLs have improved yield in many crops.
Affiliation(s)
- Jitendra Kumar
  - Division of Crop Improvement, ICAR-Indian Institute of Pulses Research, Kanpur, India
- Debjyoti Sen Gupta
  - Division of Crop Improvement, ICAR-Indian Institute of Pulses Research, Kanpur, India
- Sunanda Gupta
  - Division of Crop Improvement, ICAR-Indian Institute of Pulses Research, Kanpur, India
- Sonali Dubey
  - Division of Crop Improvement, ICAR-Indian Institute of Pulses Research, Kanpur, India
- Priyanka Gupta
  - Division of Crop Improvement, ICAR-Indian Institute of Pulses Research, Kanpur, India
- Shiv Kumar
  - International Center for Agricultural Research in the Dry Areas (ICARDA), Rabat-Institutes, B.P. 6299, Rabat, Morocco
3. Guo W, Calixto CPG, Tzioutziou N, Lin P, Waugh R, Brown JWS, Zhang R. Evaluation and improvement of the regulatory inference for large co-expression networks with limited sample size. BMC Systems Biology 2017; 11:62. [PMID: 28629365 PMCID: PMC5477119 DOI: 10.1186/s12918-017-0440-2] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 11/04/2016] [Accepted: 06/09/2017] [Indexed: 12/18/2022]
Abstract
BACKGROUND Co-expression has been widely used to identify novel regulatory relationships using high throughput measurements, such as microarray and RNA-seq data. Evaluation studies on co-expression network analysis methods mostly focus on networks of small or medium size of up to a few hundred nodes. For large networks, simulated expression data usually consist of hundreds or thousands of profiles with different perturbations or knock-outs, which is uncommon in real experiments due to their cost and the amount of work required. Thus, the performances of co-expression network analysis methods on large co-expression networks consisting of a few thousand nodes, with only a small number of profiles with a single perturbation, which more accurately reflect normal experimental conditions, are generally uncharacterized and unknown. METHODS We proposed a novel network inference method based on Relevance Low order Partial Correlation (RLowPC). The RLowPC method uses a two-step approach: it first selects the high-confidence edges, reducing the search space by picking only the top-ranked genes from an initial partial correlation analysis, and then computes the partial correlations in the confined search space by removing only the linear dependencies from the shared neighbours, largely ignoring the genes showing lower association. RESULTS We selected six co-expression-based methods with good performance in evaluation studies from the literature: Partial correlation, PCIT, ARACNE, MRNET, MRNETB and CLR. The evaluation of these methods was carried out on simulated time-series data with various network sizes ranging from 100 to 3000 nodes. Simulation results show low precision and recall for all of the above methods for large networks with a small number of expression profiles. We improved the inference significantly by refinement of the top weighted edges in the pre-inferred partial correlation networks using RLowPC. We found improved performance by partitioning large networks into smaller co-expressed modules when assessing the method performance within these modules. CONCLUSIONS The evaluation results show that current methods suffer from low precision and recall for large co-expression networks where only a small number of profiles are available. The proposed RLowPC method effectively reduces the indirect edges predicted as regulatory relationships and increases the precision of top ranked predictions. Partitioning large networks into smaller highly co-expressed modules also helps to improve the performance of network inference methods. The RLowPC R package for network construction, refinement and evaluation is available at GitHub: https://github.com/wyguo/RLowPC .
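As a rough illustration of why partial correlation can suppress indirect edges, the sketch below computes a textbook first-order partial correlation on simulated data. It is not the RLowPC package's implementation: the regulatory chain x → z → y, the noise scale, the sample size, and the function names `pearson` and `partial_corr` are all assumptions made for this example.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical chain: gene x regulates z, and z regulates y (no direct x -> y edge).
rng = random.Random(0)
n = 2000
x = [rng.gauss(0, 1) for _ in range(n)]
z = [xi + 0.5 * rng.gauss(0, 1) for xi in x]
y = [zi + 0.5 * rng.gauss(0, 1) for zi in z]

marginal = pearson(x, y)          # strong, although the relationship is indirect
partial = partial_corr(x, y, z)   # close to zero once z is conditioned on

print(f"marginal r(x, y)    = {marginal:.3f}")
print(f"partial  r(x, y | z) = {partial:.3f}")
```

In this toy setting a plain co-expression threshold would report an x to y edge, whereas conditioning on the shared neighbour z removes it. That is the general idea behind restricting partial correlation to shared neighbours, as the abstract describes.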
Affiliation(s)
- Wenbin Guo
  - Information and Computational Sciences, The James Hutton Institute, Invergowrie, Dundee, Scotland, DD2 5DA, UK
  - Plant Sciences Division, School of Life Sciences, University of Dundee, Invergowrie, Dundee, Scotland, DD2 5DA, UK
- Cristiane P G Calixto
  - Plant Sciences Division, School of Life Sciences, University of Dundee, Invergowrie, Dundee, Scotland, DD2 5DA, UK
- Nikoleta Tzioutziou
  - Plant Sciences Division, School of Life Sciences, University of Dundee, Invergowrie, Dundee, Scotland, DD2 5DA, UK
- Ping Lin
  - Division of Mathematics, University of Dundee, Nethergate, Dundee, Scotland, DD1 4HN, UK
- Robbie Waugh
  - Plant Sciences Division, School of Life Sciences, University of Dundee, Invergowrie, Dundee, Scotland, DD2 5DA, UK
  - Cell and Molecular Sciences, The James Hutton Institute, Invergowrie, Dundee, Scotland, DD2 5DA, UK
- John W S Brown
  - Plant Sciences Division, School of Life Sciences, University of Dundee, Invergowrie, Dundee, Scotland, DD2 5DA, UK
  - Cell and Molecular Sciences, The James Hutton Institute, Invergowrie, Dundee, Scotland, DD2 5DA, UK
- Runxuan Zhang
  - Information and Computational Sciences, The James Hutton Institute, Invergowrie, Dundee, Scotland, DD2 5DA, UK
4. Serin EAR, Nijveen H, Hilhorst HWM, Ligterink W. Learning from Co-expression Networks: Possibilities and Challenges. Frontiers in Plant Science 2016; 7:444. [PMID: 27092161 PMCID: PMC4825623 DOI: 10.3389/fpls.2016.00444] [Citation(s) in RCA: 186] [Impact Index Per Article: 23.3] [Received: 01/14/2016] [Accepted: 03/21/2016] [Indexed: 05/18/2023]
Abstract
Plants are fascinating and complex organisms. A comprehensive understanding of the organization, function and evolution of plant genes is essential to disentangle important biological processes and to advance crop engineering and breeding strategies. The ultimate aim in deciphering complex biological processes is the discovery of causal genes and regulatory mechanisms controlling these processes. The recent surge of omics data has opened the door to a system-wide understanding of the flow of biological information underlying complex traits. However, dealing with the corresponding large data sets represents a challenging endeavor that calls for the development of powerful bioinformatics methods. A popular approach is the construction and analysis of gene networks. Such networks are often used for genome-wide representation of the complex functional organization of biological systems. Networks based on similarity in gene expression are called (gene) co-expression networks. One of the major applications of gene co-expression networks is the functional annotation of unknown genes. Constructing co-expression networks is generally straightforward. In contrast, the resulting network of connected genes can become very complex, which limits its biological interpretation. Several strategies can be employed to enhance the interpretation of the networks. A strategy in coherence with the biological question addressed needs to be established to infer reliable networks. Additional benefits can be gained from network-based strategies using prior knowledge and data integration to further enhance the elucidation of gene regulatory relationships. As a result, biological networks provide many more applications beyond the simple visualization of co-expressed genes. In this study we review the different approaches for co-expression network inference in plants. We analyse integrative genomics strategies used in recent studies that successfully identified candidate genes taking advantage of gene co-expression networks. Additionally, we discuss promising bioinformatics approaches that predict networks for specific purposes.
Affiliation(s)
- Elise A. R. Serin
  - Wageningen Seed Lab, Laboratory of Plant Physiology, Wageningen University, Wageningen, Netherlands
- Harm Nijveen
  - Wageningen Seed Lab, Laboratory of Plant Physiology, Wageningen University, Wageningen, Netherlands
  - Laboratory of Bioinformatics, Wageningen University, Wageningen, Netherlands
- Henk W. M. Hilhorst
  - Wageningen Seed Lab, Laboratory of Plant Physiology, Wageningen University, Wageningen, Netherlands
- Wilco Ligterink
  - Wageningen Seed Lab, Laboratory of Plant Physiology, Wageningen University, Wageningen, Netherlands
  - *Correspondence: Wilco Ligterink
5. Mohamed Salleh FH, Arif SM, Zainudin S, Firdaus-Raih M. Reconstructing gene regulatory networks from knock-out data using Gaussian Noise Model and Pearson Correlation Coefficient. Comput Biol Chem 2015; 59 Pt B:3-14. [DOI: 10.1016/j.compbiolchem.2015.04.012] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.4] [Received: 09/09/2014] [Revised: 04/16/2015] [Accepted: 04/27/2015] [Indexed: 11/26/2022]
6. Zhu F, Shi L, Engel JD, Guan Y. Regulatory network inferred using expression data of small sample size: application and validation in erythroid system. Bioinformatics 2015; 31:2537-44. [PMID: 25840044 DOI: 10.1093/bioinformatics/btv186] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 11/07/2014] [Accepted: 03/27/2015] [Indexed: 11/13/2022]
Abstract
MOTIVATION Modeling regulatory networks using expression data observed in a differentiation process may help identify context-specific interactions. The outcome of the current algorithms highly depends on the quality and quantity of a single time-course dataset, and the performance may be compromised for datasets with a limited number of samples. RESULTS In this work, we report a multi-layer graphical model that is capable of leveraging many publicly available time-course datasets, as well as a cell lineage-specific dataset with a small sample size, to model regulatory networks specific to a differentiation process. First, a collection of network inference methods is used to predict the regulatory relationships in individual public datasets. Then, the inferred directional relationships are weighted and integrated together by evaluating against the cell lineage-specific dataset. To test the accuracy of this algorithm, we collected a time-course RNA-Seq dataset during human erythropoiesis to infer regulatory relationships specific to this differentiation process. The resulting erythroid-specific regulatory network reveals novel regulatory relationships activated in erythropoiesis, which were further validated by genome-wide TR4 binding studies using ChIP-seq. These erythropoiesis-specific regulatory relationships were not identifiable by single dataset-based methods or context-independent integrations. Analysis of the predicted targets reveals that they are all closely associated with hematopoietic lineage differentiation.
Affiliation(s)
- Fan Zhu
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Lihong Shi
  - State Key Laboratory of Experimental Hematology, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin 300020, China
- Yuanfang Guan
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA, Department of Internal Medicine, and Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109, USA
7. Linde J, Schulze S, Henkel SG, Guthke R. Data- and knowledge-based modeling of gene regulatory networks: an update. EXCLI Journal 2015; 14:346-78. [PMID: 27047314 PMCID: PMC4817425 DOI: 10.17179/excli2015-168] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 01/29/2015] [Accepted: 02/10/2015] [Indexed: 02/01/2023]
Abstract
Gene regulatory network inference is a systems biology approach which predicts interactions between genes with the help of high-throughput data. In this review, we present current and updated network inference methods focusing on novel techniques for data acquisition, network inference assessment, network inference for interacting species and the integration of prior knowledge. Following the advent of next-generation sequencing of cDNAs derived from RNA samples (RNA-Seq), we discuss in detail its application to network inference. Furthermore, we present progress for large-scale or even full-genomic network inference as well as for small-scale condensed network inference and review advances in the evaluation of network inference methods by crowdsourcing. Finally, we reflect on the current availability of data and prior knowledge sources and give an outlook for the inference of gene regulatory networks that reflect interacting species, in particular pathogen-host interactions.
Affiliation(s)
- Jörg Linde
  - Research Group Systems Biology / Bioinformatics, Leibniz Institute for Natural Product Research and Infection Biology - Hans-Knöll-Institute, Beutenbergstr. 11a, 07745 Jena, Germany
- Sylvie Schulze
  - Research Group Systems Biology / Bioinformatics, Leibniz Institute for Natural Product Research and Infection Biology - Hans-Knöll-Institute, Beutenbergstr. 11a, 07745 Jena, Germany
- Reinhard Guthke
  - Research Group Systems Biology / Bioinformatics, Leibniz Institute for Natural Product Research and Infection Biology - Hans-Knöll-Institute, Beutenbergstr. 11a, 07745 Jena, Germany
8. Wang J, Yu H, Weng X, Xie W, Xu C, Li X, Xiao J, Zhang Q. An expression quantitative trait loci-guided co-expression analysis for constructing regulatory network using a rice recombinant inbred line population. Journal of Experimental Botany 2014; 65:1069-79. [PMID: 24420573 PMCID: PMC3935569 DOI: 10.1093/jxb/ert464] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.2] [Indexed: 05/18/2023]
Abstract
The ability to reveal the regulatory architecture of genes at the whole-genome level by constructing a regulatory network is critical for understanding the biological processes and developmental programmes of organisms. Here, we conducted an eQTL-guided function-related co-expression analysis to identify the putative regulators and construct a gene regulatory network. We performed an eQTL analysis of 210 recombinant inbred lines (RILs) derived from a cross between two indica rice lines, Zhenshan 97 and Minghui 63, the parents of an elite hybrid, using data obtained by hybridizing RNA samples of flag leaves at the heading stage with Affymetrix whole-genome arrays. Making use of an ultrahigh-density single-nucleotide polymorphism bin map constructed by population sequencing, 13 647 eQTLs for 10 725 e-traits were detected, comprising 5079 cis-eQTLs (37.2%) and 8568 trans-eQTLs (62.8%). The analysis revealed 138 trans-eQTL hotspots, each of which apparently regulates the expression variations of many genes. Co-expression analysis of functionally related genes within the framework of regulator-target relationships outlined by the eQTLs led to the identification of putative regulators in the system. The usefulness of the strategy was demonstrated with the genes known to be involved in flowering. We also applied this strategy to the analysis of QTLs for yield traits, which also suggested likely candidate genes. eQTL-guided co-expression analysis may provide a promising solution for outlining a framework for the complex regulatory network of an organism.
Affiliation(s)
- Qifa Zhang
  - * To whom correspondence should be addressed.
9. Pinna A, Heise S, Flassig RJ, de la Fuente A, Klamt S. Reconstruction of large-scale regulatory networks based on perturbation graphs and transitive reduction: improved methods and their evaluation. BMC Systems Biology 2013; 7:73. [PMID: 23924435 PMCID: PMC4231426 DOI: 10.1186/1752-0509-7-73] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 03/15/2013] [Accepted: 08/05/2013] [Indexed: 02/08/2023]
Abstract
Background The data-driven inference of intracellular networks is one of the key challenges of computational and systems biology. As suggested by recent works, a simple yet effective approach for reconstructing regulatory networks comprises the following two steps. First, the observed effects induced by directed perturbations are collected in a signed and directed perturbation graph (PG). In a second step, Transitive Reduction (TR) is used to identify and eliminate those edges in the PG that can be explained by paths and are therefore likely to reflect indirect effects. Results In this work we introduce novel variants for PG generation and TR, leading to significantly improved performances. The key modifications concern: (i) use of novel statistical criteria for deriving a high-quality PG from experimental data; (ii) the application of local TR which allows only short paths to explain (and remove) a given edge; and (iii) a novel strategy to rank the edges with respect to their confidence. To compare the new methods with existing ones we not only apply them to a recent DREAM network inference challenge but also to a novel and unprecedented synthetic compendium consisting of 30 5000-gene networks simulated with varying biological and measurement error variances resulting in a total of 270 datasets. The benchmarks clearly demonstrate the superior reconstruction performance of the novel PG and TR variants compared to existing approaches. Moreover, the benchmark enabled us to draw some general conclusions. For example, it turns out that local TR restricted to paths with a length of only two is often sufficient or even favorable. We also demonstrate that considering edge weights is highly beneficial for TR whereas consideration of edge signs is of minor importance. We explain these observations from a graph-theoretical perspective and discuss the consequences with respect to a greatly reduced computational demand to conduct TR. 
Finally, as a realistic application scenario, we use our framework for inferring gene interactions in yeast based on a library of gene expression data measured in mutants with single knockouts of transcription factors. The reconstructed network shows a significant enrichment of known interactions, especially within the 100 most confident (and for experimental validation most relevant) edges. Conclusions This paper presents several major achievements. The novel methods introduced herein can be seen as state of the art for inference techniques relying on perturbation graphs and transitive reduction. Another key result of the study is the generation of a new and unprecedented large-scale in silico benchmark dataset accounting for different noise levels and providing a solid basis for unbiased testing of network inference methodologies. Finally, applying our approach to Saccharomyces cerevisiae suggested several new gene interactions with high confidence awaiting experimental validation.
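The idea of local transitive reduction restricted to paths of length two can be sketched as follows. This is a minimal, unweighted and unsigned illustration, not the authors' method, which also uses edge weights and confidence ranking. The function name `local_transitive_reduction`, the choice to test every edge against the original (unmodified) perturbation graph, and the three-gene example are assumptions made for this sketch.

```python
def local_transitive_reduction(edges):
    """Drop every edge (u, v) that is explained by a two-step path u -> w -> v.

    `edges` is an iterable of (source, target) pairs from a perturbation graph.
    Each edge is checked against the original graph, so the result does not
    depend on the order in which edges are visited.
    """
    edge_set = set(edges)

    # Successor lookup: node -> set of nodes it points to.
    successors = {}
    for u, v in edge_set:
        successors.setdefault(u, set()).add(v)

    kept = set()
    for u, v in edge_set:
        explained = any(
            w not in (u, v) and v in successors.get(w, ())
            for w in successors.get(u, ())
        )
        if not explained:
            kept.add((u, v))
    return kept


# Hypothetical perturbation graph: knocking out A affects B and C, and
# knocking out B affects C, so A -> C may only be an indirect effect.
perturbation_graph = [("A", "B"), ("B", "C"), ("A", "C")]
reduced = local_transitive_reduction(perturbation_graph)
print(sorted(reduced))  # [('A', 'B'), ('B', 'C')]
```

Limiting the search to intermediates `w` that are direct successors of `u` is what keeps the check local: each edge only needs the neighbour sets of its endpoints, not a full path search over the graph.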
Affiliation(s)
- Andrea Pinna
  - Max Planck Institute for Dynamics of Complex Technical Systems, Magdeburg, Germany