1
|
Tu M, Zeng J, Zhang J, Fan G, Song G. Unleashing the power within short-read RNA-seq for plant research: Beyond differential expression analysis and toward regulomics. Front Plant Sci 2022; 13:1038109. [PMID: 36570898 PMCID: PMC9773216 DOI: 10.3389/fpls.2022.1038109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2022] [Accepted: 11/21/2022] [Indexed: 06/17/2023]
Abstract
RNA-seq has become a state-of-the-art technique for transcriptomic studies. Advances in both RNA-seq techniques and the corresponding analysis tools and pipelines have shaped our understanding of almost every aspect of plant science to an unprecedented degree. Notably, the integration of large amounts of RNA-seq data with other omic data sets in model plants and major crop species has facilitated plant regulomics, whereas in many less-studied plant species RNA-seq is still used primarily for differential expression analysis. To unleash the analytical power of RNA-seq in plant species, especially less-studied species and biomass crops, we summarize recent achievements of RNA-seq analysis in the major plant species and representative tools in four types of application: (1) transcriptome assembly, (2) construction of expression atlases, (3) network analysis, and (4) structural alteration. We emphasize the importance of expression atlases, coexpression networks and predictions of gene regulatory relationships in moving plant transcriptomes toward regulomics, an omic view of genome-wide transcription regulation. We highlight what can be achieved in plant research with RNA-seq by introducing a list of representative RNA-seq analysis tools and resources that were developed for certain minor species or are suitable for analysis without species limitation. In summary, we provide an updated digest of RNA-seq tools, resources and their diverse applications for plant research, and our perspective on the power and challenges of short-read RNA-seq analysis from a regulomic point of view. Full utilization of these fruitful RNA-seq resources will promote plant omic research to a higher level, especially in less-studied species.
Affiliation(s)
- Min Tu
- School of Chemical and Environmental Engineering, Wuhan Polytechnic University, Wuhan, China
| | - Jian Zeng
- Guangdong Provincial Key Laboratory of Utilization and Conservation of Food and Medicinal Resources in Northern Region, Shaoguan University, Shaoguan, Guangdong, China
| | - Juntao Zhang
- School of Chemical and Environmental Engineering, Wuhan Polytechnic University, Wuhan, China
| | - Guozhi Fan
- School of Chemical and Environmental Engineering, Wuhan Polytechnic University, Wuhan, China
| | - Guangsen Song
- School of Chemical and Environmental Engineering, Wuhan Polytechnic University, Wuhan, China
2
|
Suter P, Kuipers J, Beerenwinkel N. Discovering gene regulatory networks of multiple phenotypic groups using dynamic Bayesian networks. Brief Bioinform 2022; 23:6604993. [PMID: 35679575 PMCID: PMC9294428 DOI: 10.1093/bib/bbac219] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2021] [Revised: 04/29/2022] [Accepted: 05/10/2022] [Indexed: 11/13/2022]
Abstract
Dynamic Bayesian networks (DBNs) can be used for the discovery of gene regulatory networks (GRNs) from time series gene expression data. Here, we suggest a strategy for learning DBNs from gene expression data by employing a Bayesian approach that is scalable to large networks and is targeted at learning models with high predictive accuracy. Our framework can be used to learn DBNs for multiple groups of samples and highlight differences and similarities in their GRNs. We learn these DBN models based on different structural and parametric assumptions and select the optimal model based on the cross-validated predictive accuracy. We show in simulation studies that our approach is better equipped to prevent overfitting than techniques used in previous studies. We applied the proposed DBN-based approach to two time series transcriptomic datasets from the Gene Expression Omnibus database, each comprising data from distinct phenotypic groups of the same tissue type. In the first case, we used DBNs to characterize responders and non-responders to anti-cancer therapy. In the second case, we compared normal to tumor cells of colorectal tissue. The classification accuracy reached by the DBN-based classifier for both datasets was higher than reported previously. For the colorectal cancer dataset, our analysis suggested that GRNs for cancer and normal tissues have a lot of differences, which are most pronounced in the neighborhoods of oncogenes and known cancer tissue markers. The identified differences in gene networks of cancer and normal cells may be used for the discovery of targeted therapies.
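The model-selection step described in this abstract, choosing among structural assumptions by cross-validated predictive accuracy, can be illustrated with a much simpler stand-in. The sketch below is not the authors' Bayesian structure-learning procedure: it assumes a first-order linear transition model fitted by ordinary least squares, and the function names `fit_transition` and `cv_prediction_error` are hypothetical. It only shows how two candidate parent sets (masks) could be compared on held-out one-step-ahead prediction error.

```python
import numpy as np

def fit_transition(X_prev, X_next, mask):
    """Least-squares fit of X_next ~ X_prev @ A.T, restricted to allowed parents.

    mask[i, j] is True when gene j at time t may regulate gene i at time t+1.
    (Hypothetical helper; a plain least-squares stand-in, not a Bayesian DBN.)
    """
    n_genes = X_prev.shape[1]
    A = np.zeros((n_genes, n_genes))
    for i in range(n_genes):
        parents = np.flatnonzero(mask[i])
        if parents.size:
            coef, *_ = np.linalg.lstsq(X_prev[:, parents], X_next[:, i], rcond=None)
            A[i, parents] = coef
    return A

def cv_prediction_error(series, mask, n_folds=3):
    """Mean squared one-step-ahead error over held-out folds of transitions."""
    series = np.asarray(series, dtype=float)
    X_prev, X_next = series[:-1], series[1:]
    all_idx = np.arange(len(X_prev))
    errors = []
    for test_idx in np.array_split(all_idx, n_folds):
        train_idx = np.setdiff1d(all_idx, test_idx)
        A = fit_transition(X_prev[train_idx], X_next[train_idx], mask)
        pred = X_prev[test_idx] @ A.T
        errors.append(np.mean((pred - X_next[test_idx]) ** 2))
    return float(np.mean(errors))
```

Under these assumptions, the candidate structure with the lower returned error would be preferred; a real analysis would also need to handle noise, replicates, and multiple sample groups.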
Affiliation(s)
- Polina Suter
- Department of Biosystems Science and Engineering, ETH Zurich, Matternstrasse 26, 4058 Basel, Switzerland.,SIB Swiss Institute of Bioinformatics, Switzerland
| | - Jack Kuipers
- Department of Biosystems Science and Engineering, ETH Zurich, Matternstrasse 26, 4058 Basel, Switzerland.,SIB Swiss Institute of Bioinformatics, Switzerland
| | - Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Matternstrasse 26, 4058 Basel, Switzerland.,SIB Swiss Institute of Bioinformatics, Switzerland
3
|
Lenz AR, Galán-Vásquez E, Balbinot E, de Abreu FP, Souza de Oliveira N, da Rosa LO, de Avila e Silva S, Camassola M, Dillon AJP, Perez-Rueda E. Gene Regulatory Networks of Penicillium echinulatum 2HH and Penicillium oxalicum 114-2 Inferred by a Computational Biology Approach. Front Microbiol 2020; 11:588263. [PMID: 33193246 PMCID: PMC7652724 DOI: 10.3389/fmicb.2020.588263] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 07/28/2020] [Accepted: 09/23/2020] [Indexed: 11/29/2022]
Abstract
Penicillium echinulatum 2HH and Penicillium oxalicum 114-2 are well-known cellulase-producing fungi. However, few studies addressing the global mechanisms of gene regulation in these two important organisms are available so far. A recent finding that the 2HH wild-type is closely related to P. oxalicum motivated a combined study of these two species. First, we provide a global gene regulatory network for P. echinulatum 2HH and P. oxalicum 114-2, based on TF-TG orthology relationships, considering three related species with well-known regulatory interactions combined with TFBS prediction. The network was then analyzed in terms of topology, identifying TFs that act as hubs, and modules. Based on this approach, we explore numerous identified modules, such as the expression of the cellulolytic and xylanolytic systems, where XlnR plays a key role in positive regulation of the xylanolytic system. It also positively regulates the cellulolytic system by acting indirectly through the cellodextrin induction system. This remarkable finding suggests that the XlnR-dependent cellulolytic and xylanolytic regulatory systems are probably conserved in both P. echinulatum and P. oxalicum. Finally, we explore the functional congruency of the genes clustered into communities, where genes related to cellular nitrogen compound metabolic process and macromolecule metabolic process were the most abundant. Therefore, our approach allows us to confer a degree of accuracy regarding the existence of each inferred interaction.
Affiliation(s)
- Alexandre Rafael Lenz
- Unidad Académica Yucatán, Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de Mexico, Mérida, Mexico
- Laboratório de Bioinformática e Biologia Computacional, Instituto de Biotecnologia, Universidade de Caxias do Sul, Caxias do Sul, Brazil
- Departamento de Ciências Exatas e da Terra, Universidade do Estado da Bahia, Salvador, Brazil
| | - Edgardo Galán-Vásquez
- Departamento de Ingeniería de Sistemas Computacionales y Automatización, Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de Mexico, Ciudad Universitaria, Mexico
| | - Eduardo Balbinot
- Laboratório de Bioinformática e Biologia Computacional, Instituto de Biotecnologia, Universidade de Caxias do Sul, Caxias do Sul, Brazil
| | - Fernanda Pessi de Abreu
- Laboratório de Bioinformática e Biologia Computacional, Instituto de Biotecnologia, Universidade de Caxias do Sul, Caxias do Sul, Brazil
| | - Nikael Souza de Oliveira
- Laboratório de Bioinformática e Biologia Computacional, Instituto de Biotecnologia, Universidade de Caxias do Sul, Caxias do Sul, Brazil
- Laboratório de Enzimas e Biomassas, Instituto de Biotecnologia, Universidade de Caxias do Sul, Caxias do Sul, Brazil
| | - Letícia Osório da Rosa
- Laboratório de Enzimas e Biomassas, Instituto de Biotecnologia, Universidade de Caxias do Sul, Caxias do Sul, Brazil
| | - Scheila de Avila e Silva
- Laboratório de Bioinformática e Biologia Computacional, Instituto de Biotecnologia, Universidade de Caxias do Sul, Caxias do Sul, Brazil
| | - Marli Camassola
- Laboratório de Enzimas e Biomassas, Instituto de Biotecnologia, Universidade de Caxias do Sul, Caxias do Sul, Brazil
| | - Aldo José Pinheiro Dillon
- Laboratório de Enzimas e Biomassas, Instituto de Biotecnologia, Universidade de Caxias do Sul, Caxias do Sul, Brazil
| | - Ernesto Perez-Rueda
- Unidad Académica Yucatán, Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de Mexico, Mérida, Mexico
- Facultad de Ciencias, Centro de Genómica y Bioinformática, Universidad Mayor, Santiago, Chile
4
|
Zitnik M, Nguyen F, Wang B, Leskovec J, Goldenberg A, Hoffman MM. Machine Learning for Integrating Data in Biology and Medicine: Principles, Practice, and Opportunities. Inf Fusion 2019; 50:71-91. [PMID: 30467459 PMCID: PMC6242341 DOI: 10.1016/j.inffus.2018.09.012] [Citation(s) in RCA: 210] [Impact Index Per Article: 42.0] [Indexed: 05/10/2023]
Abstract
New technologies have enabled the investigation of biology and human health at an unprecedented scale and in multiple dimensions. These dimensions include myriad properties describing genome, epigenome, transcriptome, microbiome, phenotype, and lifestyle. No single data type, however, can capture the complexity of all the factors relevant to understanding a phenomenon such as a disease. Integrative methods that combine data from multiple technologies have thus emerged as critical statistical and computational approaches. The key challenge in developing such approaches is the identification of effective models to provide a comprehensive and relevant systems view. An ideal method can answer a biological or medical question, identifying important features and predicting outcomes, by harnessing heterogeneous data across several dimensions of biological variation. In this Review, we describe the principles of data integration and discuss current methods and available implementations. We provide examples of successful data integration in biology and medicine. Finally, we discuss current challenges in biomedical integrative methods and our perspective on the future development of the field.
Affiliation(s)
- Marinka Zitnik
- Department of Computer Science, Stanford University, Stanford, CA, USA
| | - Francis Nguyen
- Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
- Princess Margaret Cancer Centre, Toronto, ON, Canada
| | - Bo Wang
- Hikvision Research Institute, Santa Clara, CA, USA
| | - Jure Leskovec
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Chan Zuckerberg Biohub, San Francisco, CA, USA
| | - Anna Goldenberg
- Genetics & Genome Biology, SickKids Research Institute, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Vector Institute, Toronto, ON, Canada
| | - Michael M. Hoffman
- Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
- Princess Margaret Cancer Centre, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Vector Institute, Toronto, ON, Canada
5
|
Abstract
Gaussian process dynamical systems (GPDS) represent Bayesian nonparametric approaches to inference of nonlinear dynamical systems, and provide a principled framework for the learning of biological networks from multiple perturbed time series measurements of gene or protein expression. Such approaches are able to capture the full richness of complex ODE models, and can be scaled for inference in moderately large systems containing hundreds of genes. Related hierarchical approaches allow for inference from multiple datasets in which the underlying generative networks are assumed to have been rewired, either by context-dependent changes in network structure, evolutionary processes, or synthetic manipulation. These approaches can also be used to leverage experimentally determined network structures from one species into another where the network structure is unknown. Collectively, these methods provide a comprehensive and flexible platform for inference from a diverse range of data, with applications in systems and synthetic biology, as well as spatiotemporal modelling of embryo development. In this chapter we provide an overview of GPDS approaches and highlight their applications in the biological sciences, with accompanying tutorials available as a Jupyter notebook from https://github.com/cap76/GPDS .
Affiliation(s)
| | - Iulia Gherman
- Warwick Integrative Synthetic Biology Centre, School of Engineering, University of Warwick, Coventry, UK
| | - Anastasiya Sybirna
- Wellcome/CRUK Gurdon Institute, University of Cambridge, Cambridge, UK
- Wellcome/MRC Cambridge Stem Cell Institute, University of Cambridge, Cambridge, UK
- Physiology, Development and Neuroscience Department, University of Cambridge, Cambridge, UK
| | - David L Wild
- Department of Statistics and Systems Biology Centre, University of Warwick, Coventry, UK
6
|
Penfold CA, Sybirna A, Reid JE, Huang Y, Wernisch L, Ghahramani Z, Grant M, Surani MA. Branch-recombinant Gaussian processes for analysis of perturbations in biological time series. Bioinformatics 2018; 34:i1005-i1013. [PMID: 30423108 PMCID: PMC6129282 DOI: 10.1093/bioinformatics/bty603] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 12/26/2022]
Abstract
Motivation: A common class of behaviour encountered in the biological sciences involves branching and recombination. During branching, a statistical process bifurcates, resulting in two or more potentially correlated processes that may undergo further branching; the contrary is true during recombination, where two or more statistical processes converge. A key objective is to identify the time of this bifurcation (branch or recombination time) from time series measurements, e.g. by comparing a control time series with perturbed time series. Gaussian processes (GPs) represent an ideal framework for such analysis, allowing for nonlinear regression that includes a rigorous treatment of uncertainty. Currently, however, GP models only exist for two-branch systems. Here, we highlight how arbitrarily complex branching processes can be built using the correct composition of covariance functions within a GP framework, thus outlining a general framework for the treatment of branching and recombination in the form of branch-recombinant Gaussian processes (B-RGPs).

Results: We first benchmark the performance of B-RGPs compared to a variety of existing regression approaches, and demonstrate robustness to model misspecification. B-RGPs are then used to investigate the branching patterns of Arabidopsis thaliana gene expression following inoculation with the hemibiotrophic bacterium Pseudomonas syringae DC3000 and a disarmed mutant strain, hrpA. By grouping genes according to the number of branches, we could naturally separate out genes involved in basal immune response from those subverted by the virulent strain, and show enrichment for targets of pathogen protein effectors. Finally, we identify two early branching genes, WRKY11 and WRKY17, and show that genes that branched at similar times to WRKY11/17 were enriched for W-box binding motifs and overrepresented for genes differentially expressed in WRKY11/17 knockouts, suggesting that branch time could be used for identifying direct and indirect binding targets of key transcription factors.

Availability and implementation: https://github.com/cap76/BranchingGPs

Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Christopher A Penfold
- Wellcome Trust/Cancer Research UK Gurdon Institute, Henry Wellcome Building of Cancer and Developmental Biology, Cambridge, UK.,Department of Statistics, University of Warwick, Coventry, UK
| | - Anastasiya Sybirna
- Wellcome Trust/Cancer Research UK Gurdon Institute, Henry Wellcome Building of Cancer and Developmental Biology, Cambridge, UK.,Wellcome/MRC Stem Cell Institute, University of Cambridge, UK.,Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge
| | - John E Reid
- MRC Biostatistics Unit, University of Cambridge, Cambridge Institute of Public Health, Cambridge Biomedical Campus, Cambridge, UK.,The Alan Turing Institute, London, UK
| | - Yun Huang
- Wellcome Trust/Cancer Research UK Gurdon Institute, Henry Wellcome Building of Cancer and Developmental Biology, Cambridge, UK.,Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge
| | - Lorenz Wernisch
- MRC Biostatistics Unit, University of Cambridge, Cambridge Institute of Public Health, Cambridge Biomedical Campus, Cambridge, UK
| | | | - Murray Grant
- School of Life Sciences, Gibbet Hill Campus, The University of Warwick, Coventry, UK
| | - M Azim Surani
- Wellcome Trust/Cancer Research UK Gurdon Institute, Henry Wellcome Building of Cancer and Developmental Biology, Cambridge, UK.,Department of Statistics, University of Warwick, Coventry, UK.,Wellcome/MRC Stem Cell Institute, University of Cambridge, UK
7
|
Polanski K, Gao B, Mason SA, Brown P, Ott S, Denby KJ, Wild DL. Bringing numerous methods for expression and promoter analysis to a public cloud computing service. Bioinformatics 2018; 34:884-886. [PMID: 29126246 PMCID: PMC6030968 DOI: 10.1093/bioinformatics/btx692] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 07/17/2017] [Accepted: 11/03/2017] [Indexed: 12/24/2022]
Abstract
Summary: Every year, a large number of novel algorithms are introduced to the scientific community for a myriad of applications, but using these across different research groups is often troublesome, due to suboptimal implementations and specific dependency requirements. This does not have to be the case, as public cloud computing services can easily house tractable implementations within self-contained dependency environments, making the methods easily accessible to a wider public. We have taken 14 popular methods, the majority related to expression data or promoter analysis, developed these up to a good implementation standard and housed the tools in isolated Docker containers which we integrated into the CyVerse Discovery Environment, making these easily usable for a wide community as part of the CyVerse UK project.

Availability and implementation: The integrated apps can be found at http://www.cyverse.org/discovery-environment, while the raw code is available at https://github.com/cyversewarwick and the corresponding Docker images are housed at https://hub.docker.com/r/cyversewarwick/.

Contact: info@cyverse.warwick.ac.uk or D.L.Wild@warwick.ac.uk.

Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
| | | | | | - Paul Brown
- Department of Mathematics
- Systems Biology Centre
| | - Sascha Ott
- Systems Biology Centre
- Department of Computer Science, University of Warwick, Coventry CV4 7AL, UK
8
|
Shahdoust M, Pezeshk H, Mahjub H, Sadeghi M. F-MAP: A Bayesian approach to infer the gene regulatory network using external hints. PLoS One 2017; 12:e0184795. [PMID: 28938012 PMCID: PMC5609748 DOI: 10.1371/journal.pone.0184795] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 02/22/2017] [Accepted: 08/31/2017] [Indexed: 01/07/2023]
Abstract
The common topological features of the gene regulatory networks of related species suggest that the network of one species can be reconstructed using additional information from the gene expression profiles of related species. We present an algorithm for reconstructing gene regulatory networks, named F-MAP, which applies knowledge about gene interactions from related species. Our algorithm sets up a Bayesian framework to estimate the precision matrix of one species' microarray gene expression dataset in order to infer the Gaussian graphical model of the network. The conjugate Wishart prior is used, and the information from related species is applied to estimate the hyperparameters of the prior distribution using factor analysis. Applying the proposed algorithm to six related species of Drosophila shows that the precision of the reconstructed networks is improved considerably compared to the precision of networks constructed by other Bayesian approaches.
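The conjugate update mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the F-MAP implementation: it assumes zero-mean Gaussian data and takes the prior scale matrix and degrees of freedom as given inputs, whereas the paper estimates these hyperparameters from related species via factor analysis. The function names and the partial-correlation threshold are hypothetical. With prior Omega ~ Wishart(nu, V) and n samples in X, the posterior is Wishart(nu + n, (V^-1 + X^T X)^-1), whose mean is used below.

```python
import numpy as np

def wishart_posterior_mean_precision(X, prior_scale, prior_df):
    """Posterior mean of the precision matrix under a conjugate Wishart prior.

    Assumes rows of X are zero-mean Gaussian samples. prior_scale and prior_df
    are taken as given here (hypothetical inputs for this sketch).
    """
    X = np.asarray(X, dtype=float)
    prior_scale = np.asarray(prior_scale, dtype=float)
    n = X.shape[0]
    scatter = X.T @ X
    post_scale = np.linalg.inv(np.linalg.inv(prior_scale) + scatter)
    return (prior_df + n) * post_scale

def edges_from_precision(omega, threshold=0.1):
    """Boolean adjacency from partial correlations (threshold is illustrative)."""
    d = np.sqrt(np.diag(omega))
    partial_corr = -omega / np.outer(d, d)
    np.fill_diagonal(partial_corr, 0.0)
    return np.abs(partial_corr) > threshold
```

In a Gaussian graphical model, a zero off-diagonal entry of the precision matrix corresponds to conditional independence, which is why edges are read off the (thresholded) partial correlations here.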
Affiliation(s)
- Maryam Shahdoust
- Department of Biostatistics, School of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran
| | - Hamid Pezeshk
- School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran, Tehran, Iran
| | - Hossein Mahjub
- Department of Biostatistics, School of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran
| | - Mehdi Sadeghi
- National Institute of Genetic Engineering and Biotechnology, Tehran, Iran
9
|
Koch C, Konieczka J, Delorey T, Lyons A, Socha A, Davis K, Knaack SA, Thompson D, O'Shea EK, Regev A, Roy S. Inference and Evolutionary Analysis of Genome-Scale Regulatory Networks in Large Phylogenies. Cell Syst 2017; 4:543-558.e8. [PMID: 28544882 PMCID: PMC5515301 DOI: 10.1016/j.cels.2017.04.010] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 09/23/2016] [Revised: 02/20/2017] [Accepted: 04/26/2017] [Indexed: 11/22/2022]
Abstract
Changes in transcriptional regulatory networks can significantly contribute to species evolution and adaptation. However, identification of genome-scale regulatory networks is an open challenge, especially in non-model organisms. Here, we introduce multi-species regulatory network learning (MRTLE), a computational approach that uses phylogenetic structure, sequence-specific motifs, and transcriptomic data, to infer the regulatory networks in different species. Using simulated data from known networks and transcriptomic data from six divergent yeasts, we demonstrate that MRTLE predicts networks with greater accuracy than existing methods because it incorporates phylogenetic information. We used MRTLE to infer the structure of the transcriptional networks that control the osmotic stress responses of divergent, non-model yeast species and then validated our predictions experimentally. Interrogating these networks reveals that gene duplication promotes network divergence across evolution. Taken together, our approach facilitates study of regulatory network evolutionary dynamics across multiple poorly studied species.
Affiliation(s)
- Christopher Koch
- Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI, USA
| | - Jay Konieczka
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
| | - Toni Delorey
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
| | - Ana Lyons
- Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
| | - Amanda Socha
- Dartmouth College, Biology department, Hanover, NH 03755, USA
| | - Kathleen Davis
- Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
| | - Sara A Knaack
- Wisconsin Institute for Discovery, 330 N. Orchard Street, Madison, WI, USA
| | - Dawn Thompson
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
| | - Erin K O'Shea
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts, USA
- Howard Hughes Medical Institute, Harvard University, Northwest Laboratory, Cambridge, Massachusetts, USA
- Faculty of Arts and Sciences Center for Systems Biology, Harvard University, Northwest Laboratory, Cambridge, Massachusetts, USA
- Department of Molecular and Cellular Biology, Harvard University, Northwest Laboratory, Cambridge, Massachusetts, USA
| | - Aviv Regev
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Howard Hughes Medical Institute, Chevy Chase, Maryland, USA
| | - Sushmita Roy
- Wisconsin Institute for Discovery, 330 N. Orchard Street, Madison, WI, USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
10
|
Fused Regression for Multi-source Gene Regulatory Network Inference. PLoS Comput Biol 2016; 12:e1005157. [PMID: 27923054 PMCID: PMC5140053 DOI: 10.1371/journal.pcbi.1005157] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Received: 04/22/2016] [Accepted: 09/20/2016] [Indexed: 12/03/2022]
Abstract
Understanding gene regulatory networks is critical to understanding cellular differentiation and response to external stimuli. Methods for global network inference have been developed and applied to a variety of species. Most approaches consider the problem of network inference independently in each species, despite evidence that gene regulation can be conserved even in distantly related species. Further, network inference is often confined to single data-types (single platforms) and single cell types. We introduce a method for multi-source network inference that allows simultaneous estimation of gene regulatory networks in multiple species or biological processes through the introduction of priors based on known gene relationships such as orthology incorporated using fused regression. This approach improves network inference performance even when orthology mapping and conservation are incomplete. We refine this method by presenting an algorithm that extracts the true conserved subnetwork from a larger set of potentially conserved interactions and demonstrate the utility of our method in cross species network inference. Last, we demonstrate our method's utility in learning from data collected on different experimental platforms.

Gene regulatory networks describing related biological processes are thought to share conserved interaction structure. This assumption motivates a great deal of work in model systems, where discovery of gene regulation may be more experimentally tractable, but is difficult to directly evaluate using existing methods. The presence of shared structure in a well studied model system or process should make the problem of network inference in a related process easier, but this information is not often applied to the discovery of global gene regulatory networks. Further, to be able to successfully translate findings between different organisms, it is important to be able to identify where regulatory structure is different. We provide a method based on penalized fused regression for inferring gene regulatory networks given prior knowledge about the similarity of interactions in each network. This method is demonstrated on synthetic data, and applied to the problem of inferring networks in distantly related bacterial organisms. We then introduce an extension of the method to deal with the condition of uncertainty over the degree of regulatory conservation by simultaneously inferring gene conservation and interaction weights.
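The idea of fusing regressions across species can be sketched with a simple two-species example. This is an illustration under stated assumptions rather than the paper's formulation: it assumes a squared (ridge-style) fusion penalty, a one-to-one orthology in which column j of `X1` corresponds to column j of `X2`, and a single target gene per species. The function name `fused_ridge` and the parameters `lam_ridge` and `lam_fuse` are hypothetical.

```python
import numpy as np

def fused_ridge(X1, y1, X2, y2, lam_ridge=1.0, lam_fuse=1.0):
    """Jointly fit two linear models whose coefficients are pulled together.

    Minimizes ||y1 - X1 b1||^2 + ||y2 - X2 b2||^2
              + lam_ridge * (||b1||^2 + ||b2||^2)
              + lam_fuse * ||b1 - b2||^2
    assuming column j of X1 and column j of X2 are orthologous regulators.
    """
    X1, X2 = np.asarray(X1, dtype=float), np.asarray(X2, dtype=float)
    y1, y2 = np.asarray(y1, dtype=float), np.asarray(y2, dtype=float)
    p = X1.shape[1]
    eye = np.eye(p)
    # Normal equations of the joint objective, written in block form.
    A = np.block([
        [X1.T @ X1 + (lam_ridge + lam_fuse) * eye, -lam_fuse * eye],
        [-lam_fuse * eye, X2.T @ X2 + (lam_ridge + lam_fuse) * eye],
    ])
    rhs = np.concatenate([X1.T @ y1, X2.T @ y2])
    b = np.linalg.solve(A, rhs)
    return b[:p], b[p:]
```

With `lam_fuse=0` the two fits are independent ridge regressions; as `lam_fuse` grows, the coefficients for orthologous regulators are forced toward a shared value, which is the sense in which data from one species can inform inference in another.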