1
Kiruba B, Naidu A, Sundararajan V, Lulu S S. Mapping integral cell-type-specific interferon-induced gene regulatory networks (GRNs) involved in systemic lupus erythematosus using systems and computational analysis. Heliyon 2025; 11:e41342. [PMID: 39844998 PMCID: PMC11751531 DOI: 10.1016/j.heliyon.2024.e41342] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2024] [Revised: 12/17/2024] [Accepted: 12/17/2024] [Indexed: 01/24/2025]
Abstract
Systemic lupus erythematosus (SLE) is a systemic autoimmune disorder characterized by the production of autoantibodies, resulting in inflammation and organ damage. Although extensive research has been conducted on SLE pathogenesis, a comprehensive understanding of its molecular landscape in different cell types has not been achieved. This study uncovers the molecular mechanisms of the disease by thoroughly examining gene regulatory networks within neutrophils, dendritic cells, T cells, and B cells. First, we identified genes and ncRNAs that were differentially expressed in SLE patients compared with controls for each cell type. Next, the differentially expressed genes were curated for immune functions using functional enrichment analysis and used to build a protein-protein interaction network. Topological analysis of this network revealed key hub genes associated with each cell type. To understand the regulatory mechanisms through which these hub genes function in the disease state, their associations with transcription factors and non-coding RNAs in different immune cell types were investigated using correlation analysis and regression models. Finally, by integrating these findings, distinct gene regulatory networks were constructed, which provide a novel perspective on the molecular, cellular, and immunological landscapes of SLE. Importantly, we reveal the crucial role of IRF3, IRF7, and STAT1 in neutrophils, dendritic cells, and T cells, where their aberrant upregulation in disease states might enhance the production of type I IFN. Furthermore, we found MYB to be a crucial regulator that might activate T cells toward autoimmune responses in SLE. Similarly, in B lymphocytes, we found FOXO1 to be a key player in autophagy and chemokine regulation. These findings were also validated by single-cell RNA-seq analysis of an independent dataset.
Genotype variations of these genes were also explored using the GWAS central database to ensure their targetability. These findings indicate that IRF3, IRF7, STAT1, MYB, and FOXO1 are promising targets for therapeutic interventions for SLE.
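The abstract mentions correlating hub genes with transcription factors but gives no formulas or cutoffs. As a minimal illustration of that kind of step, the Python sketch below ranks candidate regulators by the Pearson correlation of their expression with a hub gene across samples. The gene names, the toy values, and the |r| >= 0.8 threshold are all hypothetical; this is not the authors' pipeline.

```python
import math
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length expression vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def rank_regulators(hub: List[float], tfs: Dict[str, List[float]],
                    threshold: float = 0.8) -> List[Tuple[str, float]]:
    """Keep TFs whose |r| with the hub gene reaches the (assumed) threshold, strongest first."""
    scored = [(name, pearson(expr, hub)) for name, expr in tfs.items()]
    kept = [(name, r) for name, r in scored if abs(r) >= threshold]
    return sorted(kept, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical toy data: one hub gene and three TFs measured across five samples.
hub_gene = [1.0, 2.0, 3.0, 4.0, 5.0]
candidate_tfs = {
    "TF_A": [2.0, 4.0, 6.0, 8.0, 10.0],  # rises with the hub gene
    "TF_B": [5.0, 4.0, 3.0, 2.0, 1.0],   # falls as the hub gene rises
    "TF_C": [1.0, 3.0, 2.0, 3.0, 1.0],   # uncorrelated with the hub gene
}
result = rank_regulators(hub_gene, candidate_tfs)
```

Here TF_A and TF_B pass the cutoff (r close to +1 and -1) while TF_C does not. Correlation alone does not establish the direction of regulation, which is one reason the study pairs it with regression models.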
Affiliation(s)
- Blessy Kiruba
- Department of Biosciences, School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore, 632 014, Tamil Nadu, India
- Akshayata Naidu
- Department of Biotechnology, School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore, 632 014, Tamil Nadu, India
- Vino Sundararajan
- Department of Biosciences, School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore, 632 014, Tamil Nadu, India
- Sajitha Lulu S
- Department of Biotechnology, School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore, 632 014, Tamil Nadu, India
2
Wu Z, Sinha S. SPREd: A simulation-supervised neural network tool for gene regulatory network reconstruction. bioRxiv: The Preprint Server for Biology 2023:2023.11.09.566399. [PMID: 38014297 PMCID: PMC10680606 DOI: 10.1101/2023.11.09.566399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
Reconstruction of gene regulatory networks (GRNs) from expression data is a significant open problem. Common approaches train a machine learning (ML) model to predict a gene's expression using transcription factors' (TFs') expression as features and designate important features/TFs as regulators of the gene. Here, we present an entirely different paradigm, where GRN edges are directly predicted by the ML model. The new approach, named "SPREd" is a simulation-supervised neural network for GRN inference. Its inputs comprise expression relationships (e.g., correlation, mutual information) between the target gene and each TF and between pairs of TFs. The output includes binary labels indicating whether each TF regulates the target gene. We train the neural network model using synthetic expression data generated by a biophysics-inspired simulation model that incorporates linear as well as non-linear TF-gene relationships and diverse GRN configurations. We show SPREd to outperform state-of-the-art GRN reconstruction tools GENIE3, ENNET, PORTIA and TIGRESS on synthetic datasets with high co-expression among TFs, similar to that seen in real data. A key advantage of the new approach is its robustness to relatively small numbers of conditions (columns) in the expression matrix, which is a common problem faced by existing methods. Finally, we evaluate SPREd on real data sets in yeast that represent gold standard benchmarks of GRN reconstruction and show it to perform significantly better than or comparably to existing methods. In addition to its high accuracy and speed, SPREd marks a first step towards incorporating biophysics principles of gene regulation into ML-based approaches to GRN reconstruction.
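SPREd's inputs are pairwise expression relationships, for example mutual information between the target gene and each TF and between pairs of TFs. The Python sketch below shows one way such a feature set might be assembled. The equal-width binning into three levels is an assumption chosen for simplicity, not necessarily the estimator SPREd uses, and the neural network that consumes these features is not shown.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def discretize(values: List[float], bins: int = 3) -> List[int]:
    """Equal-width binning of an expression vector into `bins` levels (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x: List[float], y: List[float], bins: int = 3) -> float:
    """Plug-in estimate of I(X;Y) in bits from binned expression values."""
    dx, dy = discretize(x, bins), discretize(y, bins)
    n = len(dx)
    joint = Counter(zip(dx, dy))
    px, py = Counter(dx), Counter(dy)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def pairwise_features(target: List[float],
                      tfs: Dict[str, List[float]]) -> Dict[Tuple[str, str], float]:
    """MI between the target and each TF, plus MI between every pair of TFs."""
    features = {("target", name): mutual_information(expr, target)
                for name, expr in tfs.items()}
    for a, b in combinations(sorted(tfs), 2):
        features[(a, b)] = mutual_information(tfs[a], tfs[b])
    return features
```

The TF-TF entries matter because, as the abstract notes, high co-expression among TFs is what makes regulator attribution hard for conventional methods.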
Affiliation(s)
- Zijun Wu
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA, 30332, USA
- Saurabh Sinha
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA, 30332, USA
- H. Milton Steward School of Industrial & Systems Engineering, Georgia Institute of Technology, Atlanta, GA, 30318, USA
3
Kadelka C, Wheeler M, Veliz-Cuba A, Murrugarra D, Laubenbacher R. Modularity of biological systems: a link between structure and function. bioRxiv: The Preprint Server for Biology 2023:2023.09.11.557227. [PMID: 37745485 PMCID: PMC10515856 DOI: 10.1101/2023.09.11.557227] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/26/2023]
Abstract
This paper addresses two topics in systems biology: the hypothesis that biological systems are modular, and the problem of relating the structure and function of biological systems. The focus here is on gene regulatory networks, represented by Boolean network models, a commonly used tool. Most of the research on gene regulatory network modularity has focused on network structure, typically represented through either directed or undirected graphs. But since gene regulation is a highly dynamic process that determines the function of cells over time, it is natural to consider functional modularity as well. One of the main results is that the structural decomposition of a network into modules induces an analogous decomposition of the dynamic structure, exhibiting a strong relationship between network structure and function. An extensive simulation study provides evidence for the hypothesis that modularity might have evolved to increase phenotypic complexity while maintaining maximal dynamic robustness to external perturbations.
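To make the structure/dynamics link concrete, the Python sketch below enumerates the attractors of a small Boolean network under synchronous updating (one common convention; the paper's update scheme may differ). The toy network is hypothetical: two non-interacting modules, a two-gene mutual-copy loop (attractors 00, 11, and a 2-cycle alternating between 01 and 10) and a one-gene self-loop (fixed points 0 and 1). Because the second module has only fixed points, the full network's attractors are all pairings of the module attractors, 3 x 2 = 6.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

State = Tuple[int, ...]
Rule = Callable[[State], int]


def step(state: State, rules: Sequence[Rule]) -> State:
    """Synchronous update: every gene applies its Boolean rule to the current state."""
    return tuple(rule(state) for rule in rules)


def attractors(rules: Sequence[Rule]) -> List[Tuple[State, ...]]:
    """Enumerate all attractors by iterating from every one of the 2**n states."""
    found = set()
    for start in product((0, 1), repeat=len(rules)):
        trajectory: List[State] = []
        state = start
        while state not in trajectory:
            trajectory.append(state)
            state = step(state, rules)
        cycle = trajectory[trajectory.index(state):]
        k = cycle.index(min(cycle))  # rotate so each cycle has one canonical form
        found.add(tuple(cycle[k:] + cycle[:k]))
    return sorted(found)


# Hypothetical network with two non-interacting modules.
rules: List[Rule] = [
    lambda s: s[1],  # module 1: gene 0 copies gene 1
    lambda s: s[0],  # module 1: gene 1 copies gene 0
    lambda s: s[2],  # module 2: gene 2 keeps its value
]
network_attractors = attractors(rules)
```

Exhaustive enumeration is only feasible for small n, since the state space has 2**n states.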
Affiliation(s)
- Claus Kadelka
- Department of Mathematics, Iowa State University, Ames, IA 50011, United States
- Matthew Wheeler
- Department of Medicine, University of Florida, Gainesville, FL, United States
- Alan Veliz-Cuba
- Department of Mathematics, University of Dayton, Dayton, OH, United States
- David Murrugarra
- Department of Mathematics, University of Kentucky, Lexington, KY, United States
4
Khozyainova AA, Valyaeva AA, Arbatsky MS, Isaev SV, Iamshchikov PS, Volchkov EV, Sabirov MS, Zainullina VR, Chechekhin VI, Vorobev RS, Menyailo ME, Tyurin-Kuzmin PA, Denisov EV. Complex Analysis of Single-Cell RNA Sequencing Data. Biochemistry. Biokhimiia 2023; 88:231-252. [PMID: 37072324 PMCID: PMC10000364 DOI: 10.1134/s0006297923020074] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 09/23/2022] [Revised: 12/13/2022] [Accepted: 12/13/2022] [Indexed: 03/12/2023]
Abstract
Single-cell RNA sequencing (scRNA-seq) is a revolutionary tool for studying the physiology of normal and pathologically altered tissues. This approach provides information about molecular features (gene expression, mutations, chromatin accessibility, etc.) of cells, opens up the possibility to analyze the trajectories/phylogeny of cell differentiation and cell-cell interactions, and helps in discovery of new cell types and previously unexplored processes. From a clinical point of view, scRNA-seq facilitates deeper and more detailed analysis of molecular mechanisms of diseases and serves as a basis for the development of new preventive, diagnostic, and therapeutic strategies. The review describes different approaches to the analysis of scRNA-seq data, discusses the advantages and disadvantages of bioinformatics tools, provides recommendations and examples of their successful use, and suggests potential directions for improvement. We also emphasize the need for creating new protocols, including multiomics ones, for the preparation of DNA/RNA libraries of single cells with the purpose of more complete understanding of individual cells.
Affiliation(s)
- Anna A Khozyainova
- Laboratory of Cancer Progression Biology, Cancer Research Institute, Tomsk National Research Medical Center, Russian Academy of Sciences, Tomsk, 634050, Russia
- Anna A Valyaeva
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, 119991, Russia
- Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow, 119991, Russia
- Mikhail S Arbatsky
- Laboratory of Artificial Intelligence and Bioinformatics, The Russian Clinical Research Center for Gerontology, Pirogov Russian National Medical University, Moscow, 129226, Russia
- School of Public Administration, Lomonosov Moscow State University, Moscow, 119991, Russia
- Faculty of Fundamental Medicine, Lomonosov Moscow State University, Moscow, 119991, Russia
- Sergey V Isaev
- Research Institute of Personalized Medicine, National Center for Personalized Medicine of Endocrine Diseases, National Medical Research Center for Endocrinology, Moscow, 117036, Russia
- Pavel S Iamshchikov
- Laboratory of Cancer Progression Biology, Cancer Research Institute, Tomsk National Research Medical Center, Russian Academy of Sciences, Tomsk, 634050, Russia
- Laboratory of Complex Analysis of Big Bioimage Data, National Research Tomsk State University, Tomsk, 634050, Russia
- Egor V Volchkov
- Department of Oncohematology, Dmitry Rogachev National Research Center of Pediatric Hematology, Oncology and Immunology, Moscow, 117198, Russia
- Marat S Sabirov
- Laboratory of Bioinformatics and Molecular Genetics, Koltzov Institute of Developmental Biology of the Russian Academy of Sciences, Moscow, 119334, Russia
- Viktoria R Zainullina
- Laboratory of Cancer Progression Biology, Cancer Research Institute, Tomsk National Research Medical Center, Russian Academy of Sciences, Tomsk, 634050, Russia
- Vadim I Chechekhin
- Faculty of Fundamental Medicine, Lomonosov Moscow State University, Moscow, 119991, Russia
- Rostislav S Vorobev
- Laboratory of Cancer Progression Biology, Cancer Research Institute, Tomsk National Research Medical Center, Russian Academy of Sciences, Tomsk, 634050, Russia
- Maxim E Menyailo
- Laboratory of Cancer Progression Biology, Cancer Research Institute, Tomsk National Research Medical Center, Russian Academy of Sciences, Tomsk, 634050, Russia
- Pyotr A Tyurin-Kuzmin
- Faculty of Fundamental Medicine, Lomonosov Moscow State University, Moscow, 119991, Russia
- Evgeny V Denisov
- Laboratory of Cancer Progression Biology, Cancer Research Institute, Tomsk National Research Medical Center, Russian Academy of Sciences, Tomsk, 634050, Russia
5
Sehgal D, Dhakate P, Ambreen H, Shaik KHB, Rathan ND, Anusha NM, Deshmukh R, Vikram P. Wheat Omics: Advancements and Opportunities. Plants (Basel, Switzerland) 2023; 12:426. [PMID: 36771512 PMCID: PMC9919419 DOI: 10.3390/plants12030426] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 09/29/2022] [Revised: 12/07/2022] [Accepted: 12/14/2022] [Indexed: 06/18/2023]
Abstract
Plant omics, which includes genomics, transcriptomics, metabolomics and proteomics, has played a remarkable role in the discovery of new genes and biomolecules that can be deployed for crop improvement. In wheat, great insights have been gleaned from the use of diverse omics approaches for both qualitative and quantitative traits. In particular, combining omics approaches has led to significant advances in gene discovery, in pathway investigation, and in deciphering the essential components of stress responses and yield. Recently, a Wheat Omics database has been developed that scientists could use to further accelerate functional genomics studies. In this review, we discuss various omics technologies and platforms that have been used in wheat to enhance the understanding of the stress biology of the crop and the molecular mechanisms underlying stress tolerance.
Affiliation(s)
- Deepmala Sehgal
- International Maize and Wheat Improvement Center (CIMMYT), El Batán, Texcoco 56237, Mexico
- Syngenta, Jealott’s Hill International Research Centre, Bracknell, Berkshire RG42 6EY, UK
- Priyanka Dhakate
- National Institute of Plant Genome Research, Aruna Asaf Ali Marg, New Delhi 110076, India
- Heena Ambreen
- School of Life Sciences, University of Sussex, Brighton BN1 9RH, UK
- Khasim Hussain Baji Shaik
- Faculty of Agriculture Sciences, Georg-August-Universität, Wilhelmsplatz 1, 37073 Göttingen, Germany
- Nagenahalli Dharmegowda Rathan
- Indian Agricultural Research Institute (ICAR-IARI), New Delhi 110012, India
- Corteva Agriscience, Hyderabad 502336, Telangana, India
- Rupesh Deshmukh
- Department of Biotechnology, Central University of Haryana, Mahendragarh 123031, Haryana, India
- Prashant Vikram
- Bioseed Research India Ltd., Hyderabad 5023324, Telangana, India
6
Shi T, Yu H, Blair RH. Integrated regulatory and metabolic networks of the tumor microenvironment for therapeutic target prioritization. Stat Appl Genet Mol Biol 2023; 22:sagmb-2022-0054. [PMID: 37988745 DOI: 10.1515/sagmb-2022-0054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2022] [Accepted: 09/28/2023] [Indexed: 11/23/2023]
Abstract
Translation of genomic discoveries, such as those from single-cell sequencing data, into clinical decisions remains a longstanding bottleneck in the field. Meanwhile, computational systems biology models, such as cellular metabolism models and cell signaling pathways, have emerged as powerful approaches for efficiently predicting metabolite and gene expression levels, respectively. However, there has been limited research on integrating these two types of model. This work develops a methodology for integrating computational models of probabilistic gene regulatory networks with a constraint-based metabolism model. By using probabilistic reasoning with Bayesian Networks, we aim to predict cell-specific changes under different interventions, which are embedded into the constraint-based models of metabolism. Applications to single-cell sequencing data of glioblastoma brain tumors generate predictions about the effects of pharmaceutical interventions on the regulatory network and downstream metabolism in different cell types from the tumor microenvironment. The model offers possible insights into treatments that could suppress anaerobic metabolism in malignant cells with minimal impact on the metabolism of other cell types. The proposed integrated model can guide therapeutic target prioritization, the formulation of combination therapies, and future drug discovery. This model integration framework is also generalizable to other applications, such as different cell types, organisms, and diseases.
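Predicting the effect of an intervention in a Bayesian network differs from ordinary conditioning. A minimal Python sketch, assuming a hypothetical three-node network (an upstream signal S that influences both a transcription factor T and a target gene G, with T also regulating G) and invented probabilities: the intervention do(T = t) drops the factor P(T | S) while keeping P(S), which is the standard truncated factorization, whereas observing T = t also updates the belief about S. The abstract does not describe the authors' network structure or how its predictions are passed to the metabolic model.

```python
from typing import Dict, Tuple

# Hypothetical conditional probability tables (all values invented for illustration).
P_S1 = 0.3                                         # P(S = 1)
P_T1_GIVEN_S: Dict[int, float] = {0: 0.2, 1: 0.9}  # P(T = 1 | S = s)
P_G1_GIVEN_ST: Dict[Tuple[int, int], float] = {    # P(G = 1 | S = s, T = t)
    (0, 0): 0.1, (0, 1): 0.6, (1, 0): 0.3, (1, 1): 0.9,
}


def p_s(s: int) -> float:
    return P_S1 if s == 1 else 1.0 - P_S1


def p_t_given_s(t: int, s: int) -> float:
    p1 = P_T1_GIVEN_S[s]
    return p1 if t == 1 else 1.0 - p1


def p_g1_do(t: int) -> float:
    """P(G = 1 | do(T = t)): drop the factor P(T | S) and average over P(S)."""
    return sum(p_s(s) * P_G1_GIVEN_ST[(s, t)] for s in (0, 1))


def p_g1_observed(t: int) -> float:
    """P(G = 1 | T = t): observing T also shifts the posterior over S."""
    weights = {s: p_s(s) * p_t_given_s(t, s) for s in (0, 1)}
    z = sum(weights.values())
    return sum(weights[s] / z * P_G1_GIVEN_ST[(s, t)] for s in (0, 1))
```

With these numbers, forcing the TF off (do(T = 0)) gives P(G = 1) = 0.7 x 0.1 + 0.3 x 0.3 = 0.16, while merely observing T = 0 gives 0.065 / 0.59, about 0.110, because a low T is itself evidence for a low S.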
Affiliation(s)
- Tiange Shi
- University at Buffalo, Biostatistics, Buffalo, USA
- Han Yu
- Roswell Park Comprehensive Cancer Center, Biostatistics and Bioinformatics, Buffalo, USA
- Rachael Hageman Blair
- University at Buffalo, Biostatistics, Institute for Artificial Intelligence and Data Science, Buffalo, USA
7
Yu C, Wang J. Data mining and mathematical models in cancer prognosis and prediction. Medical Review (Berlin, Germany) 2022; 2:285-307. [PMID: 37724193 PMCID: PMC10388766 DOI: 10.1515/mr-2021-0026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2021] [Accepted: 12/29/2021] [Indexed: 09/20/2023]
Abstract
Cancer is a fatal and complex disease. Individual differences within the same cancer type, or in the same patient at different stages of cancer development, may require distinct treatments. Pathological differences are reflected at the tissue, cellular, and gene levels, among others. The interactions between cancer cells and their surrounding microenvironment can also influence cancer progression and metastasis. Understanding all of these mechanistically and quantitatively is a huge challenge. Researchers have applied pattern recognition algorithms such as machine learning or data mining to predict cancer types or classifications. With rapidly growing and widely available computing power, researchers have begun to integrate huge data sets, multi-dimensional data types, and information. Cells are controlled by gene expression, which is determined by promoter sequences and transcription regulators. For example, changes in gene expression through these underlying mechanisms can alter a cell's progression through the cell cycle. Such molecular activities can be governed by gene regulation through the underlying gene regulatory networks, which are essential for cancer study when the information and gene regulations are clear and available. In this review, we briefly introduce several machine learning methods for cancer prediction and classification, including Artificial Neural Networks (ANNs), Decision Trees (DTs), Support Vector Machines (SVMs), and naive Bayes. Then we describe a few typical approaches for building gene regulatory networks from available data, such as correlation, regression, and Bayesian methods. These methods can help in cancer diagnosis, for example in assessing susceptibility, recurrence, and survival. Finally, we summarize and compare modeling methods for analyzing the development and progression of cancer through gene regulatory networks. These models can provide possible physical strategies for analyzing cancer progression in a systematic and quantitative way.
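As one hedged illustration of the regression family of GRN methods mentioned above, the Python sketch below fits an ordinary least-squares model of a target gene's expression on several TFs by solving the normal equations with Gaussian elimination; larger absolute coefficients would then suggest stronger candidate regulators. The data are synthetic and noise-free, and a real analysis would typically standardize predictors and add regularization, both omitted here. This is a generic sketch, not a method taken from the review.

```python
from typing import List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_regulators(tfs: List[List[float]], target: List[float]) -> List[float]:
    """OLS coefficients [intercept, b_1, ..., b_k] for target ~ TFs.

    Each inner list of `tfs` holds one TF's expression across samples.
    """
    samples = len(target)
    X = [[1.0] + [tf[s] for tf in tfs] for s in range(samples)]
    p = len(X[0])
    xtx = [[sum(X[s][i] * X[s][j] for s in range(samples)) for j in range(p)]
           for i in range(p)]
    xty = [sum(X[s][i] * target[s] for s in range(samples)) for i in range(p)]
    return solve(xtx, xty)


# Synthetic example: target = 0.5 + 2 * TF1 - 1 * TF2 exactly.
tf1 = [0.0, 1.0, 2.0, 3.0, 4.0]
tf2 = [1.0, 0.0, 2.0, 1.0, 3.0]
target = [0.5 + 2 * a - b for a, b in zip(tf1, tf2)]
coefficients = fit_regulators([tf1, tf2], target)
```

Because the data are noise-free, the fit recovers the generating coefficients [0.5, 2.0, -1.0] up to floating-point error.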
Affiliation(s)
- Chong Yu
- State Key Laboratory of Electroanalytical Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun, Jilin, China
- Department of Statistics, JiLin University of Finance and Economics, Changchun, Jilin Province, China
- Jin Wang
- Department of Chemistry and of Physics and Astronomy, State University of New York, Stony Brook, NY, USA
8
Use of Average Mutual Information and Derived Measures to Find Coding Regions. Entropy 2021; 23:e23101324. [PMID: 34682048 PMCID: PMC8534840 DOI: 10.3390/e23101324] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2021] [Revised: 08/09/2021] [Accepted: 09/16/2021] [Indexed: 11/17/2022]
Abstract
One of the important steps in genome annotation is identifying the regions of the genome that code for proteins. Most annotation approaches use signals extracted from genomic regions to determine whether a region is protein coding. Motivated by the fact that these regions are information-bearing structures, we propose signals based on measures derived from the average mutual information for use in this task. We show that these signals can be used to distinguish coding from noncoding sequences with high accuracy. We also show that these signals are robust across species, phyla, and kingdoms and can, therefore, be used in species-agnostic genome annotation algorithms for identifying protein coding regions. These in turn could be used for gene identification.
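The abstract does not define its measures, so the Python sketch below uses one common definition of the average mutual information between nucleotides k positions apart: I(k) = sum over a, b in {A, C, G, T} of p_k(a, b) * log2[p_k(a, b) / (p(a) p(b))], where p_k(a, b) is the frequency of nucleotide a at some position followed by b exactly k positions later. Assumptions: marginals p(a) are taken over the whole sequence, and the "derived measures" the authors build on top of AMI are not reproduced.

```python
import math
from collections import Counter
from typing import List


def average_mutual_information(seq: str, k: int) -> float:
    """AMI (in bits) between nucleotides separated by k positions."""
    seq = seq.upper()
    pairs = [(seq[i], seq[i + k]) for i in range(len(seq) - k)]
    n_pairs = len(pairs)
    joint = Counter(pairs)
    single = Counter(seq)
    n = len(seq)
    ami = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n_pairs
        ami += p_ab * math.log2(p_ab / ((single[a] / n) * (single[b] / n)))
    return ami


def ami_profile(seq: str, max_lag: int) -> List[float]:
    """AMI for lags 1..max_lag; such profiles could serve as features for a classifier."""
    return [average_mutual_information(seq, k) for k in range(1, max_lag + 1)]
```

A profile over several lags, rather than a single value, is a natural input for separating coding from noncoding windows, since codon structure tends to leave a periodic signature in coding DNA.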
9
Yu CY, Mitrofanova A. Mechanism-Centric Approaches for Biomarker Detection and Precision Therapeutics in Cancer. Front Genet 2021; 12:687813. [PMID: 34408770 PMCID: PMC8365516 DOI: 10.3389/fgene.2021.687813] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 03/30/2021] [Accepted: 06/28/2021] [Indexed: 12/18/2022]
Abstract
Biomarker discovery is at the heart of personalized treatment planning and cancer precision therapeutics, encompassing disease classification and prognosis, prediction of treatment response, and therapeutic targeting. However, many biomarkers represent passenger rather than driver alterations, limiting their utilization as functional units for therapeutic targeting. We suggest that identification of driver biomarkers through mechanism-centric approaches, which take into account upstream and downstream regulatory mechanisms, is fundamental to the discovery of functionally meaningful markers. Here, we examine computational approaches that identify mechanism-centric biomarkers elucidated from gene co-expression networks, regulatory networks (e.g., transcriptional regulation), protein-protein interaction (PPI) networks, and molecular pathways. We discuss their objectives, advantages over gene-centric approaches, and known limitations. Future directions highlight the importance of input and model interpretability, method and data integration, and the role of recently introduced technological advantages, such as single-cell sequencing, which are central for effective biomarker discovery and time-cautious precision therapeutics.
Affiliation(s)
- Christina Y. Yu
- Department of Biomedical and Health Informatics, School of Health Professions, Rutgers, The State University of New Jersey, Newark, NJ, United States
- Antonina Mitrofanova
- Department of Biomedical and Health Informatics, School of Health Professions, Rutgers, The State University of New Jersey, Newark, NJ, United States
- Rutgers Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, NJ, United States
10
Aghamiri SS, Delaplace F. TaBooN Boolean Network Synthesis Based on Tabu Search. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; PP:2499-2511. [PMID: 33661736 DOI: 10.1109/tcbb.2021.3063817] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/12/2023]
Abstract
Recent developments in omics technologies have revolutionized the investigation of biology by producing molecular data in multiple dimensions and at multiple scales. This breakthrough raises the crucial issue of interpreting these data through modelling. In this undertaking, networks provide a suitable framework for modelling the interactions between molecules. Basically, a biological network is composed of nodes referring to components such as genes or proteins, and edges/arcs formalizing the interactions between them. The evolution of the interactions is then modelled by defining a dynamical system. Among the different categories of network, the Boolean network offers a reliable qualitative modelling framework. Automatically synthesizing a Boolean network from experimental data therefore remains a necessary but challenging issue. In this study, we present TaBooN, an original workflow for synthesizing Boolean networks from biological data. The methodology uses data in the form of Boolean profiles to infer all potential local formulas. These combine to form the model space from which the most truthful model with regard to biological knowledge and experiments must be found. In the TaBooN workflow, the fittest model is selected by a Tabu-search algorithm. TaBooN is an automated method for Boolean Network inference from.
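The selection step described above can be sketched generically: treat a model as one choice of local formula per gene and let tabu search climb a fitness function while forbidding recently undone moves. In the Python sketch below, the encoding (formula indices), the tabu tenure of 5, the aspiration rule, and the toy fitness are all assumptions for illustration; TaBooN's actual fitness, which weighs biological knowledge and experiments, is not reproduced.

```python
from collections import deque
from typing import Callable, Deque, Sequence, Tuple

Model = Tuple[int, ...]  # one candidate-formula index per gene (hypothetical encoding)


def tabu_search(n_choices: Sequence[int], fitness: Callable[[Model], float],
                iterations: int = 100, tabu_size: int = 5) -> Tuple[Model, float]:
    """Return the best model found by single-gene moves with a short tabu list."""
    current: Model = tuple(0 for _ in n_choices)
    best, best_fitness = current, fitness(current)
    tabu: Deque[Tuple[int, int]] = deque(maxlen=tabu_size)
    for _ in range(iterations):
        candidates = []
        for gene, k in enumerate(n_choices):
            for choice in range(k):
                if choice == current[gene]:
                    continue
                neighbor = current[:gene] + (choice,) + current[gene + 1:]
                score = fitness(neighbor)
                # Aspiration: a tabu move is allowed only if it beats the best so far.
                if (gene, choice) in tabu and score <= best_fitness:
                    continue
                candidates.append((score, neighbor, gene))
        if not candidates:
            break
        score, neighbor, gene = max(candidates, key=lambda c: c[0])
        tabu.append((gene, current[gene]))  # forbid reverting this gene for a while
        current = neighbor
        if score > best_fitness:
            best, best_fitness = neighbor, score
    return best, best_fitness


# Hypothetical fitness: number of genes whose formula matches a toy reference model.
REFERENCE: Model = (2, 1, 0)


def toy_fitness(model: Model) -> float:
    return float(sum(1 for chosen, ref in zip(model, REFERENCE) if chosen == ref))


best_model, best_score = tabu_search([3, 3, 3], toy_fitness, iterations=20)
```

The tabu list lets the search accept the best non-improving neighbor without immediately cycling back, which is what distinguishes it from plain hill climbing.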
11
Cha J, Lee I. Single-cell network biology for resolving cellular heterogeneity in human diseases. Exp Mol Med 2020; 52:1798-1808. [PMID: 33244151 PMCID: PMC8080824 DOI: 10.1038/s12276-020-00528-0] [Citation(s) in RCA: 61] [Impact Index Per Article: 12.2] [Received: 07/30/2020] [Revised: 08/26/2020] [Accepted: 08/31/2020] [Indexed: 01/10/2023]
Abstract
Understanding cellular heterogeneity is the holy grail of biology and medicine. Cells harboring identical genomes show a wide variety of behaviors in multicellular organisms. Genetic circuits underlying cell-type identities will facilitate the understanding of the regulatory programs for differentiation and maintenance of distinct cellular states. Such a cell-type-specific gene network can be inferred from coregulatory patterns across individual cells. Conventional methods of transcriptome profiling using tissue samples provide only average signals of diverse cell types. Therefore, reconstructing gene regulatory networks for a particular cell type is not feasible with tissue-based transcriptome data. Recently, single-cell omics technology has emerged and enabled the capture of the transcriptomic landscape of every individual cell. Although single-cell gene expression studies have already opened up new avenues, network biology using single-cell transcriptome data will further accelerate our understanding of cellular heterogeneity. In this review, we provide an overview of single-cell network biology and summarize recent progress in method development for network inference from single-cell RNA sequencing (scRNA-seq) data. Then, we describe how cell-type-specific gene networks can be utilized to study regulatory programs specific to disease-associated cell types and cellular states. Moreover, with scRNA-seq data, modeling personal or patient-specific gene networks is feasible. Therefore, we also introduce potential applications of single-cell network biology for precision medicine. We envision a rapid paradigm shift toward single-cell network analysis for systems biology in the near future. Gene regulatory networks reconstructed from single-cell RNA sequencing datasets are allowing researchers to better understand the molecular circuits and cell states that contribute to complex human disease.
Junha Cha and Insuk Lee from Yonsei University in Seoul, South Korea, review the concept of ‘single-cell network biology’, which involves applying computational algorithms to gene expression data from thousands of cells to infer functional interactions in various biological contexts. This systems biology approach to analyzing the profiles of messenger RNA in single cells is helping researchers discover new signaling pathways that could serve as disease biomarkers or therapeutic targets. In the future, patient-specific models of personal gene networks could explain why certain genetic variants affect disease risk. This research could also eventually lead to new types of individualized medical treatments.
12
Khan A, Saha G, Pal RK. Modified Half-System Based Method for Reverse Engineering of Gene Regulatory Networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:1303-1316. [PMID: 30640623 DOI: 10.1109/tcbb.2019.2892450] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
The accurate reconstruction of gene regulatory networks for a proper understanding of the intricacies of complex biological mechanisms still motivates researchers. Owing to the availability of various gene expression data, we can now attempt to infer genetic interactions computationally. Among the established network inference techniques, the S-system is preferred because of its efficiency in replicating biological systems, though it is computationally more expensive. This motivated us to develop a similar system with a lower computational load. In this work, we propose a novel methodology for reverse engineering gene regulatory networks based on a new technique: the half-system. Half-systems use half as many parameters as S-systems and thus significantly reduce the computational complexity. We have applied the proposed technique to reconstruct four benchmark networks from their corresponding temporal expression profiles: an 8-gene, a 10-gene, and two 20-gene networks. Because the technique is new, to the best of our knowledge there are no directly comparable half-system results in the contemporary literature. Therefore, we have compared our results with those obtained in the contemporary literature using other methodologies, including the state-of-the-art method GENIE3. The results obtained in this work compare favourably with the competition, even showing quantifiable improvements in some cases.
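For reference, the standard S-system that the half-system is compared against models the expression X_i of each gene i in an n-gene network as below. This is the textbook S-system form; the half-system equations themselves are not given in the abstract and are not reconstructed here.

```latex
\frac{dX_i}{dt} = \alpha_i \prod_{j=1}^{n} X_j^{g_{ij}} - \beta_i \prod_{j=1}^{n} X_j^{h_{ij}}, \qquad i = 1, \dots, n
```

Each gene carries 2n + 2 parameters (the rate constants alpha_i and beta_i plus n kinetic orders g_ij and n kinetic orders h_ij), so the whole system has 2n(n + 1): 144 for the 8-gene benchmark, 220 for the 10-gene network, and 840 for each 20-gene network. Halving the parameter count, as the abstract describes, would bring these to 72, 110, and 420.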
13
Anguita-Ruiz A, Segura-Delgado A, Alcalá R, Aguilera CM, Alcalá-Fdez J. eXplainable Artificial Intelligence (XAI) for the identification of biologically relevant gene expression patterns in longitudinal human studies, insights from obesity research. PLoS Comput Biol 2020; 16:e1007792. [PMID: 32275707 PMCID: PMC7176286 DOI: 10.1371/journal.pcbi.1007792] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 08/26/2019] [Revised: 04/22/2020] [Accepted: 03/17/2020] [Indexed: 12/18/2022]
Abstract
To date, several machine learning approaches have been proposed for the dynamic modeling of temporal omics data. Although they have yielded impressive results in terms of model accuracy and predictive ability, most of these applications are based on "black-box" algorithms, and the research community has called for more interpretable models. The recent eXplainable Artificial Intelligence (XAI) revolution offers a solution to this issue, where rule-based approaches are highly suitable for explanatory purposes. Further integrating the data mining process with functional-annotation and pathway analyses is an additional route towards more explanatory and biologically sound models. In this paper, we present a novel rule-based XAI strategy (including pre-processing, knowledge-extraction and functional validation) for finding biologically relevant sequential patterns in longitudinal human gene expression data (GED). To illustrate the performance of our pipeline, we work on in vivo temporal GED collected during a long-term dietary intervention in 57 subjects with obesity (GSE77962). As validation populations, we employ three independent datasets following the same experimental design. As a result, we validate the primarily extracted gene patterns and demonstrate the effectiveness of our strategy for mining biologically relevant gene-gene temporal relations. Our whole pipeline has been made available as open-source software and could easily be extended to other human temporal GED applications.
Affiliation(s)
- Augusto Anguita-Ruiz
- Department of Biochemistry and Molecular Biology II, Institute of Nutrition and Food Technology "José Mataix", Center of Biomedical Research, University of Granada, Granada, Spain
- Instituto de Investigación Biosanitaria ibs.GRANADA, Granada, Spain
- CIBEROBN (Physiopathology of Obesity and Nutrition), Instituto de Salud Carlos III (ISCIII), Madrid, Spain
- Alberto Segura-Delgado
- Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
- Rafael Alcalá
- Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
- Concepción M. Aguilera
- Department of Biochemistry and Molecular Biology II, Institute of Nutrition and Food Technology "José Mataix", Center of Biomedical Research, University of Granada, Granada, Spain
- Instituto de Investigación Biosanitaria ibs.GRANADA, Granada, Spain
- CIBEROBN (Physiopathology of Obesity and Nutrition), Instituto de Salud Carlos III (ISCIII), Madrid, Spain
- Jesús Alcalá-Fdez
- Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
14
Yang Y, Fang Q, Shen HB. Predicting gene regulatory interactions based on spatial gene expression data and deep learning. PLoS Comput Biol 2019; 15:e1007324. [PMID: 31527870 PMCID: PMC6764701 DOI: 10.1371/journal.pcbi.1007324] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.8] [Received: 11/23/2018] [Revised: 09/27/2019] [Accepted: 08/08/2019] [Indexed: 11/23/2022]
Abstract
Reverse engineering of gene regulatory networks (GRNs) is a central task in systems biology. Most existing methods for GRN inference rely on gene co-expression analysis or TF-target binding information; however, co-expression determined from gene expression levels alone is often unreliable, and TF-target binding data from high-throughput experiments may be noisy, leading to a high ratio of false and missed links, especially for large-scale networks. In recent years, microscopy images recording spatial gene expression have become a new resource for GRN reconstruction, as spatial and temporal expression patterns contain abundant gene interaction information. To date, spatial expression resources have been largely underexploited, and only a few traditional image processing methods have been employed in image-based GRN reconstruction. Moreover, co-expression analysis using conventional image-similarity measures may be inaccurate, because it is local-pattern consistency rather than global image similarity that determines gene-gene interactions. Here we present GripDL (Gene regulatory interaction prediction via Deep Learning), which incorporates high-confidence TF-gene regulation knowledge from previous studies and constructs GRNs for Drosophila eye development based on Drosophila embryonic gene expression images. Benefiting from the powerful representation ability of deep neural networks and the supervision provided by known interactions, the new method outperforms traditional methods by a large margin and reveals intriguing new knowledge about Drosophila eye development.
Affiliation(s)
- Yang Yang
- Center for Brain-Like Computing and Machine Intelligence, Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
- Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai, China
- Qingwei Fang
- School of Bio-medical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai Jiao Tong University, Shanghai, China
15
Yu H, Blair RH. Integration of probabilistic regulatory networks into constraint-based models of metabolism with applications to Alzheimer's disease. BMC Bioinformatics 2019; 20:386. [PMID: 31291905 PMCID: PMC6617954 DOI: 10.1186/s12859-019-2872-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 04/17/2019] [Accepted: 05/02/2019] [Indexed: 01/08/2023]
Abstract
Background: Mathematical models of biological networks can provide important predictions and insights into complex disease. Constraint-based models of cellular metabolism and probabilistic models of gene regulatory networks are two distinct areas that have progressed rapidly in parallel over the past decade. In principle, gene regulatory networks and metabolic networks underlie the same complex phenotypes and diseases. However, systematic integration of these two model systems remains a fundamental challenge. Results: In this work, we address this challenge by fusing probabilistic models of gene regulatory networks into constraint-based models of metabolism. In the novel approach, probabilistic reasoning in Bayesian network (BN) models of regulatory networks serves as the “glue” that enables a natural interface between the two systems. Probabilistic reasoning is used to predict and quantify system-wide effects of perturbations to the regulatory network in the form of constraints for flux variability analysis. In this setting, both regulatory and metabolic networks inherently account for uncertainty. Applications leverage constraint-based models of brain metabolism and gene regulatory networks parameterized by gene expression data from the hippocampus to investigate the role of the HIF-1 pathway in Alzheimer’s disease. Integrated models support HIF-1A as an effective target to reduce the effects of hypoxia in Alzheimer’s disease. However, HIF-1A activation is far less effective in shifting metabolism when compared to brain metabolism in healthy controls. Conclusions: The direct integration of probabilistic regulatory networks into constraint-based models of metabolism provides novel insights into how perturbations in the regulatory network may influence metabolic states. Predictive modeling of enzymatic activity can be facilitated using probabilistic reasoning, thereby extending the predictive capacity of the network.
This framework for model integration is generalizable to other systems.
Affiliation(s)
- Han Yu
- State University of New York at Buffalo, 3435 Main Street, Buffalo, 14214, US
16
Pirgazi J, Khanteymoori AR, Jalilkhani M. TIGRNCRN: Trustful inference of gene regulatory network using clustering and refining the network. J Bioinform Comput Biol 2019; 17:1950018. [DOI: 10.1142/s0219720019500185] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 11/18/2022]
Abstract
In this study, to deal with noise and uncertainty in gene expression data, learning networks, especially Bayesian networks, which can exploit prior knowledge, were used to infer gene regulatory networks. Learning networks are methods that combine a network structure with a learning process for obtaining relationships. Correlation metrics are one way of measuring the relationship between genes, but a high correlation between two genes does not necessarily mean that they have a causal effect on each other. Common methods for inferring gene regulatory networks have yet to pay attention to biological importance, and as a result their predictions are less accurate in terms of biological significance. Hence, in the proposed method, highly correlated genes were grouped into the same cluster by clustering, and edges between genes within a cluster were prohibited. Finally, after Bayesian network modeling, a refining phase based on the knowledge gained from clustering was applied to improve the regulatory interactions using biological correlation. To show its efficiency, the proposed method was compared with several common methods in this area, including GENIE3 and BMALR. The evaluation results indicate that the proposed method recognizes regulatory relations well during Bayesian modeling, owing to its use of the biological knowledge hidden in the data, and that it recognizes gene regulatory networks on par with leading methods in this field.
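The clustering constraint described above (group highly correlated genes, then forbid edges inside a group before structure learning) can be illustrated with a short sketch. It is not the TIGRNCRN implementation: it assumes Pearson correlation, a hypothetical threshold of 0.9, and clusters defined as connected components of the thresholded correlation graph, and it omits the Bayesian network learning and refinement phases entirely.

```python
from __future__ import annotations

import numpy as np


def correlation_clusters(expr: np.ndarray, threshold: float = 0.9) -> list[int]:
    """Assign each gene (row of expr, genes x samples) to a cluster.

    Two genes are linked when |Pearson r| >= threshold (so strong negative
    correlation also links them); clusters are the connected components of
    that graph, found with a simple union-find.
    """
    n_genes = expr.shape[0]
    corr = np.corrcoef(expr)  # genes x genes correlation matrix
    parent = list(range(n_genes))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n_genes):
        for j in range(i + 1, n_genes):
            if abs(corr[i, j]) >= threshold:
                parent[find(i)] = find(j)

    # Relabel the union-find roots as consecutive cluster ids 0, 1, 2, ...
    roots: dict[int, int] = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n_genes)]


def allowed_edge_mask(labels: list[int]) -> np.ndarray:
    """Boolean genes x genes matrix: True where edge i -> j may be considered."""
    lab = np.asarray(labels)
    same_cluster = lab[:, None] == lab[None, :]
    return ~same_cluster  # forbids intra-cluster edges and self-loops
```

A structure-learning step would then only score candidate edges where the mask is True; how the original method uses the clusters afterwards in its refining phase is not reproduced here.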
Affiliation(s)
- Jamshid Pirgazi
- Department of Computer Engineering, Engineering Faculty, University of Zanjan, Zanjan, Iran
- Ali Reza Khanteymoori
- Department of Computer Engineering, Engineering Faculty, University of Zanjan, Zanjan, Iran
- School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
- Maryam Jalilkhani
- Department of Computer Engineering, Engineering Faculty, University of Zanjan, Zanjan, Iran
17
Pacini C, Koziol MJ. Bioinformatics challenges and perspectives when studying the effect of epigenetic modifications on alternative splicing. Philos Trans R Soc Lond B Biol Sci 2019; 373:rstb.2017.0073. [PMID: 29685977 PMCID: PMC5915717 DOI: 10.1098/rstb.2017.0073] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Accepted: 09/14/2017] [Indexed: 02/07/2023]
Abstract
It is widely known that epigenetic modifications are important in regulating transcription, but several have also been reported to affect alternative splicing. The regulation of pre-mRNA splicing is important for explaining proteomic diversity, and misregulation of splicing has been implicated in many diseases. Here, we give a brief overview of the role of epigenetics in alternative splicing and disease. We then discuss the bioinformatics methods that can be used to model interactions between epigenetic marks and regulators of splicing. These models can be used to identify alternative splicing and epigenetic changes across different phenotypes. This article is part of a discussion meeting issue ‘Frontiers in epigenetic chemical biology’.
Affiliation(s)
- Clare Pacini
- Wellcome Trust Cancer Research UK Gurdon Institute, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QN, UK
- Department of Zoology, University of Cambridge, Downing Street, Cambridge, CB2 3EJ, UK
- Magdalena J Koziol
- Wellcome Trust Cancer Research UK Gurdon Institute, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QN, UK
- Department of Zoology, University of Cambridge, Downing Street, Cambridge, CB2 3EJ, UK
18
Abbaszadeh O, Khanteymoori AR, Azarpeyvand A. Parallel Algorithms for Inferring Gene Regulatory Networks: A Review. Curr Genomics 2018; 19:603-614. [PMID: 30386172 PMCID: PMC6194435 DOI: 10.2174/1389202919666180601081718] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 12/18/2017] [Revised: 02/20/2018] [Accepted: 05/22/2018] [Indexed: 11/22/2022]
Abstract
Systems biology problems such as whole-genome network construction from large-scale gene expression data are sophisticated and time-consuming. Sequential algorithms therefore cannot obtain a solution in an acceptable amount of time. Today, massively parallel computing makes it possible to infer large-scale gene regulatory networks. Recently, establishing gene regulatory networks from large-scale datasets has drawn noticeable attention from researchers in the fields of parallel computing and systems biology. In this paper, we provide a detailed overview of recent parallel algorithms for constructing gene regulatory networks. First, the fundamentals of gene regulatory network inference and the challenges posed by large-scale datasets are given. Second, four parallel frameworks and libraries, CUDA, OpenMP, MPI, and Hadoop, are described in detail. Third, parallel algorithms are reviewed. Finally, some conclusions and guidelines for parallel reverse engineering are presented.
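The four frameworks differ in mechanics, but many GRN inference methods parallelize the same way: the all-pairs gene similarity computation is split into independent blocks that workers process without communicating. A minimal sketch of that data decomposition, assuming Pearson correlation as the similarity measure and using NumPy with a standard-library thread pool rather than CUDA, OpenMP, MPI, or Hadoop; the function name and block size are illustrative only, and real speedup from threads depends on NumPy's matrix multiply releasing the GIL and on the BLAS build.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def parallel_corr(expr: np.ndarray, n_workers: int = 4, block: int = 64) -> np.ndarray:
    """Gene-gene Pearson correlation matrix, computed one block of rows at a time.

    expr: genes x samples (genes with zero variance would produce NaN rows).
    Each task fills corr[i0:i1, :] independently, mirroring how an MPI rank
    or a CUDA thread block would own one tile of the output.
    """
    z = expr - expr.mean(axis=1, keepdims=True)      # center each gene
    z /= np.linalg.norm(z, axis=1, keepdims=True)     # unit-norm rows
    n_genes = z.shape[0]
    out = np.empty((n_genes, n_genes))

    def work(i0: int) -> None:
        i1 = min(i0 + block, n_genes)
        out[i0:i1] = z[i0:i1] @ z.T  # dot products of unit centered rows = Pearson r

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # list() forces completion and re-raises any worker exception
        list(pool.map(work, range(0, n_genes, block)))
    return out
```

Because each block writes a disjoint slice of the output, no locking is needed; the same property is what makes this step straightforward to distribute across processes or GPU threads.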
Affiliation(s)
- Omid Abbaszadeh
- Department of Electrical and Computer Engineering, University of Zanjan, Zanjan, Iran
- Ali Reza Khanteymoori
- Department of Electrical and Computer Engineering, University of Zanjan, Zanjan, Iran
- Ali Azarpeyvand
- Department of Electrical and Computer Engineering, University of Zanjan, Zanjan, Iran
19
Huws SA, Creevey CJ, Oyama LB, Mizrahi I, Denman SE, Popova M, Muñoz-Tamayo R, Forano E, Waters SM, Hess M, Tapio I, Smidt H, Krizsan SJ, Yáñez-Ruiz DR, Belanche A, Guan L, Gruninger RJ, McAllister TA, Newbold CJ, Roehe R, Dewhurst RJ, Snelling TJ, Watson M, Suen G, Hart EH, Kingston-Smith AH, Scollan ND, do Prado RM, Pilau EJ, Mantovani HC, Attwood GT, Edwards JE, McEwan NR, Morrisson S, Mayorga OL, Elliott C, Morgavi DP. Addressing Global Ruminant Agricultural Challenges Through Understanding the Rumen Microbiome: Past, Present, and Future. Front Microbiol 2018; 9:2161. [PMID: 30319557 PMCID: PMC6167468 DOI: 10.3389/fmicb.2018.02161] [Citation(s) in RCA: 198] [Impact Index Per Article: 28.3] [Received: 05/31/2018] [Accepted: 08/23/2018] [Indexed: 12/24/2022]
Abstract
The rumen is a complex ecosystem composed of anaerobic bacteria, protozoa, fungi, methanogenic archaea and phages. These microbes interact closely to break down plant material that cannot be digested by humans, whilst providing metabolic energy to the host and, in the case of archaea, producing methane. Consequently, ruminants produce meat and milk, which are rich in high-quality protein, vitamins and minerals, and therefore contribute to food security. As the world population is predicted to reach approximately 9.7 billion by 2050, an increase in ruminant production to satisfy global protein demand is necessary, despite limited land availability, and whilst ensuring environmental impact is minimized. Although challenging, these goals can be met, but this depends on our understanding of the rumen microbiome. Attempts to manipulate the rumen microbiome to address global agricultural challenges have been ongoing for decades with limited success, mostly due to the lack of a detailed understanding of this microbiome and our limited ability to culture most of these microbes outside the rumen. Animal breeding and the introduction of dietary interventions during early life have recently emerged as promising new technologies for manipulating the rumen microbiome to meet global livestock challenges. Our inability to phenotype ruminants in a high-throughput manner has also hampered progress, although the recent increase in “omic” data may allow further development of mathematical models and rumen microbial gene biomarkers as proxies. Advances in computational tools, high-throughput sequencing technologies and cultivation-independent “omics” approaches continue to revolutionize our understanding of the rumen microbiome. This will ultimately provide the knowledge framework needed to solve current and future ruminant livestock challenges.
Affiliation(s)
- Sharon A Huws
- Institute for Global Food Security, Queen's University of Belfast, Belfast, United Kingdom
- Christopher J Creevey
- Institute for Global Food Security, Queen's University of Belfast, Belfast, United Kingdom
- Linda B Oyama
- Institute for Global Food Security, Queen's University of Belfast, Belfast, United Kingdom
- Itzhak Mizrahi
- Department of Life Sciences and the National Institute for Biotechnology in the Negev, Ben Gurion University of the Negev, Beer Sheva, Israel
- Stuart E Denman
- Commonwealth Scientific and Industrial Research Organisation Agriculture and Food, Queensland Bioscience Precinct, St Lucia, QLD, Australia
- Milka Popova
- Institut National de la Recherche Agronomique, UMR1213 Herbivores, Clermont Université, VetAgro Sup, UMR Herbivores, Clermont-Ferrand, France
- Rafael Muñoz-Tamayo
- UMR Modélisation Systémique Appliquée aux Ruminants, INRA, AgroParisTech, Université Paris-Saclay, Paris, France
- Evelyne Forano
- UMR 454 MEDIS, INRA, Université Clermont Auvergne, Clermont-Ferrand, France
- Sinead M Waters
- Animal and Bioscience Research Department, Animal and Grassland Research and Innovation Centre, Grange, Ireland
- Matthias Hess
- College of Agricultural and Environmental Sciences, University of California, Davis, Davis, CA, United States
- Ilma Tapio
- Natural Resources Institute Finland, Jokioinen, Finland
- Hauke Smidt
- Department of Agrotechnology and Food Sciences, Wageningen, Netherlands
- Sophie J Krizsan
- Department of Agricultural Research for Northern Sweden, Swedish University of Agricultural Sciences, Umeå, Sweden
- David R Yáñez-Ruiz
- Estacion Experimental del Zaidin, Consejo Superior de Investigaciones Cientificas, Granada, Spain
- Alejandro Belanche
- Estacion Experimental del Zaidin, Consejo Superior de Investigaciones Cientificas, Granada, Spain
- Leluo Guan
- Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB, Canada
- Robert J Gruninger
- Lethbridge Research Centre, Agriculture and Agri-Food Canada, Lethbridge, AB, Canada
- Tim A McAllister
- Lethbridge Research Centre, Agriculture and Agri-Food Canada, Lethbridge, AB, Canada
- Rainer Roehe
- Scotland's Rural College, Edinburgh, United Kingdom
- Tim J Snelling
- The Rowett Institute, University of Aberdeen, Aberdeen, United Kingdom
- Mick Watson
- The Roslin Institute and the Royal (Dick) School of Veterinary Studies (R(D)SVS), University of Edinburgh, Edinburgh, United Kingdom
- Garret Suen
- Department of Bacteriology, University of Wisconsin-Madison, Madison, WI, United States
- Elizabeth H Hart
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, United Kingdom
- Alison H Kingston-Smith
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, United Kingdom
- Nigel D Scollan
- Institute for Global Food Security, Queen's University of Belfast, Belfast, United Kingdom
- Rodolpho M do Prado
- Laboratório de Biomoléculas e Espectrometria de Massas-Labiomass, Departamento de Química, Universidade Estadual de Maringá, Maringá, Brazil
- Eduardo J Pilau
- Laboratório de Biomoléculas e Espectrometria de Massas-Labiomass, Departamento de Química, Universidade Estadual de Maringá, Maringá, Brazil
- Graeme T Attwood
- AgResearch Limited, Grasslands Research Centre, Palmerston North, New Zealand
- Joan E Edwards
- Laboratory of Microbiology, Wageningen University & Research, Wageningen, Netherlands
- Neil R McEwan
- School of Pharmacy and Life Sciences, Robert Gordon University, Aberdeen, United Kingdom
- Steven Morrisson
- Sustainable Livestock, Agri-Food and Bio-Sciences Institute, Hillsborough, United Kingdom
- Olga L Mayorga
- Colombian Agricultural Research Corporation, Mosquera, Colombia
- Christopher Elliott
- Institute for Global Food Security, Queen's University of Belfast, Belfast, United Kingdom
- Diego P Morgavi
- Institut National de la Recherche Agronomique, UMR1213 Herbivores, Clermont Université, VetAgro Sup, UMR Herbivores, Clermont-Ferrand, France
20
Zamanighomi M, Zamanian M, Kimber M, Wang Z. Gene Regulatory Network Inference from Perturbed Time-Series Expression Data via Ordered Dynamical Expansion of Non-Steady State Actors. IEEE/ACM Trans Comput Biol Bioinform 2018; 15:1093-1106. [PMID: 26701893 DOI: 10.1109/tcbb.2015.2509992] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
The reconstruction of gene regulatory networks from gene expression data has been the subject of intense research activity. A variety of models and methods have been developed to address different aspects of this important problem. However, these techniques are narrowly focused on particular biological and experimental platforms, and require experimental data that are typically unavailable and difficult to ascertain. The more recent availability of higher-throughput sequencing platforms, combined with more precise modes of genetic perturbation, presents an opportunity to formulate more robust and comprehensive approaches to gene network inference. Here, we propose a step-wise framework for identifying gene-gene regulatory interactions that expand from a known point of genetic or chemical perturbation using time series gene expression data. This novel approach sequentially identifies non-steady state genes post-perturbation and incorporates them into a growing series of low-complexity optimization problems. The governing ordinary differential equations of this model are rooted in the biophysics of stochastic molecular events that underlie gene regulation, delineating roles for both protein and RNA-mediated gene regulation. We show the successful application of our core algorithms for network inference using simulated and real datasets.
21
García-Calvo R, Guisado JL, Diaz-del-Rio F, Córdoba A, Jiménez-Morales F. Graphics Processing Unit-Enhanced Genetic Algorithms for Solving the Temporal Dynamics of Gene Regulatory Networks. Evol Bioinform Online 2018; 14:1176934318767889. [PMID: 29662297 PMCID: PMC5898668 DOI: 10.1177/1176934318767889] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 11/23/2017] [Accepted: 02/28/2018] [Indexed: 12/12/2022]
Abstract
Understanding the regulation of gene expression is one of the key problems in current biology. A promising method for that purpose is the determination of the temporal dynamics between known initial and final network states by using simple acting rules. The huge number of rule combinations and the inherently nonlinear nature of the problem make genetic algorithms an excellent candidate for finding optimal solutions. As this is a computationally intensive problem that needs long runtimes on conventional architectures for realistic network sizes, it is essential to accelerate this task. In this article, we study how to develop efficient parallel implementations of this method for the fine-grained parallel architecture of graphics processing units (GPUs) using the compute unified device architecture (CUDA) platform. An exhaustive and methodical study of various parallel genetic algorithm schemes (master-slave, island, cellular, and hybrid models) and various individual selection methods (roulette, elitist) is carried out for this problem. Several procedures that optimize the use of the GPU's resources are presented. We conclude that the implementation that produces the best results (from both the performance and the genetic algorithm fitness perspectives) simulates a few thousand individuals grouped into a few islands using elitist selection. This model combines two powerful factors for discovering the best solutions: finding good individuals in a small number of generations, and introducing genetic diversity via relatively frequent and numerous migrations. As a result, we even found the optimal solution for the analyzed gene regulatory network (GRN). In addition, a comparative study of the performance obtained by the different parallel implementations on GPU versus a sequential application on CPU is carried out.
In our tests, a multifold speedup was obtained for our optimized parallel implementation of the method on a mid-range GPU over an equivalent sequential single-core implementation running on a recent Intel i7 CPU. This work can provide useful guidance to researchers in biology, medicine, or bioinformatics on how to take advantage of massively parallel devices such as GPUs to apply novel nature-inspired metaheuristic algorithms to real-world applications (such as the method for solving the temporal dynamics of GRNs).
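The island model with elitist selection and periodic migration, which the abstract reports as the best-performing scheme, can be illustrated with a small CPU-only sketch. Assumptions: the fitness is a toy OneMax score (count of 1 bits) standing in for the paper's GRN-dynamics objective, and every parameter value (island count, migration interval, mutation rate) is hypothetical. No GPU code is shown; the per-island evolve step is independent across islands and is the natural unit to map onto parallel hardware.

```python
import random


def fitness(bits: list) -> int:
    # Toy objective (OneMax): number of 1 bits. Stand-in for a GRN-dynamics score.
    return sum(bits)


def evolve_island(pop: list, rng: random.Random, elite: int = 2, mut_rate: float = 0.02) -> list:
    """One generation with elitist selection: copy the best `elite` individuals
    unchanged, fill the rest with one-point crossovers of parents drawn from
    the top half, then apply bit-flip mutation to each child."""
    pop = sorted(pop, key=fitness, reverse=True)
    parents = pop[: max(2, len(pop) // 2)]
    children = [ind[:] for ind in pop[:elite]]
    while len(children) < len(pop):
        a, b = rng.sample(parents, 2)
        cut = rng.randrange(1, len(a))
        child = a[:cut] + b[cut:]
        child = [1 - g if rng.random() < mut_rate else g for g in child]
        children.append(child)
    return children


def island_ga(n_islands: int = 4, island_size: int = 30, length: int = 40,
              generations: int = 60, migrate_every: int = 5,
              n_migrants: int = 2, seed: int = 0) -> list:
    rng = random.Random(seed)
    islands = [[[rng.randint(0, 1) for _ in range(length)]
                for _ in range(island_size)] for _ in range(n_islands)]
    for gen in range(1, generations + 1):
        islands = [evolve_island(pop, rng) for pop in islands]
        if gen % migrate_every == 0:
            # Ring migration: the best individuals of island i-1 replace
            # the worst individuals of island i.
            best = [sorted(p, key=fitness, reverse=True)[:n_migrants] for p in islands]
            for i, pop in enumerate(islands):
                pop.sort(key=fitness)  # worst first
                pop[:n_migrants] = [ind[:] for ind in best[(i - 1) % n_islands]]
    return max((ind for pop in islands for ind in pop), key=fitness)
```

Elitism guarantees that each island's best fitness never decreases, while migration spreads good individuals between otherwise isolated populations, which is the trade-off between fast convergence and genetic diversity that the abstract describes.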
Affiliation(s)
- Raúl García-Calvo
- Department of Computer Architecture and Technology, University of Seville, Seville, Spain
- JL Guisado
- Department of Computer Architecture and Technology, University of Seville, Seville, Spain
- Fernando Diaz-del-Rio
- Department of Computer Architecture and Technology, University of Seville, Seville, Spain
- Antonio Córdoba
- Department of Condensed Matter Physics, University of Seville, Seville, Spain
22
Matsumoto H, Kiryu H, Furusawa C, Ko MSH, Ko SBH, Gouda N, Hayashi T, Nikaido I. SCODE: an efficient regulatory network inference algorithm from single-cell RNA-Seq during differentiation. Bioinformatics 2018; 33:2314-2321. [PMID: 28379368 PMCID: PMC5860123 DOI: 10.1093/bioinformatics/btx194] [Citation(s) in RCA: 241] [Impact Index Per Article: 34.4] [Received: 11/18/2016] [Accepted: 04/02/2017] [Indexed: 01/17/2023]
Abstract
Motivation: The analysis of RNA-Seq data from individual differentiating cells enables us to reconstruct the differentiation process and the degree of differentiation (in pseudo-time) of each cell. Such analyses can reveal detailed expression dynamics and functional relationships for differentiation. To further elucidate differentiation processes, more insight into gene regulatory networks is required. The pseudo-time can be regarded as time information and, therefore, single-cell RNA-Seq data are time-course data with high time resolution. Although time-course data are useful for inferring networks, conventional inference algorithms for such data suffer from high time complexity when the number of samples and genes is large. Therefore, a novel algorithm is necessary to infer networks from single-cell RNA-Seq during differentiation. Results: In this study, we developed SCODE, a novel and efficient algorithm for inferring regulatory networks based on ordinary differential equations. We applied SCODE to three single-cell RNA-Seq datasets and confirmed that SCODE can reconstruct observed expression dynamics. We evaluated SCODE by comparing its inferred networks with a DNase I footprint-based network. SCODE performed best on two of the datasets and nearly best on the remaining dataset. We also compared runtimes and showed that the runtimes for SCODE are significantly shorter than those of the alternatives. Thus, our algorithm provides a promising approach for further single-cell differentiation analyses. Availability and Implementation: The R source code of SCODE is available on GitHub (https://github.com/hmatsu1226/SCODE). Supplementary information: Supplementary data are available at Bioinformatics online.
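To show what "inferring a regulatory network with ordinary differential equations from pseudo-time-ordered cells" means in the simplest case, the sketch below fits a linear ODE dx/dt = A x, where a nonzero A[i, j] is read as gene j influencing the rate of change of gene i. This is a generic least-squares illustration, not SCODE: SCODE itself uses a dimension-reduced formulation that is not reproduced here. Assumptions: derivatives are approximated by finite differences between consecutive cells, the state is taken at the interval midpoint, and a small hypothetical ridge penalty keeps the solve well-posed.

```python
import numpy as np


def infer_linear_ode(expr: np.ndarray, pseudotime: np.ndarray,
                     ridge: float = 1e-3) -> np.ndarray:
    """Estimate A in dx/dt = A x from cells ordered by pseudotime.

    expr: genes x cells matrix; pseudotime: one value per cell.
    Minimizes ||dxdt - A x_mid||^2 + ridge * ||A||^2, whose closed form is
    A = dxdt x_mid^T (x_mid x_mid^T + ridge I)^-1.
    """
    order = np.argsort(pseudotime)
    x = expr[:, order]
    t = pseudotime[order]
    dt = np.diff(t)
    keep = dt > 0  # skip pairs of cells with identical pseudotime
    dxdt = np.diff(x, axis=1)[:, keep] / dt[keep]       # finite-difference slopes
    x_mid = 0.5 * (x[:, :-1] + x[:, 1:])[:, keep]        # state at interval midpoints
    gram = x_mid @ x_mid.T + ridge * np.eye(x.shape[0])
    # gram is symmetric, so solve(gram, x_mid @ dxdt.T).T equals the closed form above
    return np.linalg.solve(gram, x_mid @ dxdt.T).T
```

The cost is dominated by one genes x genes solve, which hints at why linear ODE formulations scale better than nonlinear models; with real single-cell data the pseudotime is noisy and the fit is far less exact than on the smooth synthetic trajectories one might test it with.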
Affiliation(s)
- Hirotaka Matsumoto
- Bioinformatics Research Unit, Advanced Center for Computing and Communication, RIKEN, Wako, Saitama 351-0198, Japan
- Hisanori Kiryu
- Department of Computational Biology and Medical Sciences, Faculty of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba 277-8561, Japan
- Chikara Furusawa
- Quantitative Biology Center (QBiC), RIKEN, Suita, Osaka 565-0874, Japan
- Universal Biology Institute, The University of Tokyo, Bunkyo-ku, Tokyo 113-0033, Japan
- Minoru S H Ko
- Department of Systems Medicine, Keio University School of Medicine, Tokyo 160-8582, Japan
- Shigeru B H Ko
- Department of Systems Medicine, Keio University School of Medicine, Tokyo 160-8582, Japan
- Norio Gouda
- Department of Systems Medicine, Keio University School of Medicine, Tokyo 160-8582, Japan
- Tetsutaro Hayashi
- Bioinformatics Research Unit, Advanced Center for Computing and Communication, RIKEN, Wako, Saitama 351-0198, Japan
- Itoshi Nikaido
- Bioinformatics Research Unit, Advanced Center for Computing and Communication, RIKEN, Wako, Saitama 351-0198, Japan
23
Inference and interrogation of a coregulatory network in the context of lipid accumulation in Yarrowia lipolytica. NPJ Syst Biol Appl 2017; 3:21. [PMID: 28955503 PMCID: PMC5554221 DOI: 10.1038/s41540-017-0024-1] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 03/17/2017] [Revised: 07/07/2017] [Accepted: 07/13/2017] [Indexed: 12/14/2022]
Abstract
Complex phenotypes, such as lipid accumulation, result from cooperativity between regulators and the integration of multiscale information. However, elucidating such regulatory programs experimentally may be challenging, particularly in context-specific conditions. In particular, we know very little about the regulators of lipid accumulation in Yarrowia lipolytica, an oleaginous yeast of industrial interest. This lack of knowledge limits the development of this yeast as an industrial platform, owing to the time-consuming and costly laboratory efforts required to design strains with the desired phenotypes. In this study, we aimed to identify context-specific regulators and mechanisms to guide explorations of the regulation of lipid accumulation in Y. lipolytica. Using gene regulatory network inference, and considering the expression of 6539 genes over 26 time points from GSE35447 for biolipid production and a list of 151 transcription factors, we reconstructed a gene regulatory network comprising 111 transcription factors, 4451 target genes and 17048 regulatory interactions (YL-GRN-1) supported by evidence of protein-protein interactions. This study, based on network interrogation and wet laboratory validation, (a) highlights the relevance of our proposed measure, the transcription factors influence, for identifying phases corresponding to changes in physiological state without prior knowledge, (b) suggests new potential regulators and drivers of lipid accumulation and
24
Kordmahalleh MM, Sefidmazgi MG, Harrison SH, Homaifar A. Identifying time-delayed gene regulatory networks via an evolvable hierarchical recurrent neural network. BioData Min 2017; 10:29. [PMID: 28785315 PMCID: PMC5543747 DOI: 10.1186/s13040-017-0146-4] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 03/07/2017] [Accepted: 07/14/2017] [Indexed: 01/25/2023]
Abstract
BACKGROUND The modeling of genetic interactions within a cell is crucial for a basic understanding of physiology and for applied areas such as drug design. Interactions in gene regulatory networks (GRNs) include effects of transcription factors, repressors, small metabolites, and microRNA species. In addition, the effects of regulatory interactions are not always simultaneous, but can occur after a finite time delay, or as a combined outcome of simultaneous and time-delayed interactions. Powerful biotechnologies now rapidly and successfully measure gene expression levels, illuminating different states of biological systems. This has created the challenge of improving the identification of specific regulatory mechanisms through regulatory network reconstructions. Solutions to this challenge will ultimately advance systems biology applications that rely on regulatory network reconstructions. METHODS We have developed a hierarchical recurrent neural network (HRNN) that identifies time-delayed gene interactions using time-course data. A customized genetic algorithm (GA) was used to optimize the hierarchical connectivity of regulatory genes and a target gene. The proposed design provides a non-fully connected network with the flexibility of using recurrent connections inside the network. These features and the non-linearity of the HRNN facilitate the identification of temporal patterns of a GRN. RESULTS Our HRNN method was implemented in Python. It was first evaluated on simulated data representing linear and nonlinear time-delayed gene-gene interaction models across a range of network sizes and noise variances. We then further demonstrated the capability of our method in reconstructing GRNs of the Saccharomyces cerevisiae synthetic network for in vivo benchmarking of reverse-engineering and modeling approaches (IRMA).
We compared the performance of our method to TD-ARACNE, HCC-CLINDE, TSNI and ebdbNet across different network sizes and levels of stochastic noise. We found our HRNN method to be superior in terms of accuracy for nonlinear data sets with higher amounts of noise. CONCLUSIONS The proposed method identifies time-delayed gene-gene interactions of GRNs. The topology-based advancement of our HRNN worked as expected by more effectively modeling nonlinear data sets. As a non-fully connected network, an added benefit to HRNN was how it helped to find the few genes which regulated the target gene over different time delays.
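A time-delayed regulatory interaction can be illustrated with a much simpler technique than the HRNN described above. The sketch below is not the authors' method. It swaps in a lagged Pearson-correlation scan, assuming one candidate regulator, one target, and evenly spaced time points, and reports the delay at which the regulator best tracks the target. The function names `pearson` and `best_delay` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def best_delay(regulator, target, max_lag):
    """Scan delays 0..max_lag and return (delay, r) with the largest |r|.

    For delay d, regulator[t - d] is paired with target[t] for t = d..n-1.
    This is a linear, pairwise stand-in for illustration, not the HRNN.
    """
    best_d, best_r = 0, 0.0
    for d in range(max_lag + 1):
        r = pearson(regulator[: len(regulator) - d], target[d:])
        if abs(r) > abs(best_r):
            best_d, best_r = d, r
    return best_d, best_r


if __name__ == "__main__":
    x = [0, 3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
    y = [0, 0] + x[:-2]  # toy target that copies the regulator two steps later
    print(best_delay(x, y, max_lag=4))
```

A lagged correlation only captures linear, pairwise effects; the recurrent, non-fully connected topology and genetic-algorithm search of the HRNN are what let the original method handle nonlinear interactions involving several regulators at different delays.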
Affiliation(s)
- Mina Moradi Kordmahalleh
- Department of Electrical and Computer Engineering, North Carolina A&T State University, 1601 E. Market Street, Greensboro, 27411 NC USA
- Mohammad Gorji Sefidmazgi
- Department of Electrical and Computer Engineering, North Carolina A&T State University, 1601 E. Market Street, Greensboro, 27411 NC USA
- Scott H Harrison
- Department of Biology, North Carolina A&T State University, 1601 E. Market Street, Greensboro, 27411 NC USA
- Abdollah Homaifar
- Department of Electrical and Computer Engineering, North Carolina A&T State University, 1601 E. Market Street, Greensboro, 27411 NC USA
25
Ghanat Bari M, Ung CY, Zhang C, Zhu S, Li H. Machine Learning-Assisted Network Inference Approach to Identify a New Class of Genes that Coordinate the Functionality of Cancer Networks. Sci Rep 2017; 7:6993. [PMID: 28765560 PMCID: PMC5539301 DOI: 10.1038/s41598-017-07481-5] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.8] [Received: 01/09/2017] [Accepted: 06/27/2017] [Indexed: 12/25/2022]
Abstract
Emerging evidence indicates the existence of a new class of cancer genes that act as "signal linkers" coordinating oncogenic signals between mutated and differentially expressed genes. While frequently mutated oncogenes and differentially expressed genes, which we term Class I cancer genes, are readily detected by most analytical tools, the new class of cancer-related genes, i.e., Class II, escape detection because they are neither mutated nor differentially expressed. Given this hypothesis, we developed a Machine Learning-Assisted Network Inference (MALANI) algorithm, which assesses all genes regardless of expression or mutational status in the context of cancer etiology. We used 8807 expression arrays, corresponding to 9 cancer types, to build more than 2 × 10⁸ Support Vector Machine (SVM) models for reconstructing a cancer network. We found that ~3% of ~19,000 not differentially expressed genes are Class II cancer gene candidates. Some Class II genes that we found, such as SLC19A1 and ATAD3B, have been recently reported to associate with cancer outcomes. To our knowledge, this is the first study that utilizes both machine learning and network biology approaches to uncover Class II cancer genes in coordinating functionality in cancer networks, and it will illuminate our understanding of how genes modulated in a tissue-specific network contribute to tumorigenesis and therapy development.
Affiliation(s)
- Mehrab Ghanat Bari
- Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic College of Medicine, Rochester, MN, 55905, USA
- Choong Yong Ung
- Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic College of Medicine, Rochester, MN, 55905, USA
- Cheng Zhang
- Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic College of Medicine, Rochester, MN, 55905, USA
- Shizhen Zhu
- Department of Biochemistry and Molecular Biology, Mayo Clinic College of Medicine, Rochester, MN, 55905, USA
- Hu Li
- Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic College of Medicine, Rochester, MN, 55905, USA
26
Huang B, Lu M, Jia D, Ben-Jacob E, Levine H, Onuchic JN. Interrogating the topological robustness of gene regulatory circuits by randomization. PLoS Comput Biol 2017; 13:e1005456. [PMID: 28362798 PMCID: PMC5391964 DOI: 10.1371/journal.pcbi.1005456] [Citation(s) in RCA: 109] [Impact Index Per Article: 13.6] [Received: 11/03/2016] [Revised: 04/14/2017] [Accepted: 03/15/2017] [Indexed: 01/06/2023]
Abstract
One of the most important roles of cells is performing their cellular tasks properly for survival. Cells usually achieve robust functionality, for example, cell-fate decision-making and signal transduction, through multiple layers of regulation involving many genes. Despite the combinatorial complexity of gene regulation, its quantitative behavior has been typically studied on the basis of experimentally verified core gene regulatory circuitry, composed of a small set of important elements. It is still unclear how such a core circuit operates in the presence of many other regulatory molecules and in a crowded and noisy cellular environment. Here we report a new computational method, named random circuit perturbation (RACIPE), for interrogating the robust dynamical behavior of a gene regulatory circuit even without accurate measurements of circuit kinetic parameters. RACIPE generates an ensemble of random kinetic models corresponding to a fixed circuit topology, and utilizes statistical tools to identify generic properties of the circuit. By applying RACIPE to simple toggle-switch-like motifs, we observed that the stable states of all models converge to experimentally observed gene state clusters even when the parameters are strongly perturbed. RACIPE was further applied to a proposed 22-gene network of the Epithelial-to-Mesenchymal Transition (EMT), from which we identified four experimentally observed gene states, including the states that are associated with two different types of hybrid Epithelial/Mesenchymal phenotypes. Our results suggest that the dynamics of a gene circuit are mainly determined by its topology, not by detailed circuit parameters. Our work provides a theoretical foundation for circuit-based systems biology modeling. We anticipate RACIPE to be a powerful tool to predict and decode circuit design principles in an unbiased manner, and to quantitatively evaluate the robustness and heterogeneity of gene expression.
Cells are able to robustly carry out their essential biological functions, possibly because of multiple layers of tight regulation via complex, yet well-designed, gene regulatory networks involving a substantial number of genes. State-of-the-art genomics technology has enabled the mapping of these large gene networks, yet it remains a tremendous challenge to elucidate their design principles and the regulatory mechanisms underlying their biological functions such as signal processing and decision-making. One of the key barriers is the absence of accurate kinetics for the regulatory interactions, especially from in vivo experiments. To this end, we have developed a new computational modeling method, Random Circuit Perturbation (RACIPE), to explore the dynamic behaviors of gene regulatory circuits without the requirement of detailed kinetic parameters. RACIPE takes a network topology as the input, and generates an unbiased ensemble of models with varying kinetic parameters. Each model is subjected to simulation, followed by statistical analysis for the ensemble. We tested RACIPE on several gene circuits, and found that the predicted gene expression patterns from all of the models converge to experimentally observed gene state clusters. We expect RACIPE to be a powerful method to identify the role of network topology in determining network operating principles.
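RACIPE's core loop (fix a topology, sample many kinetic parameter sets, simulate each, summarize the stable states) can be sketched for the two-gene toggle switch mentioned above. This is a minimal illustration under stated assumptions: the shifted Hill form and every sampling range below are illustrative choices rather than the published RACIPE defaults, integration is plain forward Euler, and steady states are labelled by a crude A-versus-B comparison instead of the normalization and clustering used in the paper. All function names are hypothetical.

```python
import random


def shifted_hill(level, threshold, n, fold):
    """Shifted Hill function; fold < 1 models inhibition (value drops from 1 toward fold)."""
    h_minus = 1.0 / (1.0 + (level / threshold) ** n)
    return h_minus + fold * (1.0 - h_minus)


def simulate_toggle(p, a, b, dt=0.05, steps=2000):
    """Forward-Euler integration of a mutually inhibiting pair: A -| B and B -| A."""
    for _ in range(steps):
        da = p["ga"] * shifted_hill(b, p["thr_b"], p["n_b"], p["fold_b"]) - p["ka"] * a
        db = p["gb"] * shifted_hill(a, p["thr_a"], p["n_a"], p["fold_a"]) - p["kb"] * b
        a, b = a + dt * da, b + dt * db
    return a, b


def random_parameters(rng):
    """Hypothetical sampling ranges, chosen for illustration only."""
    return {
        "ga": rng.uniform(1, 100), "gb": rng.uniform(1, 100),
        "ka": rng.uniform(0.1, 1), "kb": rng.uniform(0.1, 1),
        "thr_a": rng.uniform(1, 50), "thr_b": rng.uniform(1, 50),
        "n_a": rng.randint(1, 6), "n_b": rng.randint(1, 6),
        "fold_a": rng.uniform(0.01, 0.1), "fold_b": rng.uniform(0.01, 0.1),
    }


def racipe_toggle(n_models=20, n_inits=4, seed=0):
    """For each random model, return the set of steady-state labels reached."""
    rng = random.Random(seed)
    ensemble = []
    for _ in range(n_models):
        p = random_parameters(rng)
        labels = set()
        for _ in range(n_inits):
            a0 = rng.uniform(0, p["ga"] / p["ka"])
            b0 = rng.uniform(0, p["gb"] / p["kb"])
            a, b = simulate_toggle(p, a0, b0)
            labels.add("A>B" if a > b else "B>A")  # crude state label
        ensemble.append(labels)
    return ensemble


ensemble = racipe_toggle(n_models=20, n_inits=4, seed=0)
bistable = sum(1 for labels in ensemble if len(labels) == 2)
print(f"{bistable} of {len(ensemble)} sampled models reached both states")
```

Counting models whose label set contains both `"A>B"` and `"B>A"` gives a rough fraction of bistable parameter sets for this topology, which is the kind of topology-level summary RACIPE is designed to produce.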
Affiliation(s)
- Bin Huang
- Center for Theoretical Biological Physics, Rice University, Houston, TX, United States of America
- Department of Chemistry, Rice University, Houston, TX, United States of America
- Mingyang Lu
- Center for Theoretical Biological Physics, Rice University, Houston, TX, United States of America
- The Jackson Laboratory, Bar Harbor, ME, United States of America
- Dongya Jia
- Center for Theoretical Biological Physics, Rice University, Houston, TX, United States of America
- Program in Systems, Synthetic and Physical Biology, Rice University, Houston, TX, United States of America
- Eshel Ben-Jacob
- Center for Theoretical Biological Physics, Rice University, Houston, TX, United States of America
- School of Physics and Astronomy, and The Sagol School of Neuroscience, Tel-Aviv University, Tel-Aviv, Israel
- Herbert Levine
- Center for Theoretical Biological Physics, Rice University, Houston, TX, United States of America
- Department of Bioengineering, Rice University, Houston, TX, United States of America
- Department of Biosciences, Rice University, Houston, TX, United States of America
- Department of Physics and Astronomy, Rice University, Houston, TX, United States of America
- *Correspondence: HL; JNO
- Jose N. Onuchic
- Center for Theoretical Biological Physics, Rice University, Houston, TX, United States of America
- Department of Chemistry, Rice University, Houston, TX, United States of America
- Department of Biosciences, Rice University, Houston, TX, United States of America
- Department of Physics and Astronomy, Rice University, Houston, TX, United States of America
- *Correspondence: HL; JNO
27
Koda S, Onda Y, Matsui H, Takahagi K, Uehara-Yamaguchi Y, Shimizu M, Inoue K, Yoshida T, Sakurai T, Honda H, Eguchi S, Nishii R, Mochida K. Diurnal Transcriptome and Gene Network Represented through Sparse Modeling in Brachypodium distachyon. Front Plant Sci 2017; 8:2055. [PMID: 29234348 PMCID: PMC5712366 DOI: 10.3389/fpls.2017.02055] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 09/12/2017] [Accepted: 11/16/2017] [Indexed: 05/08/2023]
Abstract
We report the comprehensive identification of periodic genes and their network inference, based on a gene co-expression analysis and an Auto-Regressive eXogenous (ARX) model with a group smoothly clipped absolute deviation (SCAD) method using a time-series transcriptome dataset in a model grass, Brachypodium distachyon. To reveal the diurnal changes in the transcriptome in B. distachyon, we performed RNA-seq analysis of its leaves sampled through a diurnal cycle of over 48 h at 4 h intervals using three biological replications, and identified 3,621 periodic genes through our wavelet analysis. The expression data are suitable for inferring sparse networks based on ARX models. We found that genes involved in biological processes such as transcriptional regulation, protein degradation, post-transcriptional modification, and photosynthesis are significantly enriched in the periodic genes, suggesting that these processes might be regulated by circadian rhythm in B. distachyon. On the basis of the time-series expression patterns of the periodic genes, we constructed a chronological gene co-expression network and identified putative transcription factor-encoding genes that might be involved in the time-specific regulatory transcriptional network. Moreover, we inferred a transcriptional network composed of the periodic genes in B. distachyon, aiming to identify genes associated with other genes through variable selection by grouping time points for each gene. Based on the ARX model with the group SCAD regularization using our time-series expression datasets of the periodic genes, we constructed gene networks and found that the networks exhibit a typical scale-free structure. Our findings demonstrate that the diurnal changes in the transcriptome in B. distachyon leaves have a sparse network structure, revealing the spatiotemporal gene regulatory network over the cyclic phase transitions in B. distachyon diurnal growth.
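The group SCAD regularization described above treats all lagged coefficients of one regulator as a single group, so a regulator is kept or dropped as a whole (the "grouping time points for each gene" in the abstract). The sketch below shows only two ingredients, under stated assumptions: building ARX lagged regressors grouped by gene, and evaluating the SCAD penalty of Fan and Li on each group's Euclidean norm with the conventional a = 3.7. Fitting the penalized model is omitted, and the function names and toy data are hypothetical.

```python
import math


def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty (Fan and Li, 2001) for a single magnitude."""
    t = abs(theta)
    if t <= lam:
        return lam * t  # lasso-like near zero
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))  # quadratic blend
    return lam * lam * (a + 1) / 2  # constant: large effects are not shrunk further


def group_scad(groups, lam, a=3.7):
    """Group SCAD: apply SCAD to the Euclidean norm of each coefficient group."""
    return sum(scad_penalty(math.sqrt(sum(c * c for c in g)), lam, a) for g in groups)


def arx_design(series, target, lags):
    """Build ARX regressors for one target gene (hypothetical helper).

    Each row holds x_j(t-1), ..., x_j(t-lags) for every gene j, grouped by
    gene; the response is x_target(t).
    """
    n_time = len(series[target])
    rows, response = [], []
    for t in range(lags, n_time):
        rows.append([s[t - l] for s in series for l in range(1, lags + 1)])
        response.append(series[target][t])
    return rows, response


series = [[1.0, 2.0, 4.0, 7.0, 11.0, 16.0], [0.5, 0.4, 0.9, 1.6, 2.0, 2.9]]
rows, y = arx_design(series, target=0, lags=2)
theta = [[0.9, 0.1], [0.0, 0.0]]  # hypothetical coefficients, one group per regulator
print(len(rows), group_scad(theta, lam=0.5))
```

A group whose norm is zero contributes nothing to the penalty, and groups beyond aλ pay a constant, which is why SCAD tends to give sparser and less biased selections than a plain group lasso.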
Affiliation(s)
- Satoru Koda
- Graduate School of Mathematics, Kyushu University, Fukuoka, Japan
- Yoshihiko Onda
- Cellulose Production Research Team, Biomass Engineering Research Division, RIKEN Center for Sustainable Resource Science, Kanagawa, Japan
- Kotaro Takahagi
- Cellulose Production Research Team, Biomass Engineering Research Division, RIKEN Center for Sustainable Resource Science, Kanagawa, Japan
- Kihara Institute for Biological Research, Yokohama City University, Kanagawa, Japan
- Yukiko Uehara-Yamaguchi
- Cellulose Production Research Team, Biomass Engineering Research Division, RIKEN Center for Sustainable Resource Science, Kanagawa, Japan
- Minami Shimizu
- Cellulose Production Research Team, Biomass Engineering Research Division, RIKEN Center for Sustainable Resource Science, Kanagawa, Japan
- Komaki Inoue
- Cellulose Production Research Team, Biomass Engineering Research Division, RIKEN Center for Sustainable Resource Science, Kanagawa, Japan
- Takuhiro Yoshida
- Integrated Genome Informatics Research Unit, RIKEN Center for Sustainable Resource Science, Kanagawa, Japan
- Tetsuya Sakurai
- Integrated Genome Informatics Research Unit, RIKEN Center for Sustainable Resource Science, Kanagawa, Japan
- Research and Education Faculty, Multidisciplinary Science Cluster, Interdisciplinary Science Unit, Kochi University, Kochi, Japan
- Hiroshi Honda
- Graduate School of Mathematics, Kyushu University, Fukuoka, Japan
- Shinto Eguchi
- The Institute of Statistical Mathematics, Tokyo, Japan
- Ryuei Nishii
- Institute of Mathematics for Industry, Kyushu University, Fukuoka, Japan
- *Correspondence: Keiichi Mochida, Ryuei Nishii
- Keiichi Mochida
- Cellulose Production Research Team, Biomass Engineering Research Division, RIKEN Center for Sustainable Resource Science, Kanagawa, Japan
- Kihara Institute for Biological Research, Yokohama City University, Kanagawa, Japan
- Institute of Plant Science and Resources, Okayama University, Okayama, Japan
- *Correspondence: Keiichi Mochida, Ryuei Nishii
28
Guo S, Jiang Q, Chen L, Guo D. Gene regulatory network inference using PLS-based methods. BMC Bioinformatics 2016; 17:545. [PMID: 28031031 PMCID: PMC5192600 DOI: 10.1186/s12859-016-1398-6] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Received: 04/12/2016] [Accepted: 12/01/2016] [Indexed: 12/12/2022]
Abstract
Background Inferring the topology of gene regulatory networks (GRNs) from microarray gene expression data has many potential applications, such as identifying candidate drug targets and providing valuable insights into biological processes. It remains a challenge because the data are noisy and high-dimensional, and there is a large number of potential interactions. Results We introduce an ensemble gene regulatory network inference method, PLSNET, which decomposes the GRN inference problem with p genes into p subproblems and solves each subproblem using a partial least squares (PLS)-based feature selection algorithm. A statistical technique is then used to refine the predictions. The proposed method was evaluated on the DREAM4 and DREAM5 benchmark datasets and achieved higher accuracy than the winners of those competitions and other state-of-the-art GRN inference methods. Conclusions Superior accuracy achieved on different benchmark datasets, including both in silico and in vivo networks, shows that PLSNET reaches state-of-the-art performance. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1398-6) contains supplementary material, which is available to authorized users.
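The PLSNET decomposition (one regression subproblem per target gene, with the remaining genes as candidate regulators) can be sketched as follows. Assumptions: each subproblem is a plain PLS1 regression fitted by NIPALS, and candidate regulators are scored by variable importance in projection (VIP). The ensemble resampling and the statistical refinement step described in the abstract are omitted, and all names and toy data are hypothetical.

```python
import math


def _center(v):
    mean = sum(v) / len(v)
    return [x - mean for x in v]


def pls_vip(x_cols, y, n_components=2):
    """PLS1 via NIPALS; returns one VIP score per predictor column."""
    m, n = len(x_cols), len(y)
    X = [_center(col) for col in x_cols]
    yr = _center(y)
    weights, explained = [], []
    for _ in range(min(n_components, m)):
        w = [sum(col[i] * yr[i] for i in range(n)) for col in X]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [sum(w[j] * X[j][i] for j in range(m)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        if tt < 1e-12:
            break
        p = [sum(col[i] * t[i] for i in range(n)) / tt for col in X]  # X loadings
        q = sum(yr[i] * t[i] for i in range(n)) / tt  # y loading
        X = [[col[i] - t[i] * p[j] for i in range(n)] for j, col in enumerate(X)]  # deflate X
        yr = [yr[i] - q * t[i] for i in range(n)]  # deflate y
        weights.append(w)
        explained.append(q * q * tt)  # y variance explained by this component
    total = sum(explained)
    if total == 0:
        return [0.0] * m
    return [math.sqrt(m * sum(s * w[j] ** 2 for s, w in zip(explained, weights)) / total)
            for j in range(m)]


def pls_network(expr, n_components=2):
    """One PLS subproblem per target gene; returns {(regulator, target): VIP}."""
    genes = list(expr)
    scores = {}
    for target in genes:
        regulators = [g for g in genes if g != target]
        vip = pls_vip([expr[g] for g in regulators], expr[target], n_components)
        for g, v in zip(regulators, vip):
            scores[(g, target)] = v
    return scores


g0 = [1, 2, 3, 4, 5, 6, 7, 8]
g1 = [3, 1, 4, 1, 5, 9, 2, 6]
g2 = [2 * x + d for x, d in zip(g0, [0.1, -0.2, 0.0, 0.1, -0.1, 0.2, 0.0, -0.1])]
scores = pls_network({"g0": g0, "g1": g1, "g2": g2})
print(sorted(scores.items(), key=lambda kv: -kv[1])[:3])
```

VIP scores for one target satisfy sum(VIP²) = number of predictors, so a score above 1 marks a regulator that contributes more than average; ranking all (regulator, target) pairs by score gives a candidate edge list.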
Affiliation(s)
- Shun Guo
- Department of Electronic Engineering, Xiamen University, Fujian, 361005, China; Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518000, China
- Qingshan Jiang
- Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518000, China
- Lifei Chen
- School of Mathematics and Computer Science, Fujian Normal University, Fujian, 350117, China
- Donghui Guo
- Department of Electronic Engineering, Xiamen University, Fujian, 361005, China
29
Young WC, Raftery AE, Yeung KY. A posterior probability approach for gene regulatory network inference in genetic perturbation data. Math Biosci Eng 2016; 13:1241-1251. [PMID: 27775378 DOI: 10.3934/mbe.2016041] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 06/06/2023]
Abstract
Inferring gene regulatory networks is an important problem in systems biology. However, these networks can be hard to infer from experimental data because of the inherent variability in biological data as well as the large number of genes involved. We propose a fast, simple method for inferring regulatory relationships between genes from knockdown experiments in the NIH LINCS dataset by calculating posterior probabilities, incorporating prior information. We show that the method is able to find previously identified edges from TRANSFAC and JASPAR and discuss the merits and limitations of this approach.
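Turning a knockdown measurement into a posterior edge probability follows Bayes' rule: P(edge | z) = π·f₁(z) / (π·f₁(z) + (1 − π)·f₀(z)). The sketch assumes a simple two-component model that is not taken from the paper: with no edge, the target's knockdown z-score is standard normal (f₀); with an edge, it is normal with a shifted mean (f₁, with a hypothetical `effect` of -2). The prior π is where external evidence such as TRANSFAC or JASPAR support would enter; the prior values used below are illustrative only.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def edge_posterior(z, prior, effect=-2.0, sigma=1.0):
    """P(edge | z) under an assumed two-Gaussian likelihood model."""
    with_edge = prior * normal_pdf(z, effect, sigma)
    without_edge = (1.0 - prior) * normal_pdf(z, 0.0, sigma)
    return with_edge / (with_edge + without_edge)


# Hypothetical priors: higher when a TRANSFAC/JASPAR motif supports the pair.
for prior in (0.05, 0.3):
    print(prior, round(edge_posterior(-2.5, prior), 3))
```

The same observed knockdown effect yields a larger posterior when the prior is higher, which is how prior knowledge sharpens calls on noisy perturbation data.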
Affiliation(s)
- William Chad Young
- University of Washington, Department of Statistics, Box 354322, Seattle, WA 98195-4322, United States
30
Sulaimanov N, Koeppl H. Graph reconstruction using covariance-based methods. EURASIP J Bioinform Syst Biol 2016; 2016:19. [PMID: 27942259 PMCID: PMC5121191 DOI: 10.1186/s13637-016-0052-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 03/27/2016] [Accepted: 10/21/2016] [Indexed: 11/10/2022]
Abstract
Methods based on correlation and partial correlation are today employed in the reconstruction of a statistical interaction graph from high-throughput omics data. These dedicated methods work well even for the case when the number of variables exceeds the number of samples. In this study, we investigate how the graphs extracted from covariance and concentration matrix estimates are related by using Neumann series and transitive closure and through discussing concrete small examples. Considering the ideal case where the true graph is available, we also compare correlation and partial correlation methods for large realistic graphs. In particular, we perform the comparisons with optimally selected parameters based on the true underlying graph and with data-driven approaches where the parameters are directly estimated from the data.
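The link between covariance and concentration (inverse covariance) matrices can be checked on a three-gene chain in which gene 1 connects genes 0 and 2. The plain-Python sketch below, with hypothetical helper names, builds a tridiagonal concentration matrix, inverts it to obtain the covariance, and shows that genes 0 and 2 are correlated through the indirect path while their partial correlation vanishes. Writing the concentration matrix as 2(I − A), the covariance equals half of the Neumann series I + A + A² + …, and the A² term is exactly the two-step path that produces the indirect correlation.

```python
import math


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def inverse(m):
    """Gauss-Jordan inversion with partial pivoting (small dense matrices only)."""
    n = len(m)
    aug = [[float(v) for v in row] + id_row for row, id_row in zip(m, identity(n))]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def correlation(cov):
    n = len(cov)
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(n)] for i in range(n)]


def partial_correlation(cov):
    """Partial correlations from the concentration matrix P: -P_ij / sqrt(P_ii * P_jj)."""
    prec = inverse(cov)
    n = len(prec)
    return [[1.0 if i == j else -prec[i][j] / math.sqrt(prec[i][i] * prec[j][j])
             for j in range(n)] for i in range(n)]


def neumann_inverse(a, terms):
    """Approximate (I - A)^-1 by I + A + ... + A^terms (needs spectral radius < 1)."""
    result, power = identity(len(a)), identity(len(a))
    for _ in range(terms):
        power = matmul(power, a)
        result = [[x + y for x, y in zip(r, p)] for r, p in zip(result, power)]
    return result


precision = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]  # chain 0-1-2
cov = inverse(precision)
A = [[0.0, 0.5, 0.0], [0.5, 0.0, 0.5], [0.0, 0.5, 0.0]]  # precision = 2 * (I - A)
cov_series = [[0.5 * v for v in row] for row in neumann_inverse(A, 80)]
print(correlation(cov)[0][2], partial_correlation(cov)[0][2])
```

Correlation-based methods would therefore report a spurious 0-2 edge here, whereas partial correlation recovers only the direct edges of the chain.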
Affiliation(s)
- Nurgazy Sulaimanov
- Department of Electrical Engineering and Information Technology, Technische Universität Darmstadt, Rundeturmstr. 12, Darmstadt, 64283 Germany
- Department of Biology, Technische Universität Darmstadt, Schnittspahnstr. 10, Darmstadt, 64287 Germany
- Heinz Koeppl
- Department of Electrical Engineering and Information Technology, Technische Universität Darmstadt, Rundeturmstr. 12, Darmstadt, 64283 Germany
- Department of Biology, Technische Universität Darmstadt, Schnittspahnstr. 10, Darmstadt, 64287 Germany
31
Gene Regulatory Network Inferences Using a Maximum-Relevance and Maximum-Significance Strategy. PLoS One 2016; 11:e0166115. [PMID: 27829000 PMCID: PMC5102470 DOI: 10.1371/journal.pone.0166115] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 01/07/2016] [Accepted: 10/24/2016] [Indexed: 12/18/2022]
Abstract
Recovering gene regulatory networks from expression data is a challenging problem in systems biology that provides valuable information on the regulatory mechanisms of cells. A number of algorithms based on computational models are currently used to recover network topology. However, most of these algorithms have limitations. For example, many models tend to be complicated because of the "large p, small n" problem. In this paper, we propose a novel regulatory network inference method called the maximum-relevance and maximum-significance network (MRMSn) method, which converts the problem of recovering networks into a problem of how to select the regulator genes for each gene. To solve the latter problem, we present an algorithm that is based on information theory and selects the regulator genes for a specific gene by maximizing the relevance and significance. A first-order incremental search algorithm is used to search for regulator genes. Finally, a strict constraint is adopted to adjust all of the regulatory relationships according to the obtained regulator genes and thus obtain the complete network structure. We applied our method to five different datasets and compared it with five state-of-the-art methods for network inference based on information theory. The results confirm the effectiveness of our method.
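MRMSn casts network inference as choosing regulators for each gene with an information-theoretic, first-order incremental search. The exact "maximum-significance" term is not defined in this abstract, so the sketch below swaps in the closely related mRMR criterion (relevance to the target minus mean mutual information with already-selected regulators) on discretized expression values. It illustrates greedy incremental selection, not the MRMSn score itself; the function names and toy profiles are hypothetical.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (bits) between two equally long discrete sequences."""
    n = len(x)
    cx, cy, cxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in cxy.items():
        p_ab = c / n
        mi += p_ab * math.log2(p_ab / ((cx[a] / n) * (cy[b] / n)))
    return mi


def select_regulators(target, candidates, k):
    """Greedy first-order incremental search using the mRMR score (a stand-in).

    candidates: dict mapping gene name -> discretized expression profile.
    Ties are broken by the dict's insertion order.
    """
    selected, remaining = [], list(candidates)

    def score(name):
        relevance = mutual_information(candidates[name], target)
        if not selected:
            return relevance
        redundancy = sum(mutual_information(candidates[name], candidates[s])
                         for s in selected) / len(selected)
        return relevance - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


genes = {
    "A": [0, 0, 0, 0, 1, 1, 1, 1],
    "B": [0, 0, 0, 0, 1, 1, 1, 1],  # exact copy of A: relevant but redundant
    "C": [0, 0, 1, 1, 0, 0, 1, 1],
}
target = [0, 0, 1, 1, 2, 2, 3, 3]  # toy target built as 2*A + C
print(select_regulators(target, genes, k=2))  # -> ['A', 'C']
```

The redundancy term is what lets the search skip B, which is as informative as A on its own but adds nothing once A has been chosen.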
32
Chockalingam S, Aluru M, Aluru S. Microarray Data Processing Techniques for Genome-Scale Network Inference from Large Public Repositories. Microarrays 2016; 5:microarrays5030023. [PMID: 27657141 PMCID: PMC5040970 DOI: 10.3390/microarrays5030023] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 07/26/2016] [Revised: 09/06/2016] [Accepted: 09/13/2016] [Indexed: 11/16/2022]
Abstract
Pre-processing of microarray data is a well-studied problem. Furthermore, all popular platforms come with their own recommended best practices for differential analysis of genes. However, for genome-scale network inference using microarray data collected from large public repositories, these methods filter out a considerable number of genes. This is primarily due to the effects of aggregating a diverse array of experiments with different technical and biological scenarios. Here we introduce a pre-processing pipeline suitable for inferring genome-scale gene networks from large microarray datasets. We show that partitioning of the available microarray datasets according to biological relevance into tissue- and process-specific categories significantly extends the limits of downstream network construction. We demonstrate the effectiveness of our pre-processing pipeline by inferring genome-scale networks for the model plant Arabidopsis thaliana using two different construction methods and a collection of 11,760 Affymetrix ATH1 microarray chips. Our pre-processing pipeline and the datasets used in this paper are made available at http://alurulab.cc.gatech.edu/microarray-pp.
Affiliation(s)
- Sriram Chockalingam
- Department of Computer Science and Engineering, Indian Institute of Technology Bombay, Mumbai 40076, India
- Maneesha Aluru
- School of Biology, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Srinivas Aluru
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
33
Clarkson MD. Representation of anatomy in online atlases and databases: a survey and collection of patterns for interface design. BMC Dev Biol 2016; 16:18. [PMID: 27206491 PMCID: PMC4875762 DOI: 10.1186/s12861-016-0116-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 01/11/2016] [Accepted: 05/09/2016] [Indexed: 12/17/2022]
Abstract
BACKGROUND A large number of online atlases and databases have been developed to manage the rapidly growing amount of data describing embryogenesis. As these community resources continue to evolve, it is important to understand how representations of anatomy can facilitate the sharing and integration of data. In addition, attention to the design of the interfaces is critical to make online resources useful and usable. RESULTS I first present a survey of online atlases and gene expression resources for model organisms, with a focus on methods of semantic and spatial representation of anatomy. A total of 14 anatomical atlases and 21 gene expression resources are included. This survey demonstrates how choices in semantic representation, in the form of ontologies, can enhance interface search functions and provide links between relevant information. This survey also reviews methods for spatially representing anatomy in online resources. I then provide a collection of patterns for interface design based on the atlases and databases surveyed. These patterns include methods for displaying graphics, integrating semantic and spatial representations, organizing information, and querying databases to find genes expressed in anatomical structures. CONCLUSIONS This collection of patterns for interface design will assist biologists and software developers in planning the interfaces of new atlases and databases or enhancing existing ones. They also show the benefits of standardizing semantic and spatial representations of anatomy by demonstrating how interfaces can use standardization to provide enhanced functionality.
Affiliation(s)
- Melissa D Clarkson
- Department of Biological Structure, School of Medicine, University of Washington, Seattle, WA, USA
34
Neural model of gene regulatory network: a survey on supportive meta-heuristics. Theory Biosci 2016; 135:1-19. [DOI: 10.1007/s12064-016-0224-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 07/02/2015] [Accepted: 03/21/2016] [Indexed: 12/21/2022]
35
Hou J, Acharya L, Zhu D, Cheng J. An overview of bioinformatics methods for modeling biological pathways in yeast. Brief Funct Genomics 2016; 15:95-108. [PMID: 26476430 PMCID: PMC5065356 DOI: 10.1093/bfgp/elv040] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Indexed: 11/14/2022]
Abstract
The advent of high-throughput genomics techniques, along with the completion of genome sequencing projects, identification of protein-protein interactions and reconstruction of genome-scale pathways, has accelerated the development of systems biology research in the yeast organism Saccharomyces cerevisiae. In particular, discovery of biological pathways in yeast has become an important forefront in systems biology, which aims to understand the interactions among molecules within a cell leading to certain cellular processes in response to a specific environment. While the existing theoretical and experimental approaches enable the investigation of well-known pathways involved in metabolism, gene regulation and signal transduction, bioinformatics methods offer new insights into computational modeling of biological pathways. A wide range of computational approaches has been proposed in the past for reconstructing biological pathways from high-throughput datasets. Here we review selected bioinformatics approaches for modeling biological pathways in S. cerevisiae, including metabolic pathways, gene-regulatory pathways and signaling pathways. We start by reviewing the research on biological pathways, followed by a discussion of key biological databases. In addition, several representative computational approaches for modeling biological pathways in yeast are discussed.
36
Petralia F, Song WM, Tu Z, Wang P. New Method for Joint Network Analysis Reveals Common and Different Coexpression Patterns among Genes and Proteins in Breast Cancer. J Proteome Res 2016; 15:743-54. [PMID: 26733076 PMCID: PMC4782177 DOI: 10.1021/acs.jproteome.5b00925] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Indexed: 11/29/2022]
Abstract
We focus on characterizing common and different coexpression patterns among RNAs and proteins in breast cancer tumors. To address this problem, we introduce Joint Random Forest (JRF), a novel nonparametric algorithm to simultaneously estimate multiple coexpression networks by effectively borrowing information across protein and gene expression data. The performance of JRF was evaluated through extensive simulation studies using different network topologies and data distribution functions. Advantages of JRF over other algorithms that estimate class-specific networks separately were observed across all simulation settings. JRF also outperformed a competing method based on Gaussian graphical models. We then applied JRF to simultaneously construct gene and protein coexpression networks based on protein and RNAseq data from the CPTAC-TCGA breast cancer study. We identified interesting common and differential coexpression patterns among genes and proteins. This information can help to cast light on the potential disease mechanisms of breast cancer.
Affiliation(s)
- Francesca Petralia
- Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, 770 Lexington Avenue, 14th Floor, New York, New York 10065, United States
- Won-Min Song
- Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, 770 Lexington Avenue, 14th Floor, New York, New York 10065, United States
- Zhidong Tu
- Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, 770 Lexington Avenue, 14th Floor, New York, New York 10065, United States
- Pei Wang
- Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, 770 Lexington Avenue, 14th Floor, New York, New York 10065, United States
37
Hsiao YT, Lee WP, Yang W, Müller S, Flamm C, Hofacker I, Kügler P. Practical Guidelines for Incorporating Knowledge-Based and Data-Driven Strategies into the Inference of Gene Regulatory Networks. IEEE/ACM Trans Comput Biol Bioinform 2016; 13:64-75. [PMID: 26441429 DOI: 10.1109/tcbb.2015.2465954] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
Modeling gene regulatory networks (GRNs) is essential for conceptualizing how genes are expressed and how they influence each other. Typically, a reverse engineering approach is employed; this strategy is effective in reproducing possible fitting models of GRNs. To use this strategy, however, two daunting tasks must be undertaken: one task is to optimize the accuracy of inferred network behaviors; and the other task is to designate valid biological topologies for target networks. Although existing studies have addressed these two tasks for years, few of the studies can satisfy both of the requirements simultaneously. To address these difficulties, we propose an integrative modeling framework that combines knowledge-based and data-driven input sources to construct biological topologies with their corresponding network behaviors. To validate the proposed approach, a real dataset collected from the cell cycle of the yeast S. cerevisiae is used. The results show that the proposed framework can successfully infer solutions that meet the requirements of both the network behaviors and biological structures. Therefore, the outcomes are exploitable for future in vivo experimental design.
38
Nepomuceno-Chamorro IA, Marquez-Chamorro A, Aguilar-Ruiz JS. Building Transcriptional Association Networks in Cytoscape with RegNetC. IEEE/ACM Trans Comput Biol Bioinform 2015; 12:823-824. [PMID: 26357322 DOI: 10.1109/tcbb.2014.2385702] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/05/2023]
Abstract
The Regression Network plugin for Cytoscape (RegNetC) implements the RegNet algorithm for the inference of transcriptional association networks from gene expression profiles. This algorithm is a model tree-based method that detects the relationship between each gene and all remaining genes simultaneously, instead of analyzing each pair of genes individually as correlation-based methods do. Model trees are a useful technique for estimating gene expression values with regression models, and they favour localized similarities over global similarity; reliance on global similarity is one of the major drawbacks of correlation-based methods. Here, we present an integrated software suite, named RegNetC, as a Cytoscape plugin that can also operate on its own. According to user-defined parameters, RegNetC produces the resulting transcriptional gene association network in .sif format for visualization and analysis; it interoperates with other Cytoscape plugins, and the network can be exported for publication figures. In addition to the network, the RegNetC plugin also provides the quantitative relationships between the expression values of the genes involved in the inferred network, i.e., those defined by the regression models.
39
Evans TG. Considerations for the use of transcriptomics in identifying the ‘genes that matter’ for environmental adaptation. J Exp Biol 2015; 218:1925-35. [DOI: 10.1242/jeb.114306] [Citation(s) in RCA: 87] [Impact Index Per Article: 8.7] [Indexed: 12/16/2022]
Abstract
Transcriptomics has emerged as a powerful approach for exploring physiological responses to the environment. However, like any other experimental approach, transcriptomics has its limitations. Transcriptomics has been criticized as an inappropriate method to identify genes with large impacts on adaptive responses to the environment because: (1) genes with large impacts on fitness are rare; (2) a large change in gene expression does not necessarily equate to a large effect on fitness; and (3) protein activity is most relevant to fitness, and mRNA abundance is an unreliable indicator of protein activity. In this review, these criticisms are re-evaluated in the context of recent systems-level experiments that provide new insight into the relationship between gene expression and fitness during environmental stress. In general, these criticisms remain valid today, and indicate that exclusively using transcriptomics to screen for genes that underlie environmental adaptation will overlook constitutively expressed regulatory genes that play major roles in setting tolerance limits. Standard practices in transcriptomic data analysis pipelines may also be limiting insight by prioritizing highly differentially expressed and conserved genes over those genes that undergo moderate fold-changes and cannot be annotated. While these data certainly do not undermine the continued and widespread use of transcriptomics within environmental physiology, they do highlight the types of research questions for which transcriptomics is best suited and the need for more gene functional analyses. Such information is pertinent at a time when transcriptomics has become increasingly tractable and many researchers may be contemplating integrating transcriptomics into their research programs.
40
Liu ZP. Reverse Engineering of Genome-wide Gene Regulatory Networks from Gene Expression Data. Curr Genomics 2015; 16:3-22. [PMID: 25937810 PMCID: PMC4412962 DOI: 10.2174/1389202915666141110210634] [Citation(s) in RCA: 54] [Impact Index Per Article: 5.4] [Received: 03/07/2014] [Revised: 09/05/2014] [Accepted: 09/05/2014] [Indexed: 12/17/2022] [Open Access]
Abstract
Transcriptional regulation plays vital roles in many fundamental biological processes. Reverse engineering of genome-wide regulatory networks from high-throughput transcriptomic data provides a promising way to characterize the global scenario of regulatory relationships between regulators and their targets. In this review, we summarize and categorize the main frameworks and methods currently available for inferring transcriptional regulatory networks from microarray gene expression profiling data. We give an overview of each strategy and introduce representative methods for each. Their assumptions, advantages, shortcomings, and possible improvements and extensions are also clarified and discussed.
Affiliation(s)
- Zhi-Ping Liu
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
41
Subramanian N, Torabi-Parizi P, Gottschalk RA, Germain RN, Dutta B. Network representations of immune system complexity. Wiley Interdiscip Rev Syst Biol Med 2015; 7:13-38. [PMID: 25625853 PMCID: PMC4339634 DOI: 10.1002/wsbm.1288] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.1] [Received: 08/25/2014] [Revised: 12/09/2014] [Accepted: 12/11/2014] [Indexed: 12/25/2022]
Abstract
The mammalian immune system is a dynamic multiscale system composed of a hierarchically organized set of molecular, cellular, and organismal networks that act in concert to promote effective host defense. These networks range from those involving gene regulatory and protein–protein interactions underlying intracellular signaling pathways and single‐cell responses to increasingly complex networks of in vivo cellular interaction, positioning, and migration that determine the overall immune response of an organism. Immunity is thus not the product of simple signaling events but rather nonlinear behaviors arising from dynamic, feedback‐regulated interactions among many components. One of the major goals of systems immunology is to quantitatively measure these complex multiscale spatial and temporal interactions, permitting development of computational models that can be used to predict responses to perturbation. Recent technological advances permit collection of comprehensive datasets at multiple molecular and cellular levels, while advances in network biology support representation of the relationships of components at each level as physical or functional interaction networks. The latter facilitate effective visualization of patterns and recognition of emergent properties arising from the many interactions of genes, molecules, and cells of the immune system. We illustrate the power of integrating ‘omics’ and network modeling approaches for unbiased reconstruction of signaling and transcriptional networks with a focus on applications involving the innate immune system. We further discuss future possibilities for reconstruction of increasingly complex cellular‐ and organism‐level networks and development of sophisticated computational tools for prediction of emergent immune behavior arising from the concerted action of these networks.
WIREs Syst Biol Med 2015, 7:13–38. doi: 10.1002/wsbm.1288
This article is categorized under:
- Analytical and Computational Methods > Computational Methods
- Laboratory Methods and Technologies > Macromolecular Interactions, Methods
Affiliation(s)
- Naeha Subramanian
- Institute for Systems Biology, Seattle, WA, USA; Laboratory of Systems Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
42
Porter JR, Batchelor E. Using computational modeling and experimental synthetic perturbations to probe biological circuits. Methods Mol Biol 2015; 1244:259-76. [PMID: 25487101 PMCID: PMC6311997 DOI: 10.1007/978-1-4939-1878-2_12] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 02/10/2023]
Abstract
This chapter describes approaches for using computational modeling of synthetic biology perturbations to analyze endogenous biological circuits, with a particular focus on signaling and metabolic pathways. We describe a bottom-up approach in which ordinary differential equations are constructed to model the core interactions of a pathway of interest. We then discuss methods for modeling synthetic perturbations that can be used to investigate properties of the natural circuit. Keeping in mind the importance of the interplay between modeling and experimentation, we next describe experimental methods for constructing synthetic perturbations to test the computational predictions. Finally, we present a case study of the p53 tumor-suppressor pathway, illustrating the process of modeling the core network, designing informative synthetic perturbations in silico, and testing the predictions in vivo.
Affiliation(s)
- Joshua R. Porter
- Laboratory of Pathology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Building 10, Room B1B42, 10 Center Dr., MSC 1500, Bethesda, MD, 20892, USA
- Eric Batchelor
- Laboratory of Pathology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Building 10, Room B1B42, 10 Center Dr., MSC 1500, Bethesda, MD, 20892, USA
43
Hsiao YT, Lee WP. Reverse engineering gene regulatory networks: coupling an optimization algorithm with a parameter identification technique. BMC Bioinformatics 2014; 15 Suppl 15:S8. [PMID: 25474560 PMCID: PMC4271569 DOI: 10.1186/1471-2105-15-s15-s8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/23/2022] [Open Access]
Abstract
Background: To infer gene regulatory networks from time series gene profiles, two important tasks that are related to biological systems must be undertaken. One task is to determine a valid network structure that has topological properties that can influence the network dynamics profoundly. The other task is to optimize the network parameters to minimize the accumulated discrepancy between the gene expression data and the values produced by the inferred network model. Though the above two tasks must be conducted simultaneously, most existing work addresses only one of the tasks.
Results: We propose an iterative approach that couples parameter identification and parameter optimization techniques to address the two tasks simultaneously during network inference. This approach first identifies the most influential parameters against internal perturbations; this identification is based on sensitivity measurements. Then, a hybrid GA-PSO optimization method infers parameters in accordance with their criticalities. The proposed approach has been applied to several datasets, including subsets of the SOS DNA repair system in E. coli, the rat central nervous system (CNS), and the protein glycosylation system of yeast S. cerevisiae. The results and analysis show that our approach can infer solutions that satisfy the requirements of both network structure and network behavior.
Conclusions: Network structure is an important though challenging issue to address in inferring sophisticated networks with biological details. Lacking prior structural knowledge, we instead measure parameter sensitivity to account for the network structure indirectly. By developing an integrated approach for considering both the network structure and behavior in the inference process, we can successfully infer critical gene interactions as well as valid time expression profiles.
44
Coneva V, Simopoulos C, Casaretto JA, El-Kereamy A, Guevara DR, Cohn J, Zhu T, Guo L, Alexander DC, Bi YM, McNicholas PD, Rothstein SJ. Metabolic and co-expression network-based analyses associated with nitrate response in rice. BMC Genomics 2014; 15:1056. [PMID: 25471115 PMCID: PMC4301927 DOI: 10.1186/1471-2164-15-1056] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Received: 09/12/2014] [Accepted: 11/27/2014] [Indexed: 11/10/2022] [Open Access]
Abstract
BACKGROUND: Understanding gene expression and metabolic re-programming that occur in response to limiting nitrogen (N) conditions in crop plants is crucial for the ongoing progress towards the development of varieties with improved nitrogen use efficiency (NUE). To unravel new details on the molecular and metabolic responses to N availability in a major food crop, we conducted analyses on a weighted gene co-expression network and metabolic profile data obtained from leaves and roots of rice plants adapted to sufficient and limiting N as well as after shifting them to limiting (reduction) and sufficient (induction) N conditions.
RESULTS: A gene co-expression network representing clusters of rice genes with similar expression patterns across four nitrogen conditions and two tissue types was generated. The resulting 18 clusters were analyzed for enrichment of significant gene ontology (GO) terms. Four clusters exhibited significant correlation with limiting and reducing nitrate treatments. The enriched GO terms identified include those related to nucleoside/nucleotide, purine and ATP binding, defense response, sugar/carbohydrate binding, protein kinase activities, cell death and cell wall enzymatic activity. Although a subset of functional categories is more broadly associated with the response of rice organs to limiting N and N reduction, our analyses suggest that N reduction elicits a response distinguishable from that to adaptation to limiting N, particularly in leaves. This observation is further supported by metabolic profiling, which shows that several compounds in leaves change proportionally to the nitrate level (i.e. higher in sufficient N vs. limiting N) and respond with even higher levels when the nitrate level is reduced. Notably, these compounds are directly involved in N assimilation, transport, and storage (glutamine, asparagine, glutamate and allantoin) and extend to most amino acids. Based on these data, we hypothesize that plants respond by rapidly mobilizing stored vacuolar nitrate when N deficit is perceived, and that the response likely involves phosphorylation signal cascades and transcriptional regulation.
CONCLUSIONS: The co-expression network analysis and metabolic profiling performed in rice pinpoint the relevance of signal transduction components and regulation of N mobilization in response to limiting N conditions and deepen our understanding of N responses and N use in crops.
Affiliation(s)
- Steven J Rothstein
- Department of Molecular and Cellular Biology, University of Guelph, Guelph, ON N1G 2W1, Canada.
45
Emmert-Streib F, Dehmer M, Haibe-Kains B. Gene regulatory networks and their applications: understanding biological and medical problems in terms of networks. Front Cell Dev Biol 2014; 2:38. [PMID: 25364745 PMCID: PMC4207011 DOI: 10.3389/fcell.2014.00038] [Citation(s) in RCA: 122] [Impact Index Per Article: 11.1] [Received: 06/07/2014] [Accepted: 07/29/2014] [Indexed: 11/13/2022] [Open Access]
Abstract
In recent years gene regulatory networks (GRNs) have attracted a lot of interest and many methods have been introduced for their statistical inference from gene expression data. However, despite their popularity, GRNs are widely misunderstood. For this reason, we provide in this paper a general discussion and perspective of gene regulatory networks. Specifically, we discuss their meaning, the consistency among different network inference methods, ensemble methods, the assessment of GRNs, the estimated number of existing GRNs and their usage in different application domains. Furthermore, we discuss open questions and necessary steps in order to utilize gene regulatory networks in a clinical context and for personalized medicine.
Affiliation(s)
- Frank Emmert-Streib
- Computational Biology and Machine Learning Laboratory, Faculty of Medicine, Health and Life Sciences, Center for Cancer Research and Cell Biology, School of Medicine, Dentistry and Biomedical Sciences, Queen's University Belfast, Belfast, UK
- Matthias Dehmer
- Institute for Bioinformatics and Translational Research, UMIT, Hall in Tyrol, Austria
- Benjamin Haibe-Kains
- Bioinformatics and Computational Genomics Laboratory, Department of Medical Biophysics, Princess Margaret Cancer Centre, University of Toronto, Canada
46
Artificial neural network inference (ANNI): a study on gene-gene interaction for biomarkers in childhood sarcomas. PLoS One 2014; 9:e102483. [PMID: 25025207 PMCID: PMC4099183 DOI: 10.1371/journal.pone.0102483] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Received: 02/26/2014] [Accepted: 06/19/2014] [Indexed: 01/31/2023] [Open Access]
Abstract
Objective: To model the potential interaction between previously identified biomarkers in childhood sarcomas using artificial neural network inference (ANNI).
Method: To concisely demonstrate the biological interactions between correlated genes in an interaction network map, only 2 types of sarcomas in the childhood small round blue cell tumors (SRBCTs) dataset are discussed in this paper. A backpropagation neural network was used to model the potential interaction between genes. The prediction weights and signal directions were used to model the strengths of the interaction signals and the direction of the interaction link between genes. The ANN model was validated using Monte Carlo cross-validation to minimize the risk of over-fitting and to optimize the generalization ability of the model.
Results: Strong connection links on certain genes (TNNT1 and FNDC5 in rhabdomyosarcoma (RMS); FCGRT and OLFM1 in Ewing’s sarcoma (EWS)) suggested their potential as central hubs in the interconnection of genes with different functionalities. The results showed that the RMS patients in this dataset are likely to be congenital and at low risk of cardiomyopathy development. The EWS patients are likely to be complicated by EWS-FLI fusion and deficiency in various signaling pathways, including Wnt, Fas/Rho and intracellular oxygen.
Conclusions: The ANN network inference approach and the examination of identified genes in the published literature within the context of the disease highlight the substantial influence of certain genes in sarcomas.
47
Bazil JN, Stamm KD, Li X, Thiagarajan R, Nelson TJ, Tomita-Mitchell A, Beard DA. The inferred cardiogenic gene regulatory network in the mammalian heart. PLoS One 2014; 9:e100842. [PMID: 24971943 PMCID: PMC4074065 DOI: 10.1371/journal.pone.0100842] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 11/07/2013] [Accepted: 05/31/2014] [Indexed: 12/22/2022] [Open Access]
Abstract
Cardiac development is a complex, multiscale process encompassing cell fate adoption, differentiation and morphogenesis. To elucidate pathways underlying this process, a recently developed algorithm to reverse engineer gene regulatory networks was applied to time-course microarray data obtained from the developing mouse heart. Approximately 200 genes of interest were input into the algorithm to generate putative network topologies that are capable of explaining the experimental data via model simulation. To cull specious network interactions, thousands of putative networks are merged and filtered to generate scale-free, hierarchical networks that are statistically significant and biologically relevant. The networks are validated with known gene interactions and used to predict regulatory pathways important for the developing mammalian heart. Area under the precision-recall curve and receiver operating characteristic curve are 9% and 58%, respectively. Of the top 10 ranked predicted interactions, 4 have already been validated. The algorithm is further tested using a network enriched with known interactions and another depleted of them. The inferred networks contained more interactions for the enriched network versus the depleted network. In all test cases, maximum performance of the algorithm was achieved when the purely data-driven method of network inference was combined with a data-independent, functional-based association method. Lastly, the network generated from the list of approximately 200 genes of interest was expanded using gene-profile uniqueness metrics to include approximately 900 additional known mouse genes and to form the most likely cardiogenic gene regulatory network. The resultant network supports known regulatory interactions and contains several novel cardiogenic regulatory interactions. The method outlined herein provides an informative approach to network inference and leads to clear testable hypotheses related to gene regulation.
Affiliation(s)
- Jason N. Bazil
- Department of Molecular and Integrative Physiology, University of Michigan, Ann Arbor, Michigan, United States of America
- Karl D. Stamm
- Biotechnology and Bioengineering Center, Medical College of Wisconsin, Milwaukee, Wisconsin, United States of America
- Xing Li
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, United States of America
- Raghuram Thiagarajan
- Department of Molecular and Integrative Physiology, University of Michigan, Ann Arbor, Michigan, United States of America
- Timothy J. Nelson
- Departments of Medicine, Molecular Pharmacology and Experimental Therapeutics, and Mayo Clinic Center for Regenerative Medicine, Rochester, Minnesota, United States of America
- Aoy Tomita-Mitchell
- Biotechnology and Bioengineering Center, Medical College of Wisconsin, Milwaukee, Wisconsin, United States of America
- Daniel A. Beard
- Department of Molecular and Integrative Physiology, University of Michigan, Ann Arbor, Michigan, United States of America
48
Chebil I, Nicolle R, Santini G, Rouveirol C, Elati M. Hybrid Method Inference for the Construction of Cooperative Regulatory Network in Human. IEEE Trans Nanobioscience 2014; 13:97-103. [DOI: 10.1109/tnb.2014.2316920] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Indexed: 11/07/2022]
49
Henderson J, Michailidis G. Network reconstruction using nonparametric additive ODE models. PLoS One 2014; 9:e94003. [PMID: 24732037 PMCID: PMC3986056 DOI: 10.1371/journal.pone.0094003] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Received: 12/09/2013] [Accepted: 03/13/2014] [Indexed: 01/05/2023] [Open Access]
Abstract
Network representations of biological systems are widespread and reconstructing unknown networks from data is a focal problem for computational biologists. For example, the series of biochemical reactions in a metabolic pathway can be represented as a network, with nodes corresponding to metabolites and edges linking reactants to products. In a different context, regulatory relationships among genes are commonly represented as directed networks with edges pointing from influential genes to their targets. Reconstructing such networks from data is a challenging problem receiving much attention in the literature. There is a particular need for approaches tailored to time-series data and not reliant on direct intervention experiments, as the former are often more readily available. In this paper, we introduce an approach to reconstructing directed networks based on dynamic systems models. Our approach generalizes commonly used ODE models based on linear or nonlinear dynamics by extending the functional class for the functions involved from parametric to nonparametric models. Concomitantly we limit the complexity by imposing an additive structure on the estimated slope functions. Thus the submodel associated with each node is a sum of univariate functions. These univariate component functions form the basis for a novel coupling metric that we define in order to quantify the strength of proposed relationships and hence rank potential edges. We show the utility of the method by reconstructing networks using simulated data from computational models for the glycolytic pathway of Lactococcus lactis and a gene network regulating the pluripotency of mouse embryonic stem cells. For purposes of comparison, we also assess reconstruction performance using gene networks from the DREAM challenges.
We compare our method to those that similarly rely on dynamic systems models and use the results to attempt to disentangle the distinct roles of linearity, sparsity, and derivative estimation.
Affiliation(s)
- James Henderson
- Department of Statistics, University of Michigan, Ann Arbor, Michigan, United States of America
- George Michailidis
- Department of Statistics, University of Michigan, Ann Arbor, Michigan, United States of America
50
Astola L, Stigter H, van Dijk ADJ, van Daelen R, Molenaar J. Inferring the gene network underlying the branching of tomato inflorescence. PLoS One 2014; 9:e89689. [PMID: 24699171 PMCID: PMC3974656 DOI: 10.1371/journal.pone.0089689] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 11/08/2013] [Accepted: 01/24/2014] [Indexed: 12/21/2022] [Open Access]
Abstract
The architecture of tomato inflorescence strongly affects flower production and subsequent crop yield. To understand the genetic activities involved, insight into the underlying network of genes that initiate and control the sympodial growth in the tomato is essential. In this paper, we show how the structure of this network can be derived from available data of the expressions of the involved genes. Our approach starts from employing biological expert knowledge to select the most probable gene candidates behind branching behavior. To find how these genes interact, we develop a stepwise procedure for computational inference of the network structure. Our data consists of expression levels from primary shoot meristems, measured at different developmental stages on three different genotypes of tomato. With the network inferred by our algorithm, we can explain the dynamics corresponding to all three genotypes simultaneously, despite their apparent dissimilarities. We also correctly predict the chronological order of expression peaks for the main hubs in the network. Based on the inferred network, using optimal experimental design criteria, we are able to suggest an informative set of experiments for further investigation of the mechanisms underlying branching behavior.
Affiliation(s)
- Laura Astola
- Biometris, Wageningen University and Research Centre, Wageningen, The Netherlands
- Netherlands Consortium for Systems Biology, Amsterdam, The Netherlands
- Hans Stigter
- Biometris, Wageningen University and Research Centre, Wageningen, The Netherlands
- Aalt D. J. van Dijk
- Biometris, Wageningen University and Research Centre, Wageningen, The Netherlands
- Netherlands Consortium for Systems Biology, Amsterdam, The Netherlands
- Raymond van Daelen
- Netherlands Consortium for Systems Biology, Amsterdam, The Netherlands
- Keygene N.V., Wageningen, The Netherlands
- Jaap Molenaar
- Biometris, Wageningen University and Research Centre, Wageningen, The Netherlands
- Netherlands Consortium for Systems Biology, Amsterdam, The Netherlands