1
Wang Y, Zheng P, Cheng YC, Wang Z, Aravkin A. Gene regulatory network inference with covariance dynamics. Math Biosci 2024:109284. [PMID: 39168402 DOI: 10.1016/j.mbs.2024.109284] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2024] [Revised: 06/25/2024] [Accepted: 08/16/2024] [Indexed: 08/23/2024]
Abstract
Determining gene regulatory network (GRN) structure is a central problem in biology, with a variety of inference methods available for different types of data. For a widely prevalent and challenging use case, namely single-cell gene expression data measured after intervention at multiple time points with unknown joint distributions, there is only one known specifically developed method, which does not fully utilize the rich information contained in this data type. We develop an inference method for the GRN in this case, netWork infErence by covariaNce DYnamics, dubbed WENDY. The core idea of WENDY is to model the dynamics of the covariance matrix, and solve this dynamics as an optimization problem to determine the regulatory relationships. To evaluate its effectiveness, we compare WENDY with other inference methods using synthetic data and experimental data. Our results demonstrate that WENDY performs well across different data sets.
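The idea of treating covariance dynamics as an optimization problem can be sketched with a toy example. The abstract does not state the model, so the linear relation K1 = B^T K0 B used here, the matrix `b_true`, and all function names are assumptions made purely for illustration; this is not the WENDY implementation.

```python
# Toy illustration only: the relation K1 = B^T K0 B is an assumed model,
# not one taken from the abstract. All names here are hypothetical.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(row) for row in zip(*a)]

def propagate(k0, b):
    """Covariance at the later time point under the assumed model: B^T K0 B."""
    return matmul(matmul(transpose(b), k0), b)

def loss(b, k0, k1):
    """Squared Frobenius distance between observed and predicted covariance."""
    pred = propagate(k0, b)
    size = len(k1)
    return sum((k1[i][j] - pred[i][j]) ** 2
               for i in range(size) for j in range(size))

k0 = [[2.0, 0.5], [0.5, 1.0]]        # covariance at the first time point
b_true = [[1.0, 0.3], [0.0, 0.9]]    # hypothetical regulation-dependent matrix
k1 = propagate(k0, b_true)           # covariance at the second time point
identity = [[1.0, 0.0], [0.0, 1.0]]  # candidate with no regulation at all

loss_true = loss(b_true, k0, k1)
loss_identity = loss(identity, k0, k1)
print(loss_true, loss_identity)
```

Under this assumed model the loss vanishes at `b_true` and is positive for the identity candidate, so minimizing it over B would favor matrices that explain the change in covariance. The minimizer is not unique (for example, `-B` gives the same prediction), which is one reason a real method needs additional structure.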
Affiliation(s)
- Yue Wang
- Irving Institute for Cancer Dynamics and Department of Statistics, Columbia University, New York, 10027, NY, USA.
- Peng Zheng
- Institute for Health Metrics and Evaluation, Seattle, 98195, WA, USA; Department of Health Metrics Sciences, University of Washington, Seattle, 98195, WA, USA
- Yu-Chen Cheng
- Department of Data Science, Dana-Farber Cancer Institute, Boston, 02215, MA, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, 02115, MA, USA; Center for Cancer Evolution, Dana-Farber Cancer Institute, Boston, 02215, MA, USA; Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, 02138, MA, USA
- Zikun Wang
- Laboratory of Genetics, The Rockefeller University, New York, 10065, NY, USA
- Aleksandr Aravkin
- Department of Applied Mathematics, University of Washington, Seattle, 98195, WA, USA
2
Peng H, Xu J, Liu K, Liu F, Zhang A, Zhang X. EIEPCF: accurate inference of functional gene regulatory networks by eliminating indirect effects from confounding factors. Brief Funct Genomics 2024; 23:373-383. [PMID: 37642217 DOI: 10.1093/bfgp/elad040] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/05/2023] [Revised: 07/07/2023] [Accepted: 08/14/2023] [Indexed: 08/31/2023] Open
Abstract
Reconstructing functional gene regulatory networks (GRNs) is a primary prerequisite for understanding pathogenic mechanisms and curing diseases in animals, and it also provides an important foundation for cultivating vegetable and fruit varieties that are resistant to diseases and corrosion in plants. Many computational methods have been developed to infer GRNs, but most of the regulatory relationships between genes obtained by these methods are biased. Eliminating indirect effects in GRNs remains a significant challenge for researchers. In this work, we propose a novel approach for inferring functional GRNs, named EIEPCF (eliminating indirect effects produced by confounding factors), which eliminates indirect effects caused by confounding factors. This method eliminates the influence of confounding factors on regulatory factors and target genes by measuring the similarity between their residuals. The validation results of the EIEPCF method on simulation studies, the gold-standard networks provided by the DREAM3 Challenge and the real gene networks of Escherichia coli demonstrate that it achieves significantly higher accuracy compared to other popular computational methods for inferring GRNs. As a case study, we utilized the EIEPCF method to reconstruct the cold-resistance-specific GRN from cold-resistance gene expression data of Arabidopsis thaliana. The source code and data are available at https://github.com/zhanglab-wbgcas/EIEPCF.
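The residual-based idea can be illustrated with ordinary partial correlation, used here as a stand-in because the abstract does not specify the similarity measure. The simulated data, the single confounder `z`, and the helper names are assumptions for illustration; the actual EIEPCF procedure lives in the repository linked above.

```python
import random

def mean(values):
    return sum(values) / len(values)

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def residuals(y, z):
    """Residuals of y after simple linear regression of y on z."""
    mz, my = mean(z), mean(y)
    slope = (sum((a - mz) * (b - my) for a, b in zip(z, y))
             / sum((a - mz) ** 2 for a in z))
    return [b - my - slope * (a - mz) for a, b in zip(z, y)]

# Simulated data (an assumption): the regulator and the target share a
# confounder z but have no direct link to each other.
rng = random.Random(0)
n = 2000
z = [rng.gauss(0, 1) for _ in range(n)]          # confounding factor
regulator = [zi + rng.gauss(0, 1) for zi in z]
target = [zi + rng.gauss(0, 1) for zi in z]

raw_corr = pearson(regulator, target)
partial_corr = pearson(residuals(regulator, z), residuals(target, z))
print(round(raw_corr, 3), round(partial_corr, 3))
```

The raw correlation is sizeable because both genes follow `z`, while the correlation between residuals is close to zero, which is the signature of an indirect effect that should not be reported as a regulatory edge.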
Affiliation(s)
- Huixiang Peng
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- University of Chinese Academy of Sciences, Beijing 100049 China
- Jing Xu
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- University of Chinese Academy of Sciences, Beijing 100049 China
- Kangchen Liu
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- University of Chinese Academy of Sciences, Beijing 100049 China
- Fang Liu
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- Aidi Zhang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- Xiujun Zhang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan 430074, China
3
Zilinskas R, Li C, Shen X, Pan W, Yang T. Inferring a directed acyclic graph of phenotypes from GWAS summary statistics. Biometrics 2024; 80:ujad039. [PMID: 38470257 PMCID: PMC10928990 DOI: 10.1093/biomtc/ujad039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2023] [Revised: 11/24/2023] [Accepted: 01/04/2024] [Indexed: 03/13/2024]
Abstract
Estimating phenotype networks is a growing field in computational biology. It deepens the understanding of disease etiology and is useful in many applications. In this study, we present a method that constructs a phenotype network by assuming a Gaussian linear structure model embedding a directed acyclic graph (DAG). We utilize genetic variants as instrumental variables and show how our method only requires access to summary statistics from a genome-wide association study (GWAS) and a reference panel of genotype data. Besides estimation, a distinct feature of the method is its summary statistics-based likelihood ratio test on directed edges. We applied our method to estimate a causal network of 29 cardiovascular-related proteins and linked the estimated network to Alzheimer's disease (AD). A simulation study was conducted to demonstrate the effectiveness of this method. An R package sumdag implementing the proposed method, all relevant code, and a Shiny application are available.
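The role of genetic variants as instrumental variables can be illustrated with the textbook ratio (Wald) estimator for a single variant and a single exposure. This is a deliberately simple stand-in, not the likelihood-based method of the sumdag package; the simulated data, effect size, and variable names are all assumptions.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

rng = random.Random(1)
n = 20000
true_effect = 0.5  # assumed causal effect of phenotype x on phenotype y

# Genotype coded as an allele count (0, 1 or 2) with allele frequency 0.3.
g = [(rng.random() < 0.3) + (rng.random() < 0.3) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]  # unobserved confounder
x = [gi + ui + rng.gauss(0, 1) for gi, ui in zip(g, u)]
y = [true_effect * xi + ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]

# Naive regression of y on x is biased upward by the confounder u.
ols_estimate = cov(x, y) / cov(x, x)

# Ratio estimator: equals beta_gy / beta_gx, the two marginal slopes that
# a GWAS reports as summary statistics for this variant.
iv_estimate = cov(g, y) / cov(g, x)
print(round(ols_estimate, 3), round(iv_estimate, 3))
```

Because the ratio depends only on the two marginal variant-phenotype slopes, it can be computed without individual-level data, which is the property the abstract highlights when it says that summary statistics suffice.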
Affiliation(s)
- Chunlin Li
- Department of Statistics, Iowa State University, Ames, IA 50011, United States
- Xiaotong Shen
- School of Statistics, University of Minnesota, Minneapolis, MN 55455, United States
- Wei Pan
- Division of Biostatistics and Health Data Science, University of Minnesota, Minneapolis, MN 55455, United States
- Tianzhong Yang
- Division of Biostatistics and Health Data Science, University of Minnesota, Minneapolis, MN 55455, United States
4
Xin J, Wang M, Qu L, Chen Q, Wang W, Wang Z. BIC-LP: A Hybrid Higher-Order Dynamic Bayesian Network Score Function for Gene Regulatory Network Reconstruction. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:188-199. [PMID: 38127613 DOI: 10.1109/tcbb.2023.3345317] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2023]
Abstract
Reconstructing gene regulatory networks (GRNs) is an increasingly hot topic in bioinformatics. The dynamic Bayesian network (DBN) is a stochastic graph model commonly used as a vital model for GRN reconstruction. However, the probabilistic characteristics of biological networks and the existence of data noise pose great challenges to GRN reconstruction and always lead to many false positive/negative edges. ScoreLasso is a hybrid DBN score function combining DBN and linear regression with good performance. Its performance is, however, limited by its first-order assumption and its neglect of the initial network of the DBN. In this article, an integrated model based on a higher-order DBN model, a higher-order Lasso linear regression model and a Pearson correlation model is proposed. Based on this, a hybrid higher-order DBN score function for GRN reconstruction is proposed, namely BIC-LP. The BIC-LP score function is constructed by adding terms based on Lasso linear regression coefficients and Pearson correlation coefficients to the classical BIC score function. Therefore, it can capture more information from the dataset and curb information loss, compared with both many existing Bayesian family score functions and many state-of-the-art methods for GRN reconstruction. Experimental results show that BIC-LP can reasonably eliminate some false positive edges while retaining most true positive edges, so as to achieve better GRN reconstruction performance.
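As background for the score function described above, the classical BIC for a linear-Gaussian model can be sketched as follows. Only the BIC baseline is shown: the abstract does not give the form of the added Lasso and Pearson terms, so they are omitted rather than guessed. The single-parent setup, the simulated genes, and the "lower is better" sign convention are assumptions of this sketch.

```python
import math
import random

def bic_single_parent(parent, target):
    """Classical BIC, n*ln(RSS/n) + k*ln(n), for target regressed on one parent.

    Lower is better in this convention; some Bayesian network texts use
    the sign-flipped form, where higher is better.
    """
    n = len(target)
    mp, mt = sum(parent) / n, sum(target) / n
    slope = (sum((p - mp) * (t - mt) for p, t in zip(parent, target))
             / sum((p - mp) ** 2 for p in parent))
    rss = sum((t - mt - slope * (p - mp)) ** 2
              for p, t in zip(parent, target))
    k = 2  # fitted parameters: intercept and slope
    return n * math.log(rss / n) + k * math.log(n)

rng = random.Random(2)
n = 200
gene_a = [rng.gauss(0, 1) for _ in range(n)]  # true regulator (hypothetical)
gene_b = [rng.gauss(0, 1) for _ in range(n)]  # unrelated gene (hypothetical)
# Assumed first-order dependence: the target follows gene_a one step earlier.
target = [2.0 * a + rng.gauss(0, 0.5) for a in gene_a]

bic_a = bic_single_parent(gene_a, target)
bic_b = bic_single_parent(gene_b, target)
print(round(bic_a, 1), round(bic_b, 1))
```

The true regulator yields the lower score, so a structure search guided by BIC would prefer the edge from `gene_a`. A hybrid score such as the one in the abstract would add further terms to this baseline.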
5
Bernaola N, Michiels M, Larrañaga P, Bielza C. Learning massive interpretable gene regulatory networks of the human brain by merging Bayesian networks. PLoS Comput Biol 2023; 19:e1011443. [PMID: 38039337 PMCID: PMC10745139 DOI: 10.1371/journal.pcbi.1011443] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2022] [Revised: 12/22/2023] [Accepted: 08/19/2023] [Indexed: 12/03/2023] Open
Abstract
We present the Fast Greedy Equivalence Search (FGES)-Merge, a new method for learning the structure of gene regulatory networks via merging locally learned Bayesian networks, based on the fast greedy equivalence search algorithm. The method is competitive with the state of the art in terms of the Matthews correlation coefficient, which takes into account both precision and recall, while also improving upon it in terms of speed, scaling up to tens of thousands of variables and being able to use empirical knowledge about the topological structure of gene regulatory networks. To showcase the ability of our method to scale to massive networks, we apply it to learning the gene regulatory network for the full human genome using data from samples of different brain structures (from the Allen Human Brain Atlas). Furthermore, this Bayesian network model should predict interactions between genes in a way that is clear to experts, following the current trends in explainable artificial intelligence. To achieve this, we also present a new open-access visualization tool that facilitates the exploration of massive networks and can aid in finding nodes of interest for experimental tests.
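The Matthews correlation coefficient mentioned above can be computed for a predicted edge set as in the following sketch. The four-gene network and the predicted edges are made-up illustrations, and the function name is hypothetical; only the MCC formula itself is standard.

```python
import math

def matthews_corrcoef(true_edges, predicted_edges, all_pairs):
    """MCC of a predicted directed edge set against a reference edge set."""
    tp = fp = fn = tn = 0
    for pair in all_pairs:
        actual = pair in true_edges
        predicted = pair in predicted_edges
        if actual and predicted:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0

genes = ["g1", "g2", "g3", "g4"]
all_pairs = [(a, b) for a in genes for b in genes if a != b]  # 12 directed pairs
true_edges = {("g1", "g2"), ("g2", "g3"), ("g3", "g4")}
predicted_edges = {("g1", "g2"), ("g2", "g3"), ("g4", "g1")}

mcc = matthews_corrcoef(true_edges, predicted_edges, all_pairs)
print(round(mcc, 4))  # 2 TP, 1 FP, 1 FN, 8 TN gives 15/27
```

Because true negatives enter the formula, the score stays informative for sparse networks, where almost every candidate pair is a non-edge and plain accuracy would look deceptively high.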
Affiliation(s)
- Niko Bernaola
- Computational Intelligence Group, Departamento de Inteligencia Artificial, Universidad Politécnica de Madrid, Madrid, Spain
- Mario Michiels
- Centro Integral de Neurociencias Abarca Campal, Hospital Universitario HM Puerta del Sur, Madrid, Spain
- Pedro Larrañaga
- Computational Intelligence Group, Departamento de Inteligencia Artificial, Universidad Politécnica de Madrid, Madrid, Spain
- Concha Bielza
- Computational Intelligence Group, Departamento de Inteligencia Artificial, Universidad Politécnica de Madrid, Madrid, Spain
6
Zilinskas R, Li C, Shen X, Pan W, Yang T. Inferring a directed acyclic graph of phenotypes from GWAS summary statistics. bioRxiv 2023:2023.02.10.528092. [PMID: 38045347 PMCID: PMC10690198 DOI: 10.1101/2023.02.10.528092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2023]
Abstract
Estimating phenotype networks is a growing field in computational biology. It deepens the understanding of disease etiology and is useful in many applications. In this study, we present a method that constructs a phenotype network by assuming a Gaussian linear structure model embedding a directed acyclic graph (DAG). We utilize genetic variants as instrumental variables and show how our method only requires access to summary statistics from a genome-wide association study (GWAS) and a reference panel of genotype data. Besides estimation, a distinct feature of the method is its summary statistics-based likelihood ratio test on directed edges. We applied our method to estimate a causal network of 29 cardiovascular-related proteins and linked the estimated network to Alzheimer's disease (AD). A simulation study was conducted to demonstrate the effectiveness of this method. An R package sumdag implementing the proposed method, all relevant code, and a Shiny application are available at https://github.com/chunlinli/sumdag.
Affiliation(s)
- Chunlin Li
- Department of Statistics, Iowa State University, Ames, Iowa 50011, U.S.A
- Xiaotong Shen
- School of Statistics, University of Minnesota, Minneapolis, Minnesota 55455, U.S.A
- Wei Pan
- Division of Biostatistics and Health Data Science, University of Minnesota, Minneapolis, Minnesota 55455, U.S.A
- Tianzhong Yang
- Division of Biostatistics and Health Data Science, University of Minnesota, Minneapolis, Minnesota 55455, U.S.A
7
Jiang Z, Chen C, Xu Z, Wang X, Zhang M, Zhang D. SIGNET: transcriptome-wide causal inference for gene regulatory networks. Sci Rep 2023; 13:19371. [PMID: 37938594 PMCID: PMC10632394 DOI: 10.1038/s41598-023-46295-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Accepted: 10/30/2023] [Indexed: 11/09/2023] Open
Abstract
Gene regulation plays an important role in understanding the mechanisms of human biology and diseases. However, inferring causal relationships between all genes is challenging due to the large number of genes in the transcriptome. Here, we present SIGNET (Statistical Inference on Gene Regulatory Networks), a flexible software package that reveals networks of causal regulation between genes built upon large-scale transcriptomic and genotypic data at the population level. Like Mendelian randomization, SIGNET uses genotypic variants as natural instrumental variables to establish such causal relationships but constructs a transcriptome-wide gene regulatory network with high confidence. SIGNET makes such a computationally heavy task feasible by deploying a well-designed statistical algorithm over a parallel computing environment. It also provides a user-friendly interface allowing for parameter tuning, efficient parallel computing scheduling, interactive network visualization, and confirmatory results retrieval. The open-source SIGNET software is freely available (https://www.zstats.org/signet/).
Affiliation(s)
- Zhongli Jiang
- Department of Statistics, Purdue University, West Lafayette, IN, 47907, USA
- Zhenyu Xu
- Department of Statistics, Purdue University, West Lafayette, IN, 47907, USA
- Min Zhang
- Department of Statistics, Purdue University, West Lafayette, IN, 47907, USA
- Department of Epidemiology and Biostatistics, University of California, Irvine, CA, 92617, USA
- Dabao Zhang
- Department of Epidemiology and Biostatistics, University of California, Irvine, CA, 92617, USA.
8
Liao X, Ozcan M, Shi M, Kim W, Jin H, Li X, Turkez H, Achour A, Uhlén M, Mardinoglu A, Zhang C. Open MoA: revealing the mechanism of action (MoA) based on network topology and hierarchy. Bioinformatics 2023; 39:btad666. [PMID: 37930015 PMCID: PMC10637856 DOI: 10.1093/bioinformatics/btad666] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2023] [Revised: 10/19/2023] [Accepted: 10/30/2023] [Indexed: 11/07/2023] Open
Abstract
MOTIVATION Many approaches in systems biology have been applied in drug repositioning due to the increased availability of omics data and computational biology tools. Using a multi-omics integrated network, which contains information on various biological interactions, could offer a more comprehensive perspective on and interpretation of the drug mechanism of action (MoA). RESULTS We developed a computational pipeline for dissecting the hidden MoAs of drugs (Open MoA). Our pipeline assigns confidence scores to edges that represent connections between genes/proteins in the integrated network. The interactions showing the highest confidence score could indicate potential drug targets and infer the underlying molecular MoAs. Open MoA was also validated by testing some well-established targets. Additionally, we applied Open MoA to reveal the MoA of a repositioned drug (JNK-IN-5A) that modulates the PKLR expression in HepG2 cells and found STAT1 is the key transcription factor. Overall, Open MoA represents a first-generation tool that could be utilized for predicting the potential MoA of repurposed drugs and dissecting de novo targets for developing effective treatments. AVAILABILITY AND IMPLEMENTATION Source code is available at https://github.com/XinmengLiao/Open_MoA.
Affiliation(s)
- Xinmeng Liao
- Department of Protein Science, Science for Life Laboratory, KTH-Royal Institute of Technology, 17121 Stockholm, Sweden
- Mehmet Ozcan
- Department of Protein Science, Science for Life Laboratory, KTH-Royal Institute of Technology, 17121 Stockholm, Sweden
- Department of Medical Biochemistry, Faculty of Medicine, Zonguldak Bulent Ecevit University, 67630 Zonguldak, Turkey
- Mengnan Shi
- Department of Protein Science, Science for Life Laboratory, KTH-Royal Institute of Technology, 17121 Stockholm, Sweden
- Woonghee Kim
- Department of Protein Science, Science for Life Laboratory, KTH-Royal Institute of Technology, 17121 Stockholm, Sweden
- Han Jin
- Department of Protein Science, Science for Life Laboratory, KTH-Royal Institute of Technology, 17121 Stockholm, Sweden
- Xiangyu Li
- Guangzhou National Laboratory, Guangzhou, Guangdong Province 510005, China
- Hasan Turkez
- Department of Medical Biology, Faculty of Medicine, Atatürk University, Erzurum 25240, Turkey
- Adnane Achour
- Science for Life Laboratory, Department of Medicine, Solna, Karolinska Institute, 17176 Stockholm, Sweden
- Mathias Uhlén
- Department of Protein Science, Science for Life Laboratory, KTH-Royal Institute of Technology, 17121 Stockholm, Sweden
- Adil Mardinoglu
- Department of Protein Science, Science for Life Laboratory, KTH-Royal Institute of Technology, 17121 Stockholm, Sweden
- Centre for Host-Microbiome Interactions, Faculty of Dentistry, Oral & Craniofacial Sciences, King’s College London, London SE1 9RT, United Kingdom
- Cheng Zhang
- Department of Protein Science, Science for Life Laboratory, KTH-Royal Institute of Technology, 17121 Stockholm, Sweden
9
Wu Y, Qian B, Wang A, Dong H, Zhu E, Ma B. iLSGRN: inference of large-scale gene regulatory networks based on multi-model fusion. Bioinformatics 2023; 39:btad619. [PMID: 37851379 PMCID: PMC10589915 DOI: 10.1093/bioinformatics/btad619] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Revised: 10/04/2023] [Accepted: 10/17/2023] [Indexed: 10/19/2023] Open
Abstract
MOTIVATION Gene regulatory networks (GRNs) describe the interactions between genes and help reveal the different biological mechanisms in the cell. Reconstructing GRNs based on gene expression data has been a central computational problem in systems biology. However, due to the high dimensionality and non-linearity of large-scale GRNs, accurately and efficiently inferring GRNs is still a challenging task. RESULTS In this article, we propose a new approach, iLSGRN, to reconstruct large-scale GRNs from steady-state and time-series gene expression data based on non-linear ordinary differential equations. Firstly, the regulatory gene recognition algorithm calculates the Maximal Information Coefficient between genes and excludes redundant regulatory relationships to achieve dimensionality reduction. Then, the feature fusion algorithm constructs a model leveraging the feature importance derived from XGBoost (eXtreme Gradient Boosting) and RF (Random Forest) models, which can effectively train the non-linear ordinary differential equations model of GRNs and improve the accuracy and stability of the inference algorithm. Extensive experiments on datasets of different scales show that our method achieves a sensible improvement compared with the state-of-the-art methods. Furthermore, we perform cross-validation experiments on the real gene datasets to validate the robustness and effectiveness of the proposed method. AVAILABILITY AND IMPLEMENTATION The proposed method is written in the Python language, and is available at: https://github.com/lab319/iLSGRN.
Affiliation(s)
- Yiming Wu
- School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
- Bing Qian
- School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
- Anqi Wang
- Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong 999077, China
- Heng Dong
- School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
- Enqiang Zhu
- Institution of Computing Science and Technology, Guangzhou University, Guangzhou 510006, China
- Baoshan Ma
- School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
10
Procopio A, Cesarelli G, Donisi L, Merola A, Amato F, Cosentino C. Combined mechanistic modeling and machine-learning approaches in systems biology - A systematic literature review. Computer Methods and Programs in Biomedicine 2023; 240:107681. [PMID: 37385142 DOI: 10.1016/j.cmpb.2023.107681] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 05/17/2023] [Revised: 06/14/2023] [Accepted: 06/14/2023] [Indexed: 07/01/2023]
Abstract
BACKGROUND AND OBJECTIVE Mechanistic-based Model simulations (MM) are an effective approach commonly employed, for research and learning purposes, to better investigate and understand the inherent behavior of biological systems. Recent advancements in modern technologies and the large availability of omics data have allowed the application of Machine Learning (ML) techniques to different research fields, including systems biology. However, the availability of information regarding the analyzed biological context, sufficient experimental data, as well as the degree of computational complexity, represent some of the issues that both MMs and ML techniques could present individually. For this reason, several recent studies have suggested overcoming or significantly reducing these drawbacks by combining the above-mentioned two methods. In the wake of the growing interest in this hybrid analysis approach, with the present review, we want to systematically investigate the studies available in the scientific literature in which both MMs and ML have been combined to explain biological processes at genomics, proteomics, and metabolomics levels, or the behavior of entire cellular populations. METHODS Elsevier Scopus®, Clarivate Web of Science™ and National Library of Medicine PubMed® databases were queried using the queries reported in Table 1, resulting in 350 scientific articles. RESULTS Only 14 of the 350 documents returned by the comprehensive search conducted on the three major online databases met our search criteria, i.e., presenting a hybrid approach consisting of the synergistic combination of MMs and ML to treat a particular aspect of systems biology. CONCLUSIONS Despite the recent interest in this methodology, a careful analysis of the selected papers shows that examples of integration between MMs and ML are already present in systems biology, highlighting the great potential of this hybrid approach at both micro and macro biological scales.
Affiliation(s)
- Anna Procopio
- Department of Experimental and Clinical Medicine, Università degli Studi Magna Græcia, Catanzaro, 88100, Italy
- Giuseppe Cesarelli
- Department of Electrical Engineering and Information Technology, Università degli Studi di Napoli Federico II, Napoli, 80125, Italy
- Leandro Donisi
- Department of Advanced Medical and Surgical Sciences, Università della Campania Luigi Vanvitelli, Napoli, 80138, Italy
- Alessio Merola
- Department of Experimental and Clinical Medicine, Università degli Studi Magna Græcia, Catanzaro, 88100, Italy
- Francesco Amato
- Department of Electrical Engineering and Information Technology, Università degli Studi di Napoli Federico II, Napoli, 80125, Italy.
- Carlo Cosentino
- Department of Experimental and Clinical Medicine, Università degli Studi Magna Græcia, Catanzaro, 88100, Italy.
11
Li L, Xia R, Chen W, Zhao Q, Tao P, Chen L. Single-cell causal network inferred by cross-mapping entropy. Brief Bioinform 2023; 24:bbad281. [PMID: 37544659 DOI: 10.1093/bib/bbad281] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2023] [Revised: 07/03/2023] [Accepted: 07/19/2023] [Indexed: 08/08/2023] Open
Abstract
Gene regulatory networks (GRNs) reveal the complex molecular interactions that govern cell state. However, identifying causal relations among genes is challenging due to noisy data and molecular nonlinearity. Here, we propose a novel causal criterion, neighbor cross-mapping entropy (NME), for inferring GRNs from both steady-state data and time-series data. NME is designed to quantify 'continuous causality' or functional dependency from one variable to another based on their function continuity with varying neighbor sizes. NME shows superior performance on benchmark datasets compared with existing methods. When applied to scRNA-seq datasets, NME not only reliably inferred GRNs for cell types but also identified cell states. Based on the inferred GRNs and their activity matrices, NME showed better performance in single-cell clustering and downstream analyses. In summary, based on continuous causality, NME provides a powerful tool for inferring causal regulations between genes from scRNA-seq data, which is further exploited to identify novel cell types/states and predict cell type-specific network modules.
Affiliation(s)
- Lin Li
- Key Laboratory of Systems Biology, Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai 200031, China
- Rui Xia
- Key Laboratory of Systems Biology, Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai 200031, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Wei Chen
- Key Laboratory of Systems Biology, Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai 200031, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Qi Zhao
- Key Laboratory of Systems Biology, Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai 200031, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Peng Tao
- Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou 310024, China
- Luonan Chen
- Key Laboratory of Systems Biology, Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai 200031, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou 310024, China
12
Groves SM, Quaranta V. Quantifying cancer cell plasticity with gene regulatory networks and single-cell dynamics. Frontiers in Network Physiology 2023; 3:1225736. [PMID: 37731743 PMCID: PMC10507267 DOI: 10.3389/fnetp.2023.1225736] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2023] [Accepted: 08/25/2023] [Indexed: 09/22/2023]
Abstract
Phenotypic plasticity of cancer cells can lead to complex cell state dynamics during tumor progression and acquired resistance. Highly plastic stem-like states may be inherently drug-resistant. Moreover, cell state dynamics in response to therapy allow a tumor to evade treatment. In both scenarios, quantifying plasticity is essential for identifying high-plasticity states or elucidating transition paths between states. Currently, methods to quantify plasticity tend to focus on 1) quantification of quasi-potential based on the underlying gene regulatory network dynamics of the system; or 2) inference of cell potency based on trajectory inference or lineage tracing in single-cell dynamics. Here, we explore both of these approaches and associated computational tools. We then discuss implications of each approach to plasticity metrics, and relevance to cancer treatment strategies.
Affiliation(s)
- Sarah M. Groves
- Department of Pharmacology, Vanderbilt University, Nashville, TN, United States
- Vito Quaranta
- Department of Pharmacology, Vanderbilt University, Nashville, TN, United States
- Department of Biochemistry, Vanderbilt University, Nashville, TN, United States
13
Jiang Z, Chen C, Xu Z, Wang X, Zhang M, Zhang D. SIGNET: Transcriptome-wide Causal Inference for Gene Regulatory Networks. Research Square 2023:rs.3.rs-3180043. [PMID: 37546848 PMCID: PMC10402199 DOI: 10.21203/rs.3.rs-3180043/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/08/2023]
Abstract
Gene regulation plays an important role in understanding the mechanisms of human biology and diseases. However, inferring causal relationships between all genes is challenging due to the large number of genes in the transcriptome. Here, we present SIGNET (Statistical Inference on Gene Regulatory Networks), a flexible software package that reveals networks of causal regulation between genes built upon large-scale transcriptomic and genotypic data at the population level. Like Mendelian randomization, SIGNET uses genotypic variants as natural instrumental variables to establish such causal relationships but constructs a transcriptome-wide gene regulatory network with high confidence. SIGNET makes such a computationally heavy task feasible by deploying a well-designed statistical algorithm over a parallel computing environment. It also provides a user-friendly interface allowing for parameter tuning, efficient parallel computing scheduling, interactive network visualization, and confirmatory results retrieval. The open-source SIGNET software is freely available (https://www.zstats.org/signet/).
Affiliation(s)
- Zhongli Jiang
- Department of Statistics, Purdue University, West Lafayette, 47907, Indiana, United States
| | - Chen Chen
- UCB Pharma, Brussels, 1070, Belgium
- These authors contributed to this project as research assistants when they studied in the Department of Statistics, Purdue University
| | - Zhenyu Xu
- Department of Statistics, Purdue University, West Lafayette, 47907, Indiana, United States
- These authors contributed to this project as research assistants when they studied in the Department of Statistics, Purdue University
| | - Xiaojian Wang
- ByteDance, Shanghai, 201107, China
- These authors contributed to this project as research assistants when they studied in the Department of Statistics, Purdue University
| | - Min Zhang
- Department of Statistics, Purdue University, West Lafayette, 47907, Indiana, United States
- Department of Epidemiology and Biostatistics, University of California, Irvine, 92617, California, United States
| | - Dabao Zhang
- Department of Statistics, Purdue University, West Lafayette, 47907, Indiana, United States
|
14
|
Merchant JP, Zhu K, Henrion MYR, Zaidi SSA, Lau B, Moein S, Alamprese ML, Pearse RV, Bennett DA, Ertekin-Taner N, Young-Pearse TL, Chang R. Predictive network analysis identifies JMJD6 and other potential key drivers in Alzheimer's disease. Commun Biol 2023; 6:503. [PMID: 37188718 PMCID: PMC10185548 DOI: 10.1038/s42003-023-04791-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/30/2021] [Accepted: 03/31/2023] [Indexed: 05/17/2023] Open
Abstract
Despite decades of genetic studies on late-onset Alzheimer's disease, the underlying molecular mechanisms remain unclear. To better comprehend its complex etiology, we use an integrative approach to build robust predictive (causal) network models using two large human multi-omics datasets. We delineate bulk-tissue gene expression into single cell-type gene expression and integrate clinical and pathologic traits, single nucleotide variation, and deconvoluted gene expression for the construction of cell type-specific predictive network models. Here, we focus on neuron-specific network models and prioritize 19 predicted key drivers modulating Alzheimer's pathology, which we then validate by knockdown in human induced pluripotent stem cell-derived neurons. We find that neuronal knockdown of 10 of the 19 targets significantly modulates levels of amyloid-beta and/or phosphorylated tau peptides, most notably JMJD6. We also confirm our network structure by RNA sequencing in the neurons following knockdown of each of the 10 targets, which additionally predicts that they are upstream regulators of REST and VGF. Our work thus identifies robust neuronal key drivers of the Alzheimer's-associated network state which may represent therapeutic targets with relevance to both amyloid and tau pathology in Alzheimer's disease.
Affiliation(s)
- Julie P Merchant
- Ann Romney Center for Neurologic Diseases, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Neuroscience Graduate Group, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
| | - Kuixi Zhu
- The Center for Innovation in Brain Sciences, University of Arizona, Tucson, AZ, USA
| | - Marc Y R Henrion
- Liverpool School of Tropical Medicine, Pembroke Place, Liverpool, L3 5QA, UK
- Malawi-Liverpool-Wellcome Trust Clinical Research Programme, PO Box 30096, Blantyre, Malawi
| | - Syed S A Zaidi
- The Center for Innovation in Brain Sciences, University of Arizona, Tucson, AZ, USA
| | - Branden Lau
- The Center for Innovation in Brain Sciences, University of Arizona, Tucson, AZ, USA
- Arizona Research Labs, Genetics Core, University of Arizona, Tucson, AZ, USA
| | - Sara Moein
- The Center for Innovation in Brain Sciences, University of Arizona, Tucson, AZ, USA
| | - Melissa L Alamprese
- The Center for Innovation in Brain Sciences, University of Arizona, Tucson, AZ, USA
| | - Richard V Pearse
- Ann Romney Center for Neurologic Diseases, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - David A Bennett
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, USA
| | - Nilüfer Ertekin-Taner
- Department of Neuroscience, Mayo Clinic Florida, Jacksonville, FL, USA
- Department of Neurology, Mayo Clinic Florida, Jacksonville, FL, USA
| | - Tracy L Young-Pearse
- Ann Romney Center for Neurologic Diseases, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Harvard Stem Cell Institute, Harvard University, Boston, MA, USA
| | - Rui Chang
- The Center for Innovation in Brain Sciences, University of Arizona, Tucson, AZ, USA
- Department of Neurology, University of Arizona, Tucson, AZ, USA
- INTelico Therapeutics LLC, Tucson, AZ, USA
- PATH Biotech LLC, Tucson, AZ, USA
|
15
|
Ye Q, Guo NL. Inferencing Bulk Tumor and Single-Cell Multi-Omics Regulatory Networks for Discovery of Biomarkers and Therapeutic Targets. Cells 2022; 12:cells12010101. [PMID: 36611894 PMCID: PMC9818242 DOI: 10.3390/cells12010101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2022] [Revised: 12/22/2022] [Accepted: 12/24/2022] [Indexed: 12/28/2022] Open
Abstract
There are insufficient accurate biomarkers and effective therapeutic targets in current cancer treatment. Multi-omics regulatory networks in patient bulk tumors and single cells can shed light on molecular disease mechanisms. Integration of multi-omics data with large-scale patient electronic medical records (EMRs) can lead to the discovery of biomarkers and therapeutic targets. In this review, multi-omics data harmonization methods were introduced, and common approaches to molecular network inference were summarized. Our Prediction Logic Boolean Implication Networks (PLBINs) have advantages over other methods in constructing genome-scale multi-omics networks in bulk tumors and single cells in terms of computational efficiency, scalability, and accuracy. Based on the constructed multi-modal regulatory networks, graph theory network centrality metrics can be used in the prioritization of candidates for discovering biomarkers and therapeutic targets. Our approach to integrating multi-omics profiles in a patient cohort with large-scale patient EMRs such as the SEER-Medicare cancer registry combined with extensive external validation can identify potential biomarkers applicable in large patient populations. These methodologies form a conceptually innovative framework to analyze various available information from research laboratories and healthcare systems, accelerating the discovery of biomarkers and therapeutic targets to ultimately improve cancer patient survival outcomes.
Affiliation(s)
- Qing Ye
- West Virginia University Cancer Institute, Morgantown, WV 26506, USA
- Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV 26506, USA
| | - Nancy Lan Guo
- West Virginia University Cancer Institute, Morgantown, WV 26506, USA
- Department of Occupational and Environmental Health Sciences, School of Public Health, West Virginia University, Morgantown, WV 26506, USA
- Correspondence: ; Tel.: +1-304-293-6455
|
16
|
Galindez G, Sadegh S, Baumbach J, Kacprowski T, List M. Network-based approaches for modeling disease regulation and progression. Comput Struct Biotechnol J 2022; 21:780-795. [PMID: 36698974 PMCID: PMC9841310 DOI: 10.1016/j.csbj.2022.12.022] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 08/15/2022] [Revised: 12/14/2022] [Accepted: 12/14/2022] [Indexed: 12/23/2022] Open
Abstract
Molecular interaction networks lay the foundation for studying how biological functions are controlled by the complex interplay of genes and proteins. Investigating perturbed processes using biological networks has been instrumental in uncovering mechanisms that underlie complex disease phenotypes. Rapid advances in omics technologies have prompted the generation of high-throughput datasets, enabling large-scale, network-based analyses. Consequently, various modeling techniques, including network enrichment, differential network extraction, and network inference, have proven to be useful for gaining new mechanistic insights. We provide an overview of recent network-based methods and their core ideas to facilitate the discovery of disease modules or candidate mechanisms. Knowledge generated from these computational efforts will benefit biomedical research, especially drug development and precision medicine. We further discuss current challenges and provide perspectives in the field, highlighting the need for more integrative and dynamic network approaches to model disease development and progression.
Affiliation(s)
- Gihanna Galindez
- Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics of Technische Universität Braunschweig and Hannover Medical School, Braunschweig, Germany; Braunschweig Integrated Centre of Systems Biology (BRICS), TU Braunschweig, Braunschweig, Germany
| | - Sepideh Sadegh
- Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany; Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
| | - Jan Baumbach
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany; Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
| | - Tim Kacprowski
- Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics of Technische Universität Braunschweig and Hannover Medical School, Braunschweig, Germany; Braunschweig Integrated Centre of Systems Biology (BRICS), TU Braunschweig, Braunschweig, Germany
| | - Markus List
- Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
|
17
|
Yang B, Bao W, Chen B. PGRNIG: novel parallel gene regulatory network identification algorithm based on GPU. Brief Funct Genomics 2022; 21:441-454. [PMID: 36064791 DOI: 10.1093/bfgp/elac028] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/09/2022] [Revised: 07/30/2022] [Accepted: 08/03/2022] [Indexed: 12/14/2022] Open
Abstract
Molecular biology has revealed that complex life phenomena can be treated as the result of many gene interactions. Investigating these interactions and understanding the intrinsic mechanisms of biological systems using gene expression data have attracted a lot of attention. As a typical gene regulatory network (GRN) inference method, the S-system has been utilized to deal with small-scale network identification. However, it is extremely difficult to optimize it to infer medium-to-large networks. This paper proposes a novel parallel swarm intelligent algorithm, PGRNIG, to optimize the parameters of the S-system. We employed the clone selection strategy to improve the whale optimization algorithm (CWOA). To enhance the time efficiency of CWOA optimization, we utilized a parallel CWOA (PCWOA) based on the compute unified device architecture (CUDA) platform. Decomposition strategy and L1 regularization were utilized to reduce the search space and complexity of GRN inference. We applied the PGRNIG algorithm on three synthetic datasets and two real time-series expression datasets of the species of Escherichia coli and Saccharomyces cerevisiae. Experimental results show that PGRNIG could infer the gene regulatory network more accurately than other state-of-the-art methods with a convincing computational speed-up. Our findings show that CWOA and PCWOA have faster convergence performances than WOA.
Affiliation(s)
- Bin Yang
- School of Information Science and Engineering, Zaozhuang University, Zaozhuang 277160, China
| | - Wenzheng Bao
- School of Information and Electrical Engineering, Xuzhou University of Technology, Xuzhou 221018, China
| | - Baitong Chen
- Xuzhou First People's Hospital, Xuzhou 221000, China
|
18
|
Lei J, Cai Z, He X, Zheng W, Liu J. An approach of gene regulatory network construction using mixed entropy optimizing context-related likelihood mutual information. Bioinformatics 2022; 39:6808612. [PMID: 36342190 PMCID: PMC9805593 DOI: 10.1093/bioinformatics/btac717] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2022] [Revised: 09/18/2022] [Accepted: 11/04/2022] [Indexed: 11/09/2022] Open
Abstract
MOTIVATION The question of how to construct gene regulatory networks has long been a focus of biological research. Mutual information can be used to measure nonlinear relationships, and it has been widely used in the construction of gene regulatory networks. However, this method cannot measure indirect regulatory relationships under the influence of multiple genes, which reduces the accuracy of inferring gene regulatory networks. APPROACH This work proposes a method for constructing gene regulatory networks based on mixed entropy optimizing context-related likelihood mutual information (MEOMI). First, two entropy estimators were combined to calculate the mutual information between genes. Then, distribution optimization was performed using a context-related likelihood algorithm to eliminate some indirect regulatory relationships and obtain the initial gene regulatory network. To obtain the complex interaction between genes and eliminate redundant edges in the network, the initial gene regulatory network was further optimized by calculating the conditional mutual inclusive information (CMI2) between gene pairs under the influence of multiple genes. The network was iteratively updated to reduce the impact of mutual information on the overestimation of the direct regulatory intensity. RESULTS The experimental results show that the MEOMI method performed better than several other kinds of gene network construction methods on DREAM challenge simulated datasets (DREAM3 and DREAM5), three real Escherichia coli datasets (E. coli SOS pathway network, E. coli SOS DNA repair network and E. coli community network) and two human datasets. AVAILABILITY AND IMPLEMENTATION Source code and dataset are available at https://github.com/Dalei-Dalei/MEOMI/ and http://122.205.95.139/MEOMI/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jimeng Lei
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China; Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan 430070, China; College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
| | - Zongheng Cai
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China; Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan 430070, China; College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
| | - Xinyi He
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
| | - Wanting Zheng
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
|
19
|
Imani M, Ghoreishi SF. Graph-Based Bayesian Optimization for Large-Scale Objective-Based Experimental Design. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2022; 33:5913-5925. [PMID: 33877989 DOI: 10.1109/tnnls.2021.3071958] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/12/2023]
Abstract
Design is an inseparable part of most scientific and engineering tasks, including real and simulation-based experimental design processes and parameter/hyperparameter tuning/optimization. Several model-based experimental design techniques have been developed for design in domains with partially available knowledge about the underlying process. This article focuses on a powerful class of model-based experimental design called the mean objective cost of uncertainty (MOCU). The MOCU-based techniques are objective-based, meaning that they take the main objective of the process into account during the experimental design process. However, the lack of scalability of MOCU-based techniques prevents their application to most practical problems, including large discrete or combinatorial spaces. To achieve a scalable objective-based experimental design, this article proposes a graph-based MOCU-based Bayesian optimization framework. The correlations among samples in the large design space are accounted for using a graph-based Gaussian process, and an efficient closed-form sequential selection is achieved through the well-known expected improvement policy. The proposed framework's performance is assessed through structural intervention in gene regulatory networks, aiming to steer the network away from the states associated with cancer.
|
20
|
Quantifying biochemical reaction rates from static population variability within incompletely observed complex networks. PLoS Comput Biol 2022; 18:e1010183. [PMID: 35731728 PMCID: PMC9216546 DOI: 10.1371/journal.pcbi.1010183] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/21/2021] [Accepted: 05/07/2022] [Indexed: 11/19/2022] Open
Abstract
Quantifying biochemical reaction rates within complex cellular processes remains a key challenge of systems biology even as high-throughput single-cell data have become available to characterize snapshots of population variability. That is because complex systems with stochastic and non-linear interactions are difficult to analyze when not all components can be observed simultaneously and systems cannot be followed over time. Instead of using descriptive statistical models, we show that incompletely specified mechanistic models can be used to translate qualitative knowledge of interactions into reaction rate functions from covariability data between pairs of components. This promises to turn a globally intractable problem into a sequence of solvable inference problems to quantify complex interaction networks from incomplete snapshots of their stochastic fluctuations.
|
21
|
Cadiz MP, Jensen TD, Sens JP, Zhu K, Song WM, Zhang B, Ebbert M, Chang R, Fryer JD. Culture shock: microglial heterogeneity, activation, and disrupted single-cell microglial networks in vitro. Mol Neurodegener 2022; 17:26. [PMID: 35346293 PMCID: PMC8962153 DOI: 10.1186/s13024-022-00531-1] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Received: 08/25/2021] [Accepted: 03/09/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Microglia, the resident immune cells of the brain, play a critical role in numerous diseases, but are a minority cell type and difficult to genetically manipulate in vivo with viral vectors and other approaches. Primary cultures allow a more controlled setting to investigate these cells, but morphological and transcriptional changes upon removal from their normal brain environment raise many caveats from in vitro studies. METHODS To investigate whether cultured microglia recapitulate in vivo microglial signatures, we used single-cell RNA sequencing (scRNAseq) to compare microglia freshly isolated from the brain to primary microglial cultures. We performed cell population discovery, differential expression analysis, and gene co-expression module analysis to compare signatures between in vitro and in vivo microglia. We constructed causal predictive network models of transcriptional regulators from the scRNAseq data and identified a set of potential key drivers of the cultured phenotype. To validate this network analysis, we knocked down two of these key drivers, C1qc and Prdx1, in primary cultured microglia and quantified changes in microglial activation markers. RESULTS We found that, although often assumed to be a relatively homogenous population of cells in culture, in vitro microglia are a highly heterogeneous population consisting of distinct subpopulations of cells with transcriptional profiles reminiscent of macrophages and monocytes, and are marked by transcriptional programs active in neurodegeneration and other disease states. We found that microglia in vitro presented transcriptional activation of a set of "culture shock genes" not found in freshly isolated microglia, characterized by strong upregulation of disease-associated genes including Apoe, Lyz2, and Spp1, and downregulation of homeostatic microglial markers, including Cx3cr1, P2ry12, and Tmem119. 
Finally, we found that cultured microglia prominently alter their transcriptional machinery modulated by key drivers from the homeostatic to activated phenotype. Knockdown of one of these drivers, C1qc, resulted in downregulation of microglial activation genes Lpl, Lyz2, and Ccl4. CONCLUSIONS Overall, our data suggest that when removed from their in vivo home environment, microglia suffer a severe case of "culture shock", drastically modulating their transcriptional regulatory network state from homeostatic to activated through upregulation of modules of culture-specific genes. Consequently, cultured microglia behave as a disparate cell type that does not recapitulate the homeostatic signatures of microglia in vivo. Finally, our predictive network model discovered potential key drivers that may convert activated microglia back to their homeostatic state, allowing for more accurate representation of in vivo states in culture. Knockdown of key driver C1qc partially attenuated microglial activation in vitro, despite C1qc being only weakly upregulated in culture. This suggests that even genes that are not strongly differentially expressed across treatments or preparations may drive downstream transcriptional changes in culture.
Affiliation(s)
- Mika P. Cadiz
- Department of Neuroscience, Mayo Clinic, Scottsdale, AZ 85259 USA
- Neuroscience Graduate Program, Mayo Clinic Graduate School of Biomedical Sciences, Scottsdale, AZ 85259 USA
| | - Tanner D. Jensen
- Department of Neuroscience, Mayo Clinic, Scottsdale, AZ 85259 USA
| | - Jonathon P. Sens
- Department of Neuroscience, Mayo Clinic, Scottsdale, AZ 85259 USA
- Neuroscience Graduate Program, Mayo Clinic Graduate School of Biomedical Sciences, Scottsdale, AZ 85259 USA
| | - Kuixi Zhu
- Department of Neurology, University of Arizona, Tucson, AZ 85721 USA
| | - Won-Min Song
- Department of Genetics & Genomic Sciences, Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, New York, NY 10029 USA
| | - Bin Zhang
- Department of Genetics & Genomic Sciences, Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, New York, NY 10029 USA
| | - Mark Ebbert
- Sanders-Brown Center on Aging, Biomedical Informatics, and Department of Neuroscience, University of Kentucky, Lexington, KY 40536 USA
| | - Rui Chang
- Department of Neurology, University of Arizona, Tucson, AZ 85721 USA
| | - John D. Fryer
- Department of Neuroscience, Mayo Clinic, Scottsdale, AZ 85259 USA
- Neuroscience Graduate Program, Mayo Clinic Graduate School of Biomedical Sciences, Scottsdale, AZ 85259 USA
|
22
|
Zhao M, He W, Tang J, Zou Q, Guo F. A hybrid deep learning framework for gene regulatory network inference from single-cell transcriptomic data. Brief Bioinform 2022; 23:6513730. [DOI: 10.1093/bib/bbab568] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/28/2021] [Revised: 12/09/2021] [Accepted: 12/11/2021] [Indexed: 12/21/2022] Open
Abstract
Inferring gene regulatory networks (GRNs) based on gene expression profiles is able to provide an insight into a number of cellular phenotypes from the genomic level and reveal the essential laws underlying various life phenomena. Different from the bulk expression data, single-cell transcriptomic data embody cell-to-cell variance and diverse biological information, such as tissue characteristics, transformation of cell types, etc. Inferring GRNs based on such data offers unprecedented advantages for making a profound study of cell phenotypes, revealing gene functions and exploring potential interactions. However, the high sparsity, noise and dropout events of single-cell transcriptomic data pose new challenges for regulation identification. We develop a hybrid deep learning framework for GRN inference from single-cell transcriptomic data, DGRNS, which encodes the raw data and fuses recurrent neural network and convolutional neural network (CNN) to train a model capable of distinguishing related gene pairs from unrelated gene pairs. To overcome the limitations of such datasets, it applies sliding windows to extract valuable features while preserving the direction of regulation. DGRNS is constructed as a deep learning model containing gated recurrent unit network for exploring time-dependent information and CNN for learning spatially related information. Our comprehensive and detailed comparative analysis on the dataset of mouse hematopoietic stem cells illustrates that DGRNS outperforms state-of-the-art methods. The networks inferred by DGRNS are about 16% higher than the area under the receiver operating characteristic curve of other unsupervised methods and 10% higher than the area under the precision recall curve of other supervised methods. Experiments on human datasets show the strong robustness and excellent generalization of DGRNS. 
By comparing the predictions with standard network, we discover a series of novel interactions which are proved to be true in some specific cell types. Importantly, DGRNS identifies a series of regulatory relationships with high confidence and functional consistency, which have not yet been experimentally confirmed and merit further research.
|
23
|
A decomposition structure learning algorithm in Bayesian network based on a two-stage combination method. COMPLEX INTELL SYST 2022. [DOI: 10.1007/s40747-021-00623-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Abstract
Decomposition hybrid algorithms with the recursive framework which recursively decompose the structural task into structural subtasks to reduce computational complexity are employed to learn Bayesian network (BN) structure. Merging rules are commonly adopted as the combination method in the combination step. The direction determination rule of merging rules has problems in using the idea of keeping v-structures unchanged before and after combination to determine directions of edges in the whole structure. It breaks down in one case due to appearances of wrong v-structures, and is hard to operate in practice. Therefore, we adopt a novel approach for direction determination and propose a two-stage combination method. In the first-stage combination method, we determine nodes, links of edges by merging rules and adopt the idea of permutation and combination to determine directions of contradictory edges. In the second-stage combination method, we restrict edges between nodes that do not satisfy the decomposition property and their parent nodes by determining the target domain according to the decomposition property. Simulation experiments on four networks show that the proposed algorithm can obtain BN structure with higher accuracy compared with other algorithms. Finally, the proposed algorithm is applied to the thickening process of gold hydrometallurgy to solve the practical problem.
|
24
|
Nazari E, Biviji R, Roshandel D, Pour R, Shahriari MH, Mehrabian A, Tabesh H. Decision fusion in healthcare and medicine: a narrative review. Mhealth 2022; 8:8. [PMID: 35178439 PMCID: PMC8800206 DOI: 10.21037/mhealth-21-15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2021] [Accepted: 08/02/2021] [Indexed: 11/06/2022] Open
Abstract
OBJECTIVE To provide an overview of the decision fusion (DF) technique and describe the applications of the technique in healthcare and medicine at prevention, diagnosis, treatment and administrative levels. BACKGROUND The rapid development of technology over the past 20 years has led to an explosion in data growth in various industries, like healthcare. Big data analysis within healthcare systems is essential for arriving at a value-based decision over a period of time. Diversity and uncertainty in big data analytics have made it impossible to analyze data by using conventional data mining techniques and thus alternative solutions are required. DF is a form of data fusion technique that could increase the accuracy of diagnosis and facilitate interpretation, summarization and sharing of information. METHODS We conducted a review of articles published between January 1980 and December 2020 from various databases such as Google Scholar, IEEE, PubMed, Science Direct, Scopus and Web of Science using the keywords decision fusion (DF), information fusion, healthcare, medicine and big data. A total of 141 articles were included in this narrative review. CONCLUSIONS Given the importance of big data analysis in reducing costs and improving the quality of healthcare, along with the potential role of DF in big data analysis, it is recommended to know the full potential of this technique including the advantages, challenges and applications of the technique before its use. Future studies should focus on describing the methodology and types of data used for its applications within the healthcare sector.
Affiliation(s)
- Elham Nazari
- Department of Medical Informatics, Mashhad University of Medical Sciences, Mashhad, Iran
| | - Rizwana Biviji
- Science of Healthcare Delivery, College of Health Solutions, Arizona State University, Phoenix, AZ, USA
| | - Danial Roshandel
- Centre for Ophthalmology and Visual Science (affiliated with the Lions Eye Institute), The University of Western Australia, Perth, Western Australia, Australia
| | - Reza Pour
- Department of Computer Engineering, Azad University, Mashhad, Iran
| | - Mohammad Hasan Shahriari
- Department of Health Information Technology and Management, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
| | - Amin Mehrabian
- Warwick Medical School, University of Warwick, Coventry, UK
| | - Hamed Tabesh
- Department of Medical Informatics, Mashhad University of Medical Sciences, Mashhad, Iran
|
25
|
Han J, Perera S, Wunderlich Z, Periwal V. Mechanistic gene networks inferred from single-cell data with an outlier-insensitive method. Math Biosci 2021; 342:108722. [PMID: 34688607 PMCID: PMC8722367 DOI: 10.1016/j.mbs.2021.108722] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2021] [Revised: 08/25/2021] [Accepted: 08/25/2021] [Indexed: 11/28/2022]
Abstract
With advances in single-cell techniques, measuring gene dynamics at cellular resolution has become practicable. In contrast, the increased complexity of data has made it more challenging computationally to unravel underlying biological mechanisms. Thus, it is critical to develop novel computational methods capable of dealing with such complexity and of providing predictive deductions from such data. Many methods have been developed to address such challenges, each with its own advantages and limitations. We present an iterative regression algorithm for inferring a mechanistic gene network from single-cell data, especially suited to overcoming problems posed by measurement outliers. Using this regression, we infer a developmental model for the gene dynamics in Drosophila melanogaster blastoderm embryo. Our results show that the predictive power of the inferred model is higher than that of other models inferred with least squares and ridge regressions. As a baseline for how well a mechanistic model should be expected to perform, we find that model predictions of the gene dynamics are more accurate than predictions made with neural networks of varying architectures and complexity. This holds true even in the limit of small sample sizes. We compare predictions for various gene knockouts with published experimental results, finding substantial qualitative agreement. We also make predictions for gene dynamics under various gene network perturbations, impossible in non-mechanistic models.
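The abstract does not spell out the iterative regression itself. As a generic illustration of why an iteratively reweighted fit can resist measurement outliers where ordinary least squares cannot, here is a minimal sketch that uses Huber weights on a single-predictor line fit. The Huber loss, the threshold `delta`, the function names and the toy data are all assumptions made for illustration; this is not the authors' algorithm.

```python
def weighted_line_fit(xs, ys, ws):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def huber_irls(xs, ys, delta=1.0, n_iter=50):
    """Iteratively reweighted least squares with Huber weights (assumed loss)."""
    ws = [1.0] * len(xs)  # first pass is plain least squares
    for _ in range(n_iter):
        a, b = weighted_line_fit(xs, ys, ws)
        residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
        # Points with large residuals are down-weighted instead of discarded.
        ws = [1.0 if abs(r) <= delta else delta / abs(r) for r in residuals]
    return a, b


# Toy data on the line y = 1 + 2x, with the last measurement corrupted.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0, 40.0]
ols = weighted_line_fit(xs, ys, [1.0] * len(xs))
robust = huber_irls(xs, ys)
```

On this toy data the reweighted slope ends up much closer to the true value of 2 than the ordinary least-squares slope, because the corrupted point's influence is capped.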
Affiliation(s)
- Jungmin Han
- Laboratory of Biological Modeling, National Institutes of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD 20814, United States of America.
| | - Sudheesha Perera
- Laboratory of Biological Modeling, National Institutes of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD 20814, United States of America.
| | - Zeba Wunderlich
- Department of Developmental and Cell Biology, University of California, Irvine, CA 92617, United States of America.
| | - Vipul Periwal
- Laboratory of Biological Modeling, National Institutes of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD 20814, United States of America.
|
26
|
Guo WF, Zhang SW, Zeng T, Akutsu T, Chen L. Network control principles for identifying personalized driver genes in cancer. Brief Bioinform 2021; 21:1641-1662. [PMID: 31711128 DOI: 10.1093/bib/bbz089] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 04/10/2019] [Revised: 06/26/2019] [Accepted: 06/27/2019] [Indexed: 02/02/2023] Open
Abstract
To understand tumor heterogeneity in cancer, personalized driver genes (PDGs) need to be identified to unravel the genotype-phenotype associations of particular patients. However, most existing driver-focused methods attend mainly to cohort information rather than individual information. Recently developed computational approaches based on network control principles are opening a new way to discover driver genes in cancer, particularly at the individual level. To provide a comprehensive perspective on network control methods for this timely topic, we first framed cancer progression as a network control problem, in which the expected PDGs are genes altered by oncogene activation signals that can drive the individual molecular network from a healthy state to a disease state. Then, we reviewed network reconstruction methods for single samples and introduced novel network control methods on single-sample networks to identify PDGs in cancer. In particular, we assessed the performance of network structure control-based PDG identification methods on multiple cancer datasets from TCGA, for which the data and evaluation package are also publicly available. Finally, we discussed future directions for applying network control methods to identify PDGs in cancer and in diverse biological processes.
Affiliation(s)
- Wei-Feng Guo
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
| | - Shao-Wu Zhang
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
| | - Tao Zeng
- Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai, 200031, China
| | - Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, 611-0011, Japan
| | - Luonan Chen
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China.,Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai, 200031, China.,School of Life Science and Technology, ShanghaiTech University, 201210 Shanghai, China.,Shanghai Research Center for Brain Science and Brain-Inspired Intelligence, Shanghai 201210, China.,Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China
|
27
|
Cingiz MÖ, Biricik G, Diri B. The Performance Comparison of Gene Co-expression Networks of Breast and Prostate Cancer using Different Selection Criteria. Interdiscip Sci 2021; 13:500-510. [PMID: 34003445 DOI: 10.1007/s12539-021-00440-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2020] [Revised: 04/21/2021] [Accepted: 05/11/2021] [Indexed: 06/12/2023]
Abstract
Gene co-expression networks (GCNs) present undirected relations between genes to help understand the molecular structures behind diseases, including cancer. The use of various biological datasets and gene network inference (GNI) algorithms can reveal meaningful gene-gene interactions in GCNs. This study applies three GNI algorithms to mRNA gene expression, RNA-Seq, and miRNA-target gene datasets to infer GCNs of breast and prostate cancers. To evaluate the performance of the GCNs, we use overlap analysis against literature data, topological assessment, and Gene Ontology-based biological assessment. The results emphasize how the selection of biological datasets and GNI algorithms affects the performance results on different evaluation criteria. GCNs built on microarray gene expression data perform slightly better in overlap analysis. Also, GCNs built on RNA-Seq and gene expression datasets follow a scale-free topology. The biological assessment results are close to each other on all biological datasets. C3NET algorithm-based GCNs did not contain any biological assessment modules; therefore, they are not optimal for biological assessment. The selection of GNI algorithm did not change the overlap analysis and topological assessment results. Our primary objective is to compare the performance results of biological datasets and GNI algorithms based on different evaluation criteria. For this purpose, we developed the GNIAP R package, which enables users to select different GNI algorithms to infer GCNs. The GNIAP R package also provides literature-based overlap analysis, and topological and biological analyses of GCNs. Users can access the GNIAP R package via https://github.com/ozgurcingiz/GNIAP.
Affiliation(s)
- Mustafa Özgür Cingiz
- Department of Computer Engineering, Faculty of Engineering and Natural Sciences, Bursa Technical University, 16310, Yildirim, Bursa, Turkey.
| | - Göksel Biricik
- Computer Engineering Department, Yildiz Technical University, Istanbul, Turkey
| | - Banu Diri
- Computer Engineering Department, Yildiz Technical University, Istanbul, Turkey
|
28
|
Trinh HC, Kwon YK. A novel constrained genetic algorithm-based Boolean network inference method from steady-state gene expression data. Bioinformatics 2021; 37:i383-i391. [PMID: 34252959 PMCID: PMC8275338 DOI: 10.1093/bioinformatics/btab295] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Accepted: 04/24/2021] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION It is a challenging problem in systems biology to infer both the network structure and dynamics of a gene regulatory network from steady-state gene expression data. Some methods based on Boolean or differential equation models have been proposed, but they were not efficient at inferring large-scale networks. Therefore, it is necessary to develop a method that infers the network structure and dynamics accurately on large-scale networks using steady-state expression. RESULTS In this study, we propose a novel constrained genetic algorithm-based Boolean network inference (CGA-BNI) method in which a Boolean canalyzing update rule scheme is employed to capture coarse-grained dynamics. Given steady-state gene expression data as input, CGA-BNI identifies a set of path consistency-based constraints by comparing gene expression levels between the wild-type and mutant experiments. It then searches for Boolean networks that satisfy the constraints and induce attractors most similar to the steady-state expressions. We devised a heuristic mutation operation for faster convergence and implemented a parallel evaluation routine to reduce execution time. In extensive simulations on artificial and real gene expression datasets, CGA-BNI showed better performance than four other existing methods in terms of both structural and dynamics prediction accuracy. Taken together, CGA-BNI is a promising tool for predicting both the structure and the dynamics of a gene regulatory network when the highest accuracy is needed at the cost of execution time. AVAILABILITY AND IMPLEMENTATION Source code and data are freely available at https://github.com/csclab/CGA-BNI. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
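The abstract compares attractors induced by candidate Boolean networks with steady-state expression. As a rough illustration of what "attractor" means here, the sketch below enumerates the attractors of a small synchronous Boolean network by exhaustive simulation. The two-gene toggle switch, the synchronous update scheme and the function name are assumptions for illustration only; CGA-BNI's canalyzing rules and genetic search are not reproduced.

```python
from itertools import product


def find_attractors(update_rules):
    """Enumerate attractors of a synchronous Boolean network by brute force.

    update_rules[i] maps the full state tuple to the next value of gene i.
    Each attractor is returned as a frozenset of the states on its cycle.
    """
    n = len(update_rules)
    attractors = set()
    for start in product((0, 1), repeat=n):
        first_seen = {}
        state, step = start, 0
        # Follow the trajectory until a state repeats.
        while state not in first_seen:
            first_seen[state] = step
            state = tuple(rule(state) for rule in update_rules)
            step += 1
        # Every state visited at or after the repeated state lies on the cycle.
        cycle_start = first_seen[state]
        cycle = frozenset(s for s, t in first_seen.items() if t >= cycle_start)
        attractors.add(cycle)
    return attractors


# Hypothetical toggle switch: each gene represses the other.
toggle_rules = [
    lambda s: 1 - s[1],  # gene 0 is ON when gene 1 is OFF
    lambda s: 1 - s[0],  # gene 1 is ON when gene 0 is OFF
]
attractors = find_attractors(toggle_rules)
```

For this toy network the search finds two fixed points, (0, 1) and (1, 0), which play the role of steady states, plus a two-state oscillation between (0, 0) and (1, 1) that is an artifact of synchronous updating. Exhaustive enumeration scales as 2^n, so it is only practical for very small networks.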
Affiliation(s)
- Hung-Cuong Trinh
- Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh 758307, Vietnam
| | - Yung-Keun Kwon
- Department of IT Convergence, University of Ulsan, Ulsan 680-749, Korea
|
29
|
van Wyk R, van Biljon R, Birkholtz LM. MALBoost: a web-based application for gene regulatory network analysis in Plasmodium falciparum. Malar J 2021; 20:317. [PMID: 34261498 PMCID: PMC8278594 DOI: 10.1186/s12936-021-03848-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2021] [Accepted: 07/07/2021] [Indexed: 11/10/2022] Open
Abstract
Background Gene Regulatory Networks (GRNs) produce powerful insights into transcriptional regulation in cells. The power of GRNs has been underutilized in malaria research. The Arboreto library was incorporated into a user-friendly web-based application for malaria researchers (http://malboost.bi.up.ac.za). This application will assist researchers in gaining an in-depth understanding of transcriptomic datasets. Methods The web application for MALBoost was built in Python-Flask with Redis and Celery workers for queue submission handling, which execute the Arboreto suite algorithms. A submission of 5–50 regulators and a total expression set of 5200 genes is permitted. The program runs in a point-and-click web user interface built using Bootstrap4 templates. After submitting an analysis, users are redirected to a status page with run time estimates and ultimately a download button upon completion. Result updates or failure updates will be emailed to the users. Results A web-based application with an easy-to-use interface is presented with a use case validation of AP2-G and AP2-I. The validation set incorporates cross-referencing with ChIP-seq and transcriptome datasets. For AP2-G, 5 ChIP-seq targets were significantly enriched with seven more targets presenting with strong evidence of validated targets. Conclusion The MALBoost application provides the first tool for easy interfacing and efficiently allows gene regulatory network construction for Plasmodium. Additionally, access is provided to a pre-compiled network for use as a reference framework. Validation for sexually committed ring-stage parasite targets of AP2-G suggests the algorithm was effective in resolving “traditionally” low-level signatures even in bulk RNA datasets. Supplementary Information The online version contains supplementary material available at 10.1186/s12936-021-03848-2.
Affiliation(s)
- Roelof van Wyk
- Department of Biochemistry, Genetics and Microbiology and the Institute for Sustainable Malaria Control, University of Pretoria, Private Bag X20, Hatfield, Pretoria, 0028, South Africa.
| | - Riëtte van Biljon
- Department of Biochemistry and Molecular Biology and the Huck Centre for Malaria Research, Pennsylvania State University, University Park, PA, 16802, USA
| | - Lyn-Marie Birkholtz
- Department of Biochemistry, Genetics and Microbiology and the Institute for Sustainable Malaria Control, University of Pretoria, Private Bag X20, Hatfield, Pretoria, 0028, South Africa.
|
30
|
Liu E, Li J, Kinnebrew GH, Zhang P, Zhang Y, Cheng L, Li L. A Fast and Furious Bayesian Network and Its Application of Identifying Colon Cancer to Liver Metastasis Gene Regulatory Networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:1325-1335. [PMID: 31581091 DOI: 10.1109/tcbb.2019.2944826] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 06/10/2023]
Abstract
Bayesian networks are a powerful method for identifying causal relationships among variables. However, as the network size increases, the time complexity of searching for the optimal structure grows exponentially. We proposed a novel search algorithm, Fast and Furious Bayesian Network (FFBN). Compared to the existing greedy search algorithm, FFBN uses significantly fewer model configuration rules to determine the causal direction of edges when constructing the Bayesian network, which greatly improves computational speed. We benchmarked the performance of FFBN by reconstructing gene regulatory networks (GRNs) from two DREAM5 challenge datasets: a synthetic dataset and a larger yeast transcriptome dataset. On both datasets, FFBN is much faster than the existing greedy search algorithm while maintaining equally good or better recall and precision. We then constructed three whole-transcriptome GRNs for primary liver cancer (PL), primary colon cancer (PC) and colon to liver metastasis (CLM) expression data, a task at which the existing greedy search algorithms failed. The three GRNs contain 12,099 common genes. Unprecedentedly, our newly developed FFBN algorithm is able to build GRNs at a scale larger than 10,000 genes. Using FFBN, we discovered that CLM has its own unique cancer molecular mechanisms and shares a certain degree of similarity with both PL and PC.
|
31
|
Pyne S, Anand A. Rapid Reconstruction of Time-varying Gene Regulatory Networks with Limited Main Memory. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:1608-1619. [PMID: 31613774 DOI: 10.1109/tcbb.2019.2946826] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
Reconstruction of time-varying gene regulatory networks underlying time-series gene expression data is a fundamental challenge in computational systems biology. The challenge increases multi-fold if the target networks need to be constructed for hundreds to thousands of genes. There have been constant efforts to design an algorithm that performs the reconstruction task correctly and also scales efficiently (with respect to both time and memory) to such a large number of genes. However, the existing algorithms either do not offer time-efficiency, or they offer it at other costs: memory-inefficiency or imposition of a constraint known as the 'smoothly time-varying assumption'. In this article, two novel algorithms, 'an algorithm for reconstructing Time-varying Gene regulatory networks with Shortlisted candidate regulators - which is Light on memory' (TGS-Lite) and 'TGS-Lite Plus' (TGS-Lite+), are proposed that are time-efficient, memory-efficient and do not impose the smoothly time-varying assumption. Additionally, they offer state-of-the-art reconstruction correctness, as demonstrated with three benchmark datasets. Source Code: https://github.com/sap01/TGS-Lite-supplem/tree/master/sourcecode.
|
32
|
Salimi D, Moeini A. Incorporating K-mers Highly Correlated to Epigenetic Modifications for Bayesian Inference of Gene Interactions. Curr Bioinform 2021. [DOI: 10.2174/1574893615999200728193621] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/22/2022]
Abstract
Objective: A gene interaction network, along with its related biological features, has an important role in computational biology. The Bayesian network, as an efficient model based on probabilistic concepts, is able to exploit known and novel biological causal relationships between genes. The success of Bayesian networks in predicting these relationships greatly depends on selecting priors.
Methods: K-mers have been applied as prominent features to uncover the similarity between genes in a specific pathway, suggesting that this feature can be applied to study gene dependencies. In this study, we propose k-mers (4-, 5- and 6-mers) highly correlated with epigenetic modifications, including 17 modifications, as a new prior for Bayesian inference in the gene interaction network.
Result: Employing this model on a network of 23 human genes and on a network based on 27 genes related to yeast resulted in F-measure improvements in different biological networks.
Conclusion: The improvements in the best case are 12%, 36%, and 10% in the pathway, coexpression, and physical interaction networks, respectively.
Affiliation(s)
- Dariush Salimi
- Department of Animal Science, Faculty of Agriculture, University of Zanjan, Zanjan, Iran
| | - Ali Moeini
- Department of Algorithms and Computation, Faculty of Engineering Science, College of Engineering, University of Tehran, Tehran, Iran
|
33
|
Vatsa D, Agarwal S. PEPN-GRN: A Petri net-based approach for the inference of gene regulatory networks from noisy gene expression data. PLoS One 2021; 16:e0251666. [PMID: 33989333 PMCID: PMC8121333 DOI: 10.1371/journal.pone.0251666] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/10/2020] [Accepted: 04/30/2021] [Indexed: 11/22/2022] Open
Abstract
The inference of gene regulatory networks (GRNs) from expression data is a challenging problem in systems biology. The stochasticity, or fluctuations, in the biochemical processes that regulate transcription poses one of the major challenges. In this paper, we propose a novel GRN inference approach, named the Probabilistic Extended Petri Net for Gene Regulatory Network (PEPN-GRN), for the inference of gene regulatory networks from noisy expression data. The proposed approach uses transitions of discrete gene expression levels across adjacent time points as different evidence types that relate to the production or decay of genes. The paper examines three variants of the PEPN-GRN method, which differ mainly in the way the scores of network edges are computed from the evidence types. The proposed method is evaluated on the benchmark DREAM4 in silico data sets and a real time-series data set of E. coli from the DREAM5 challenge. The PEPN-GRN_v3 variant (the third variant of the PEPN-GRN approach) seeks to learn the weights of evidence types in accordance with their contribution to the activation and inhibition processes of gene regulation. The learned weights help in understanding the time-shifted and inverted time-shifted relationships between regulator and target gene. Thus, PEPN-GRN_v3, along with the inference of network edges, also provides a functional understanding of the gene regulation process.
Affiliation(s)
- Deepika Vatsa
- Department of Electrical Engineering, Indian Institute of Technology Delhi, New Delhi, India
| | - Sumeet Agarwal
- Department of Electrical Engineering, Indian Institute of Technology Delhi, New Delhi, India
|
34
|
He W, Tang J, Zou Q, Guo F. MMFGRN: a multi-source multi-model fusion method for gene regulatory network reconstruction. Brief Bioinform 2021; 22:6261916. [PMID: 33939795 DOI: 10.1093/bib/bbab166] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/26/2021] [Revised: 03/08/2021] [Accepted: 04/08/2021] [Indexed: 01/05/2023] Open
Abstract
Many biological processes, such as the growth and differentiation of cells and the occurrence and development of diseases, are controlled by gene regulatory networks (GRNs). Therefore, sustained research on GRNs is important. Determining gene-gene relationships from gene expression data is a complex problem. Since it is difficult to efficiently uncover the regularity behind gene-gene relationships by relying only on biochemical experimental methods, various computational methods have been used to construct GRNs, and some progress has been made. In this paper, we propose a novel method, MMFGRN (for "Multi-source Multi-model Fusion for Gene Regulatory Network reconstruction"), to reconstruct the GRN. In order to make full use of the limited datasets and explore the potential regulatory relationships contained in different data types, we construct the MMFGRN model from three perspectives: a single time-series data model, a single steady-state data model, and a joint time-series and steady-state data model. We then use a weighted fusion strategy to obtain the final global regulatory link ranking. MMFGRN yields the best performance on the DREAM4 InSilico_Size10 data, outperforming other popular inference algorithms, with an overall area under the receiver operating characteristic curve of 0.909 and an area under the precision-recall curve (AUPR) of 0.770 on the 10-gene network. Additionally, as the network scale increases, our method retains certain advantages, with an overall AUPR score of 0.335 on the DREAM4 InSilico_Size100 data. These results demonstrate the robustness of MMFGRN across network scales. At the same time, the integration strategy proposed in this paper provides a new idea for reconstructing biological network models without prior knowledge, which can help researchers decipher the elusive mechanisms of life.
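The abstract mentions a weighted fusion of per-model results into one global link ranking but gives no formula. The sketch below shows one plausible reading: min-max normalise each model's edge scores, take a weighted average, and sort. The normalisation, the equal weights, the gene names and the scores are all assumptions for illustration; they are not taken from MMFGRN.

```python
def fuse_edge_scores(score_maps, weights):
    """Rank edges by a weighted average of min-max-normalised model scores.

    score_maps: list of dicts mapping (regulator, target) -> raw score.
    weights:    one non-negative weight per model (hypothetical values).
    """
    total = sum(weights)
    fused = {}
    for scores, w in zip(score_maps, weights):
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # guard against a model with constant scores
        for edge, s in scores.items():
            fused[edge] = fused.get(edge, 0.0) + (w / total) * (s - lo) / span
    # Highest fused score first.
    return sorted(fused, key=fused.get, reverse=True)


# Hypothetical scores from a time-series model and a steady-state model.
time_series_scores = {("g1", "g2"): 0.9, ("g2", "g3"): 0.2, ("g1", "g3"): 0.5}
steady_state_scores = {("g1", "g2"): 3.0, ("g2", "g3"): 8.0, ("g1", "g3"): 1.0}
ranking = fuse_edge_scores([time_series_scores, steady_state_scores], [0.5, 0.5])
```

Normalising before averaging keeps a model with a larger raw score range from dominating the fused ranking; rank-based fusion would be another reasonable choice.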
Affiliation(s)
| | | | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Fei Guo
- College of Intelligence and Computing, Tianjin University, Tianjin, China
|
35
|
Chanumolu SK, Albahrani M, Can H, Otu HH. KEGG2Net: Deducing gene interaction networks and acyclic graphs from KEGG pathways. EMBnet.journal 2021; 26. [PMID: 33880340 PMCID: PMC8055051 DOI: 10.14806/ej.26.0.949] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/27/2022]
Abstract
The Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database provides a manual curation of biological pathways that involve genes (or gene products), metabolites, chemical compounds, maps, and other entries. However, most applications and datasets involved in omics are gene- or protein-centric, requiring pathway representations that include direct and indirect interactions only between genes. Furthermore, special methodologies such as Bayesian networks require acyclic representations of graphs. We developed KEGG2Net, a web resource that generates a network involving only the genes represented on a KEGG pathway, with all of the direct and indirect gene-gene interactions deduced from the pathway. KEGG2Net offers four different methods to remove cycles from the resulting gene interaction network, converting it into a directed acyclic graph (DAG). We generated synthetic gene expression data using the gene interaction networks deduced from the KEGG pathways and performed a comparative analysis of different cycle removal methods by testing the fitness of their DAGs to the data and by the number of edges they eliminate. Our results indicate that an ensemble method for cycle removal is the best approach for converting the gene interaction networks into DAGs. The resulting gene interaction networks and DAGs are represented in multiple user-friendly formats that can be used in other applications, and as images for quick and easy visualisation. The KEGG2Net web portal converts KEGG maps for any organism into gene-gene interaction networks and corresponding DAGs representing all of the direct and indirect interactions among the genes.
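The abstract does not name the four cycle-removal methods. As a generic example of turning a directed gene interaction network into a DAG, the sketch below drops every back edge found during a depth-first search, which is one standard way to break all cycles. The function name and the toy edge list are hypothetical, and this is not claimed to be one of KEGG2Net's methods.

```python
def remove_cycles(edges):
    """Return the edges that remain after dropping DFS back edges.

    A directed graph is acyclic exactly when a depth-first search finds no
    back edge, so the kept edges always form a DAG. Which edges are dropped
    depends on the visiting order.
    """
    graph = {}
    for u, v in edges:
        graph.setdefault(u, []).append(v)
        graph.setdefault(v, [])

    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    status = {node: UNVISITED for node in graph}
    kept = []

    def visit(u):
        status[u] = IN_PROGRESS
        for v in graph[u]:
            if status[v] == IN_PROGRESS:
                continue  # back edge: it would close a cycle, so drop it
            kept.append((u, v))
            if status[v] == UNVISITED:
                visit(v)
        status[u] = DONE

    for node in graph:
        if status[node] == UNVISITED:
            visit(node)
    return kept


# Hypothetical pathway with a feedback loop A -> B -> C -> A.
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
dag_edges = remove_cycles(edges)
```

Here the feedback edge ("C", "A") is removed and the other three edges are kept. Because the result depends on traversal order, comparing several removal strategies by the number of edges eliminated, as the abstract describes, is a natural way to choose among them.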
Affiliation(s)
- Sree K Chanumolu
- Department of Electrical and Computer Engineering, University of Nebraska-Lincoln, Lincoln, NE, United States
| | - Mustafa Albahrani
- Department of Electrical and Computer Engineering, University of Nebraska-Lincoln, Lincoln, NE, United States
| | - Handan Can
- Department of Electrical and Computer Engineering, University of Nebraska-Lincoln, Lincoln, NE, United States
| | - Hasan H Otu
- Department of Electrical and Computer Engineering, University of Nebraska-Lincoln, Lincoln, NE, United States
|
36
|
Pirgazi J, Olyaee MH, Khanteymoori A. KFGRNI: A robust method to inference gene regulatory network from time-course gene data based on ensemble Kalman filter. J Bioinform Comput Biol 2021; 19:2150002. [PMID: 33657986 DOI: 10.1142/s0219720021500025] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/18/2022]
Abstract
A central problem of systems biology is the reconstruction of Gene Regulatory Networks (GRNs) from time series data. Although many attempts have been made to design an efficient method for GRN inference, providing the best solution is still a challenging task. Noise, a low number of samples, and a high number of nodes are the main reasons for the poor performance of existing methods. The present study applies the ensemble Kalman filter algorithm to model a GRN from gene time series data. The inference of a GRN with p genes is decomposed into p subproblems. In each subproblem, the ensemble Kalman filter algorithm identifies the weights of interactions for one target gene: the expression pattern of the target gene is predicted from the expression patterns of all the remaining genes. The proposed method is compared with several well-known approaches. The results of the evaluation indicate that the proposed method improves inference accuracy and recovers regulatory relations better from noisy data.
Affiliation(s)
- Jamshid Pirgazi
- Department of Electrical and Computer Engineering, University of Science and Technology of Mazandaran Behshahr, Iran
| | - Mohammad Hossein Olyaee
- Department of Computer Engineering, Engineering Faculty, University of Gonabad, Gonabad, Iran
| | - Alireza Khanteymoori
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Germany.,Department of Computer Engineering, Engineering Faculty, University of Zanjan Zanjan Province, Iran
|
37
|
|
38
|
Becker AK, Dörr M, Felix SB, Frost F, Grabe HJ, Lerch MM, Nauck M, Völker U, Völzke H, Kaderali L. From heterogeneous healthcare data to disease-specific biomarker networks: A hierarchical Bayesian network approach. PLoS Comput Biol 2021; 17:e1008735. [PMID: 33577591 PMCID: PMC7906470 DOI: 10.1371/journal.pcbi.1008735] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 08/19/2020] [Revised: 02/25/2021] [Accepted: 01/22/2021] [Indexed: 01/26/2023] Open
Abstract
In this work, we introduce an entirely data-driven and automated approach to reveal disease-associated biomarker and risk factor networks from heterogeneous and high-dimensional healthcare data. Our workflow is based on Bayesian networks, which are a popular tool for analyzing the interplay of biomarkers. Usually, data require extensive manual preprocessing and dimension reduction to allow for effective learning of Bayesian networks. For heterogeneous data, this preprocessing is hard to automatize and typically requires domain-specific prior knowledge. We here combine Bayesian network learning with hierarchical variable clustering in order to detect groups of similar features and learn interactions between them entirely automated. We present an optimization algorithm for the adaptive refinement of such group Bayesian networks to account for a specific target variable, like a disease. The combination of Bayesian networks, clustering, and refinement yields low-dimensional but disease-specific interaction networks. These networks provide easily interpretable, yet accurate models of biomarker interdependencies. We test our method extensively on simulated data, as well as on data from the Study of Health in Pomerania (SHIP-TREND), and demonstrate its effectiveness using non-alcoholic fatty liver disease and hypertension as examples. We show that the group network models outperform available biomarker scores, while at the same time, they provide an easily interpretable interaction network.
High-dimensional and heterogeneous healthcare data, such as electronic health records or epidemiological study data, contain much information on yet unknown risk factors that are associated with disease development. The identification of these risk factors may help to improve prevention, diagnosis, and therapy. Bayesian networks are powerful statistical models that can decipher these complex relationships. However, high dimensionality and heterogeneity of data, together with missing values and high feature correlation, make it difficult to automatically learn a good model from data. To facilitate the use of network models, we present a novel, fully automated workflow that combines network learning with hierarchical clustering. The algorithm reveals groups of strongly related features and models the interactions among those groups. It results in simpler network models that are easier to analyze. We introduce a method of adaptive refinement of such models to ensure that disease-relevant parts of the network are modeled in great detail. Our approach makes it easy to learn compact, accurate, and easily interpretable biomarker interaction networks. We test our method extensively on simulated data as well as data from the Study of Health in Pomerania (SHIP-Trend) by learning models of hypertension and non-alcoholic fatty liver disease.
Affiliation(s)
- Ann-Kristin Becker
- Institute of Bioinformatics, University Medicine Greifswald, Greifswald, Germany
| | - Marcus Dörr
- Department of Internal Medicine B, University Medicine Greifswald, Greifswald, Germany
- German Centre for Cardiovascular Research (DZHK), partner site Greifswald, Greifswald, Germany
| | - Stephan B. Felix
- Department of Internal Medicine B, University Medicine Greifswald, Greifswald, Germany
- German Centre for Cardiovascular Research (DZHK), partner site Greifswald, Greifswald, Germany
| | - Fabian Frost
- Department of Internal Medicine A, University Medicine Greifswald, Greifswald, Germany
| | - Hans J. Grabe
- Department of Psychiatry, University Medicine Greifswald, Greifswald, Germany
| | - Markus M. Lerch
- Department of Internal Medicine A, University Medicine Greifswald, Greifswald, Germany
| | - Matthias Nauck
- Institute of Clinical Chemistry and Laboratory Medicine, University Medicine Greifswald, Greifswald, Germany
| | - Uwe Völker
- Interfaculty Institute of Genetics and Functional Genomics, Department of Functional Genomics, University Medicine Greifswald, Greifswald, Germany
| | - Henry Völzke
- Institute of Community Medicine, SHIP/KEF, University Medicine Greifswald, Greifswald, Germany
| | - Lars Kaderali
- Institute of Bioinformatics, University Medicine Greifswald, Greifswald, Germany
- * E-mail:
| |
|
39
|
Zhang Y, Chang X, Liu X. Inference of gene regulatory networks using pseudo-time series data. Bioinformatics 2021; 37:2423-2431. [PMID: 33576787 DOI: 10.1093/bioinformatics/btab099] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 11/04/2020] [Revised: 01/18/2021] [Accepted: 02/10/2021] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION Inferring gene regulatory networks (GRNs) from high-throughput data is an important and challenging problem in systems biology. Although numerous GRN methods have been developed, most have been validated only on specific data sets. Moreover, because of the complexity and diversity of biological networks, it is difficult to establish directed topological networks that are suitable for both time-series and non-time-series datasets. RESULTS Here, we propose a novel method, GNIPLR (Gene networks inference based on projection and lagged regression), to infer GRNs from time-series or non-time-series gene expression data. GNIPLR projects gene data twice, using the LASSO projection (LSP) algorithm and the linear projection (LP) approximation, to produce a linear and monotonic pseudo-time series, and then determines the direction of regulation in combination with lagged regression analyses. The proposed algorithm was validated using simulated and real biological data. Moreover, we also applied the GNIPLR algorithm to the liver hepatocellular carcinoma (LIHC) and bladder urothelial carcinoma (BLCA) cancer expression datasets. These analyses revealed significantly higher accuracy and AUC values than other popular methods. AVAILABILITY The GNIPLR tool is freely available at https://github.com/zyllluck/GNIPLR. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
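The idea of using lagged regression to decide the direction of a regulatory link can be illustrated with a small sketch. This is not the GNIPLR implementation: the function names `lagged_r2` and `infer_direction`, the single-lag simple linear regression, and the toy data are all assumptions made for illustration. The sketch regresses each gene at pseudo-time t+1 on the other gene at pseudo-time t and favours the direction that explains more variance.

```python
def lagged_r2(cause, effect, lag=1):
    """R^2 of a simple linear regression of effect[t + lag] on cause[t].

    Hypothetical helper for illustration; lag must be >= 1.
    """
    x = cause[:-lag]
    y = effect[lag:]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series explains nothing
    return (sxy * sxy) / (sxx * syy)


def infer_direction(a, b, lag=1):
    """Return the direction whose lagged regression explains more variance."""
    r2_ab = lagged_r2(a, b, lag)
    r2_ba = lagged_r2(b, a, lag)
    return ("a->b", r2_ab) if r2_ab >= r2_ba else ("b->a", r2_ba)


# Toy pseudo-time series: b follows a with a delay of one step.
a = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 3.0, 7.0]
b = [0.0] + [2 * v + 1 for v in a[:-1]]
print(infer_direction(a, b))  # the a->b direction is favoured here
```

A single lag and a univariate fit keep the sketch short; a real analysis would consider several candidate regulators and lags at once.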
Affiliation(s)
- Yuelei Zhang
- Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, 310012, China; Institute of Statistics and Applied Mathematics, Anhui University of Finance and Economics, Bengbu, 233030, China; School of Mathematics and Statistics, Shandong University, Weihai, Shandong, 264209, China
| | - Xiao Chang
- Institute of Statistics and Applied Mathematics, Anhui University of Finance and Economics, Bengbu, 233030, China
| | - Xiaoping Liu
- Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, 310012, China; School of Mathematics and Statistics, Shandong University, Weihai, Shandong, 264209, China
| |
|
40
|
Zhao M, He W, Tang J, Zou Q, Guo F. A comprehensive overview and critical evaluation of gene regulatory network inference technologies. Brief Bioinform 2021; 22:6128842. [PMID: 33539514 DOI: 10.1093/bib/bbab009] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 09/28/2020] [Revised: 12/11/2020] [Accepted: 01/06/2021] [Indexed: 12/12/2022] Open
Abstract
Gene regulatory networks (GRNs) are an important mechanism for maintaining life processes, controlling biochemical reactions, and regulating compound levels, and they play an important role in various organisms and systems. Reconstructing GRNs can help us to understand the molecular mechanisms of organisms and to reveal the essential rules of a large number of biological processes and reactions in organisms. To deal with the uncertainty of processing, various outstanding network reconstruction algorithms make specific assumptions that affect prediction accuracy. To study why a certain method is more suitable for a specific research problem or type of experimental data, we review methods under three classifications: model-based, information-based, and machine learning-based. Clearly, different types of computational tools can be developed to distinguish GRNs. Furthermore, we discuss several classical, representative, and recent methods in each category and analyze their core ideas, general steps, and characteristics. We compare the performance of state-of-the-art GRN reconstruction technologies on simulated and real networks at different scales. Through standardized performance metrics and common benchmarks, we quantitatively evaluate the stability of the various methods and the sensitivity of the same algorithm when applied to networks of different scales. The aim of this study is to identify the most appropriate method for a specific GRN, which can help biologists and medical scientists to discover potential drug targets and identify cancer biomarkers.
Affiliation(s)
- Mengyuan Zhao
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
| | - Wenying He
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
| | - Jijun Tang
- University of South Carolina, Tianjin, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Fei Guo
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
| |
|
41
|
Wang YXR, Li L, Li JJ, Huang H. Network Modeling in Biology: Statistical Methods for Gene and Brain Networks. Stat Sci 2021; 36:89-108. [PMID: 34305304 DOI: 10.1214/20-sts792] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/14/2022]
Abstract
The rise of network data in many different domains has offered researchers new insight into the problem of modeling complex systems and propelled the development of numerous innovative statistical methodologies and computational tools. In this paper, we primarily focus on two types of biological networks, gene networks and brain networks, where statistical network modeling has found both fruitful and challenging applications. Unlike other network examples such as social networks, where network edges can be directly observed, both gene and brain networks require careful estimation of edges using covariates as a first step. We discuss existing statistical and computational methods for edge estimation and the subsequent statistical inference problems in these two types of biological networks.
Affiliation(s)
- Y X Rachel Wang
- School of Mathematics and Statistics, University of Sydney, Australia
| | - Lexin Li
- Department of Biostatistics and Epidemiology, School of Public Health, University of California, Berkeley
| | | | - Haiyan Huang
- Department of Statistics, University of California, Berkeley
| |
|
42
|
Franzosa JA, Bonzo JA, Jack J, Baker NC, Kothiya P, Witek RP, Hurban P, Siferd S, Hester S, Shah I, Ferguson SS, Houck KA, Wambaugh JF. High-throughput toxicogenomic screening of chemicals in the environment using metabolically competent hepatic cell cultures. NPJ Syst Biol Appl 2021; 7:7. [PMID: 33504769 PMCID: PMC7840683 DOI: 10.1038/s41540-020-00166-2] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 01/22/2020] [Accepted: 10/15/2020] [Indexed: 01/30/2023] Open
Abstract
The ToxCast in vitro screening program has provided concentration-response bioactivity data across more than a thousand assay endpoints for thousands of chemicals found in our environment and commerce. However, most ToxCast screening assays have evaluated individual biological targets in cancer cell lines lacking integrated physiological functionality (such as receptor signaling and metabolism). We evaluated differentiated HepaRG™ cells, a human liver-derived cell model understood to effectively model physiologically relevant hepatic signaling. Expression of 93 gene transcripts was measured by quantitative polymerase chain reaction using Fluidigm 96.96 dynamic arrays in response to 1060 chemicals tested in eight-point concentration-response. A Bayesian framework quantitatively modeled chemical-induced changes in gene expression via six transcription factors: aryl hydrocarbon receptor, constitutive androstane receptor, pregnane X receptor, farnesoid X receptor, androgen receptor, and peroxisome proliferator-activated receptor alpha. For these chemicals, the network model translates transcriptomic data into Bayesian inferences about molecular targets known to activate toxicological adverse outcome pathways. These data also provide new insights into the molecular signaling network of HepaRG™ cell cultures.
Affiliation(s)
- Jill A Franzosa
- Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. EPA, Research Triangle Park, NC, 27711, USA
| | - Jessica A Bonzo
- Cell Biology, Biosciences Division, Thermo Fisher Scientific, Frederick, MD, 21703, USA
| | - John Jack
- Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. EPA, Research Triangle Park, NC, 27711, USA
| | | | - Parth Kothiya
- Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. EPA, Research Triangle Park, NC, 27711, USA
| | - Rafal P Witek
- Cell Biology, Biosciences Division, Thermo Fisher Scientific, Frederick, MD, 21703, USA
| | | | | | - Susan Hester
- Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. EPA, Research Triangle Park, NC, 27711, USA
| | - Imran Shah
- Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. EPA, Research Triangle Park, NC, 27711, USA
| | - Stephen S Ferguson
- Division of National Toxicology Program, National Institutes of Environmental Health Sciences of National Institutes of Health, Durham, NC, 27709, USA
| | - Keith A Houck
- Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. EPA, Research Triangle Park, NC, 27711, USA
| | - John F Wambaugh
- Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. EPA, Research Triangle Park, NC, 27711, USA.
| |
|
43
|
Zheng R, Li M, Chen X, Zhao S, Wu FX, Pan Y, Wang J. An Ensemble Method to Reconstruct Gene Regulatory Networks Based on Multivariate Adaptive Regression Splines. IEEE/ACM Trans Comput Biol Bioinform 2021; 18:347-354. [PMID: 30794516 DOI: 10.1109/tcbb.2019.2900614] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 06/09/2023]
Abstract
Gene regulatory networks (GRNs) play a key role in biological processes. However, GRNs differ across biological conditions. Reconstructing GRNs from gene expression data has been an important opportunity and challenge in the past decades. Although many methods exist to infer the topology of GRNs, such as those based on mutual information, random forests, and partial least squares, their accuracy is still low due to the noise and high dimensionality of expression data. In this paper, we introduce an ensemble method based on Multivariate Adaptive Regression Splines (MARS), called PBMarsNet, to reconstruct directed GRNs from multifactorial gene expression data. PBMarsNet incorporates part mutual information (PMI) to pre-weight the candidate regulatory genes and then uses MARS to detect nonlinear regulatory links. Moreover, we apply bootstrapping to run MARS multiple times and average the outputs of the MARS runs as the final score of each regulatory link. The results on the DREAM4 and DREAM5 challenge datasets show that PBMarsNet has superior performance and generalization compared with other state-of-the-art methods.
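The bootstrap-and-average step described above can be sketched in a few lines. This is only an illustration under stated assumptions, not PBMarsNet itself: the base scorer here is the absolute Pearson correlation (a stand-in for MARS, which is not part of the Python standard library), the PMI pre-weighting is omitted, and the names `abs_corr` and `bootstrap_edge_scores` are hypothetical.

```python
import random


def abs_corr(x, y):
    """Absolute Pearson correlation; stand-in for a MARS-based link score."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # degenerate resample with no variance
    return abs(sxy) / (sxx * syy) ** 0.5


def bootstrap_edge_scores(expr, target, n_boot=50, seed=0):
    """Average a base link score over bootstrap resamples of the samples.

    expr maps gene name -> list of expression values (equal lengths).
    Returns a dict mapping each candidate regulator to its mean score.
    """
    rng = random.Random(seed)
    n = len(expr[target])
    regulators = [g for g in expr if g != target]
    totals = {g: 0.0 for g in regulators}
    for _ in range(n_boot):
        # Resample sample indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        y = [expr[target][i] for i in idx]
        for g in regulators:
            x = [expr[g][i] for i in idx]
            totals[g] += abs_corr(x, y)
    return {g: totals[g] / n_boot for g in regulators}


# Toy data: the target is driven by g1, while g2 is only loosely related.
g1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
expr = {
    "g1": g1,
    "g2": [5.0, 1.0, 4.0, 2.0, 8.0, 3.0, 7.0, 6.0],
    "target": [2 * v for v in g1],
}
print(bootstrap_edge_scores(expr, "target"))
```

Averaging over resamples makes the final link score less sensitive to any single noisy sample, which is the motivation for the ensemble.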
|
44
|
Fundamental Boolean network modelling for childhood acute lymphoblastic leukaemia pathways. Quant Biol 2021. [DOI: 10.15302/j-qb-021-0280] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
|
45
|
Carcamo-Orive I, Henrion MYR, Zhu K, Beckmann ND, Cundiff P, Moein S, Zhang Z, Alamprese M, D’Souza SL, Wabitsch M, Schadt EE, Quertermous T, Knowles JW, Chang R. Predictive network modeling in human induced pluripotent stem cells identifies key driver genes for insulin responsiveness. PLoS Comput Biol 2020; 16:e1008491. [PMID: 33362275 PMCID: PMC7790417 DOI: 10.1371/journal.pcbi.1008491] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 12/10/2019] [Revised: 01/07/2021] [Accepted: 11/03/2020] [Indexed: 12/16/2022] Open
Abstract
Insulin resistance (IR) precedes the development of type 2 diabetes (T2D) and increases cardiovascular disease risk. Although genome-wide association studies (GWAS) have uncovered new loci associated with T2D, their contribution to explaining the mechanisms leading to decreased insulin sensitivity has been very limited. Thus, new approaches are necessary to explore the genetic architecture of insulin resistance. To that end, we generated an iPSC library across the spectrum of insulin sensitivity in humans. RNA-seq based analysis of 310 induced pluripotent stem cell (iPSC) clones derived from 100 individuals allowed us to identify differentially expressed genes between insulin resistant and sensitive iPSC lines. Analysis of the co-expression architecture uncovered several insulin sensitivity-relevant gene sub-networks, and predictive network modeling identified a set of key driver genes that regulate these co-expression modules. Functional validation in human adipocytes and skeletal muscle cells (SKMCs) confirmed the relevance of the key driver candidate genes for insulin responsiveness.
Insulin resistance is characterized by a defective response (“resistance”) to normal insulin concentrations, which impairs uptake of the glucose present in the blood, and is the underlying condition that leads to type 2 diabetes (T2D) and increases the risk of cardiovascular disease. It is estimated that 25–33% of the US population are insulin resistant enough to be at risk of serious clinical consequences. For more than a decade, large population studies have tried to discover the genes that participate in the development of insulin resistance, but without much success. It is now increasingly clear that the complex genetic nature of insulin resistance requires novel approaches centered on patient-specific cellular models. To fill this gap, we have generated an induced pluripotent stem cell (iPSC) library from individuals with accurate measurements of insulin sensitivity, and performed gene expression and key driver analyses. Our work demonstrates that iPSCs can be used as a revolutionary technology to model insulin resistance and to discover key genetic drivers. Moreover, they can advance our basic knowledge of the disease and are ultimately expected to increase the number of therapeutic targets for treating insulin resistance and type 2 diabetes.
Affiliation(s)
- Ivan Carcamo-Orive
- Stanford University School of Medicine, Division of Cardiovascular Medicine, Cardiovascular Institute, and Diabetes Research Center, Stanford, California, United States of America
- * E-mail: (ICO); (JWK); (RC)
| | - Marc Y. R. Henrion
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Liverpool School of Tropical Medicine, Pembroke Place, Liverpool, United Kingdom
- Malawi—Liverpool—Wellcome Trust Clinical Research Programme, Blantyre, Malawi
| | - Kuixi Zhu
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Department of Neurology, University of Arizona, Tucson, Arizona, United States of America
- The Center for Innovations in Brain Sciences, University of Arizona, Tucson, Arizona, United States of America
| | - Noam D. Beckmann
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
| | - Paige Cundiff
- Vertex Pharmaceuticals, Boston, Massachusetts, United States of America
| | - Sara Moein
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Department of Neurology, University of Arizona, Tucson, Arizona, United States of America
- The Center for Innovations in Brain Sciences, University of Arizona, Tucson, Arizona, United States of America
| | - Zenan Zhang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
| | - Melissa Alamprese
- Department of Neurology, University of Arizona, Tucson, Arizona, United States of America
- The Center for Innovations in Brain Sciences, University of Arizona, Tucson, Arizona, United States of America
| | - Sunita L. D’Souza
- Department of Cellular, Developmental and Regenerative Biology, Black Family Stem Cell Institute, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
| | - Martin Wabitsch
- Department of Pediatrics and Adolescent Medicine, Division of Pediatric Endocrinology, Ulm University, Ulm, Germany
| | - Eric E. Schadt
- Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
| | - Thomas Quertermous
- Stanford University School of Medicine, Division of Cardiovascular Medicine, Cardiovascular Institute, and Diabetes Research Center, Stanford, California, United States of America
| | - Joshua W. Knowles
- Stanford University School of Medicine, Division of Cardiovascular Medicine, Cardiovascular Institute, and Diabetes Research Center, Stanford, California, United States of America
- * E-mail: (ICO); (JWK); (RC)
| | - Rui Chang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Department of Neurology, University of Arizona, Tucson, Arizona, United States of America
- The Center for Innovations in Brain Sciences, University of Arizona, Tucson, Arizona, United States of America
- INTelico Therapeutics LLC, Tucson, Arizona, United States of America
- * E-mail: (ICO); (JWK); (RC)
| |
|
46
|
Ma B, Fang M, Jiao X. Inference of gene regulatory networks based on nonlinear ordinary differential equations. Bioinformatics 2020; 36:4885-4893. [DOI: 10.1093/bioinformatics/btaa032] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 09/10/2019] [Revised: 12/30/2019] [Accepted: 01/15/2020] [Indexed: 01/05/2023] Open
Abstract
Motivation
Gene regulatory networks (GRNs) capture the regulatory interactions between genes, resulting from the fundamental biological processes of transcription and translation. In some cases, the topology of GRNs is not known and has to be inferred from gene expression data. Most existing GRN reconstruction algorithms are applied either to time-series data or to steady-state data. Although time-series data include more information about the system dynamics, steady-state data imply stability of the underlying regulatory networks.
Results
In this article, we propose a method for inferring GRNs from time-series and steady-state data jointly. We make use of a non-linear ordinary differential equations framework to model dynamic gene regulation and an importance measurement strategy to infer all putative regulatory links efficiently. The proposed method is evaluated extensively on the artificial DREAM4 dataset and two real gene expression datasets of yeast and Escherichia coli. On public benchmark datasets, the proposed method outperforms other popular inference algorithms in terms of overall score. A comparison of performance on datasets of different scales shows that our method remains robust and accurate at low computational complexity.
Availability and implementation
The proposed method is written in the Python language, and is available at: https://github.com/lab319/GRNs_nonlinear_ODEs
Supplementary information
Supplementary data are available at Bioinformatics online.
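For orientation, non-linear ODE models of gene regulation are commonly written in a form like the one below. This is a generic sketch and is not taken from the abstract: the expression level x_j of gene j, the decay rate α_j, the unknown non-linear regulation function f_j, and the number of genes p are all symbols assumed here for illustration.

```latex
\frac{\mathrm{d}x_j(t)}{\mathrm{d}t}
  = -\alpha_j\, x_j(t) + f_j\bigl(x_1(t), \dots, x_p(t)\bigr),
  \qquad j = 1, \dots, p
```

Under this assumed form, a steady state has zero time derivative, so each steady-state sample satisfies α_j x_j = f_j(x_1, ..., x_p). This gives one plausible reading of how time-series and steady-state measurements can constrain the same function f_j.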
Affiliation(s)
- Baoshan Ma
- College of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
| | - Mingkun Fang
- College of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
| | - Xiangtian Jiao
- College of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
| |
|
47
|
Shi M, Tan S, Xie XP, Li A, Yang W, Zhu T, Wang HQ. Globally learning gene regulatory networks based on hidden atomic regulators from transcriptomic big data. BMC Genomics 2020; 21:711. [PMID: 33054712 PMCID: PMC7559338 DOI: 10.1186/s12864-020-07079-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/02/2020] [Accepted: 09/18/2020] [Indexed: 12/02/2022] Open
Abstract
Background Genes are regulated by various types of regulators, most of which are still unknown or unobserved. Current methods for reverse engineering gene regulatory networks (GRNs) often neglect the unknown regulators and infer regulatory relationships in a local and sub-optimal manner. Results This paper proposes a global GRN inference framework based on dictionary learning, named dlGRN. The method learns atomic regulators (ARs) from gene expression data using a modified dictionary learning (DL) algorithm, which reflects the whole gene regulatory system, and predicts the regulation between a known regulator and a target gene by global regression. The modified DL algorithm fits the scale-free property of biological networks, enabling dlGRN to intrinsically distinguish direct from indirect regulation. Conclusions Extensive experimental results on simulated and real-world data demonstrate the effectiveness and efficiency of dlGRN in reverse engineering GRNs. A novel predicted transcriptional regulation between the TF TFAP2C and the oncogene EGFR was experimentally verified in lung cancer cells. Furthermore, the real application reveals the prevalence of DNA methylation regulation in the gene regulatory system. Owing to its global approach and robustness, dlGRN can serve as a standalone tool for GRN inference.
Affiliation(s)
- Ming Shi
- MICB Laboratory, Institute of Intelligent Machines, Hefei Institutes of Physical Science, CAS, 350 Shushanghu Road, Hefei, Anhui, 230031, P. R. China; Current Address: MOE Key Laboratory of Bioinformatics, Division of Bioinformatics and Center for Synthetic and Systems Biology, TNLIST, Department of Automation, Tsinghua University, Beijing, 100084, China
| | - Sheng Tan
- The CAS Key Laboratory of Innate Immunity and Chronic Disease, Division of Life Sciences and Medicine, University of Science and Technology of China, 96 Jinzhai Road, Hefei, Anhui, 230026, P. R. China
| | - Xin-Ping Xie
- School of Mathematics and Physics, Anhui Jianzhu University, 856 Jinzhai Road, Hefei, Anhui, 230022, P. R. China
| | - Ao Li
- School of Information Science and Technology, University of Science and Technology of China, 96 Jinzhai Road, Hefei, Anhui, 230026, P. R. China
| | - Wulin Yang
- Cancer hospital & Anhui Province Key Laboratory of Medical Physics and Technology, Center of Medical Physics and Technology, Hefei Institutes of Physical Science, CAS, 350 Shushanghu Road, Hefei, Anhui, 230031, P. R. China
| | - Tao Zhu
- Current Address: MOE Key Laboratory of Bioinformatics, Division of Bioinformatics and Center for Synthetic and Systems Biology, TNLIST, Department of Automation, Tsinghua University, Beijing, 100084, China.
| | - Hong-Qiang Wang
- MICB Laboratory, Institute of Intelligent Machines, Hefei Institutes of Physical Science, CAS, 350 Shushanghu Road, Hefei, Anhui, 230031, P. R. China; Cancer hospital & Anhui Province Key Laboratory of Medical Physics and Technology, Center of Medical Physics and Technology, Hefei Institutes of Physical Science, CAS, 350 Shushanghu Road, Hefei, Anhui, 230031, P. R. China.
| |
|
48
|
Che D, Guo S, Jiang Q, Chen L. PFBNet: a priori-fused boosting method for gene regulatory network inference. BMC Bioinformatics 2020; 21:308. [PMID: 32664870 PMCID: PMC7362553 DOI: 10.1186/s12859-020-03639-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 12/29/2019] [Accepted: 07/02/2020] [Indexed: 02/08/2023] Open
Abstract
BACKGROUND Inferring gene regulatory networks (GRNs) from gene expression data remains a challenge in systems biology. In the past decade, numerous methods have been developed for the inference of GRNs. The problem remains challenging because the data are noisy and high dimensional, and there is a large number of potential interactions. RESULTS We present a novel method, the priori-fused boosting network inference method (PFBNet), to infer GRNs from time-series expression data by using a non-linear boosting model and a scheme for fusing prior information (e.g., knockout data). Specifically, PFBNet first calculates the confidences of the regulatory relationships using the boosting-based model, which takes into account the accumulated impact of gene expression at previous time points. Then, a newly defined strategy is applied to fuse the information from the prior data by elevating the confidences of the regulatory relationships from the corresponding regulators. CONCLUSIONS The experiments on the benchmark datasets from the DREAM challenge as well as the E. coli datasets show that PFBNet achieves significantly better performance than other state-of-the-art methods (Jump3, GENIE3-lag, HiDi, iRafNet and BiXGBoost).
Affiliation(s)
- Dandan Che
- Shenzhen Key Lab for High Performance Data Mining, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518000 China
| | - Shun Guo
- Shenzhen Key Lab for High Performance Data Mining, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518000 China
| | - Qingshan Jiang
- Shenzhen Key Lab for High Performance Data Mining, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518000 China
| | - Lifei Chen
- School of Mathematics and Computer Science, Fujian Normal University, Fujian, 350117 China
| |
|
49
|
Gene regulatory network inference from sparsely sampled noisy data. Nat Commun 2020; 11:3493. [PMID: 32661225 PMCID: PMC7359369 DOI: 10.1038/s41467-020-17217-1] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 10/07/2019] [Accepted: 06/11/2020] [Indexed: 12/16/2022] Open
Abstract
The complexity of biological systems is encoded in gene regulatory networks. Unravelling this intricate web is a fundamental step in understanding the mechanisms of life and eventually developing efficient therapies to treat and cure diseases. The major obstacle in inferring gene regulatory networks is the lack of data. While time series data are nowadays widely available, they are typically noisy, with low sampling frequency and overall small number of samples. This paper develops a method called BINGO to specifically deal with these issues. Benchmarked with both real and simulated time-series data covering many different gene regulatory networks, BINGO clearly and consistently outperforms state-of-the-art methods. The novelty of BINGO lies in a nonparametric approach featuring statistical sampling of continuous gene expression profiles. BINGO's superior performance and ease of use, even by non-specialists, make gene regulatory network inference available to any researcher, helping to decipher the complex mechanisms of life.
|
50
|
Song J, Tian S, Yu L, Xing Y, Yang Q, Duan X, Dai Q. AC-Caps: Attention Based Capsule Network for Predicting RBP Binding Sites of LncRNA. Interdiscip Sci 2020; 12:414-423. [PMID: 32572768 DOI: 10.1007/s12539-020-00379-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/11/2020] [Revised: 05/18/2020] [Accepted: 05/30/2020] [Indexed: 01/03/2023]
Abstract
Long non-coding RNAs (lncRNAs) are non-coding RNAs longer than 200 nucleotides that have no protein-encoding function. LncRNAs play a key role in many biological processes. Studying the RNA-binding protein (RBP) binding sites on the lncRNA chain helps to reveal epigenetic and post-transcriptional mechanisms, to explore the physiological and pathological processes of cancer, and to discover new therapeutic breakthroughs. To improve the recognition rate of RBP binding sites and reduce experimental time and cost, many computational methods based on domain knowledge have emerged to predict RBP binding sites. However, these prediction methods treat nucleotides independently and do not take nucleotide statistics into account. In this paper, we use a high-order statistical-based encoding scheme, and the encoded lncRNA sequences are then fed into a hybrid deep learning architecture named AC-Caps. It consists of a joint processing layer (composed of an attention mechanism and a convolutional neural network) and a capsule network. The AC-Caps model was evaluated using 31 independent experimental data sets from 12 lncRNA-binding proteins. In experiments, our method achieves excellent performance, with an average area under the curve (AUC) of 0.967 and an average accuracy (ACC) of 92.5%, which are 0.014 and 2.3%, 0.261 and 28.9%, and 0.189 and 21.8% higher than those of HOCCNNLB, iDeepS, and DeepBind, respectively. The results show that the AC-Caps method can reliably process large-scale RBP binding site data on the lncRNA chain, and that its prediction performance is better than that of existing deep-learning models. The source code of AC-Caps and the datasets used in this paper are available at https://github.com/JinmiaoS/AC-Caps.
Affiliation(s)
- Jinmiao Song
- School of Information Science and Engineering, Xinjiang University, Urumqi, 830008, China
- Dalian Key Lab of Digital Technology for National Culture, Dalian Minzu University, Dalian, 116600, China
| | - Shengwei Tian
- School of Software, Xinjiang University, Urumqi, 830046, China.
| | - Long Yu
- Network Center, Xinjiang University, Urumqi, 830046, China
| | - Yan Xing
- Imaging Center, Xinjiang Medical University Affiliated First Hospital, Urumqi, 830011, China.
| | - Qimeng Yang
- School of Information Science and Engineering, Xinjiang University, Urumqi, 830008, China
| | - Xiaodong Duan
- Dalian Key Lab of Digital Technology for National Culture, Dalian Minzu University, Dalian, 116600, China
| | - Qiguo Dai
- Dalian Key Lab of Digital Technology for National Culture, Dalian Minzu University, Dalian, 116600, China
| |
|