1
Ye W, Shen B, Tang Q, Fang C, Wang L, Xie L, He Q. Identification of a novel immune infiltration-related gene signature, MCEMP1, for coronary artery disease. PeerJ 2024; 12:e18135. [PMID: 39346078 PMCID: PMC11438437 DOI: 10.7717/peerj.18135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2024] [Accepted: 08/29/2024] [Indexed: 10/01/2024] Open
Abstract
Background This study aims to identify a novel gene signature for coronary artery disease (CAD), explore the role of immune cell infiltration in CAD pathogenesis, and assess the cellular function of mast cell-expressed membrane protein 1 (MCEMP1) in human umbilical vein endothelial cells (HUVECs) treated with oxidized low-density lipoprotein (ox-LDL). Methods Datasets GSE24519 and GSE61145 were downloaded from the Gene Expression Omnibus (GEO) database, and differentially expressed genes (DEGs) of CAD were identified using the R "limma" package with p < 0.05 and |log2 FC| > 1. Gene ontology (GO) and pathway analyses were conducted to determine the biological functions of DEGs. Hub genes were identified using support vector machine-recursive feature elimination (SVM-RFE) and least absolute shrinkage and selection operator (LASSO). The expression levels of these hub genes in CAD were validated using the GSE113079 dataset. The CIBERSORT program was used to quantify the proportion of immune cell infiltration. Western blot assay and qRT-PCR were used to detect the expression of hub genes in ox-LDL-treated HUVECs to validate the bioinformatics results. Knockdown interference sequences for MCEMP1 were synthesized, and cell proliferation and apoptosis were examined using a CCK8 kit and Muse® Cell Analyzer, respectively. The concentrations of IL-1β, IL-6, and TNF-α were measured with respective enzyme-linked immunosorbent assay (ELISA) kits. Results A total of 73 DEGs (four down-regulated genes and 69 up-regulated genes) were identified in the metadata (GSE24519 and GSE61145) cohort. GO and Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis results indicated that these DEGs might be associated with the regulation of platelet aggregation, defense response or response to bacterium, NF-kappa B signaling pathway, and lipid and atherosclerosis. Using SVM-RFE and LASSO, seven hub genes were obtained from the metadata. The upregulated expression of DIRC2 and MCEMP1 in CAD was confirmed in the GSE113079 dataset and in ox-LDL-treated HUVECs. Associations were found between the two hub genes (DIRC2 and MCEMP1) and the 22 types of immune cell infiltrates in CAD. MCEMP1 knockdown accelerated cell proliferation and suppressed cell apoptosis in ox-LDL-treated HUVECs. Additionally, MCEMP1 knockdown appeared to decrease the expression of inflammatory factors IL-1β, IL-6, and TNF-α. Conclusions The results of this study indicate that MCEMP1 may play an important role in CAD pathophysiology.
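The DEG cut-offs quoted in this abstract (p < 0.05 and |log2 FC| > 1) can be illustrated with a minimal Python sketch. It shows only the thresholding step, not the limma model fit that the authors ran in R. The function name `select_degs`, the tuple layout, and the gene labels and values are hypothetical, chosen purely for illustration.

```python
def select_degs(results, p_cutoff=0.05, lfc_cutoff=1.0):
    """Split genes into up- and down-regulated lists.

    `results` is an iterable of (gene, log2_fold_change, p_value) tuples.
    A gene is kept only if p < p_cutoff and |log2 FC| > lfc_cutoff.
    """
    up, down = [], []
    for gene, log2_fc, p_value in results:
        if p_value < p_cutoff and abs(log2_fc) > lfc_cutoff:
            # The sign of the fold change decides the direction.
            (up if log2_fc > 0 else down).append(gene)
    return up, down


# Made-up values, not taken from GSE24519 or GSE61145.
example = [
    ("GENE_A", 1.8, 0.001),   # passes both cut-offs, up-regulated
    ("GENE_B", -1.2, 0.03),   # passes both cut-offs, down-regulated
    ("GENE_C", 2.5, 0.20),    # large effect but not significant
    ("GENE_D", 0.4, 0.001),   # significant but small effect
]
up_genes, down_genes = select_degs(example)
print(up_genes, down_genes)
```

Both conditions must hold at once, so a gene with a large fold change but a weak p-value is excluded just like a significant gene with a small fold change.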
Affiliation(s)
- Wei Ye
- Department of Neonatology, Renmin Hospital of Wuhan University, Wuhan, China
- Department of Cardiology, Renmin Hospital of Wuhan University, Wuhan, China
- Hubei Key Laboratory of Metabolic and Chronic Diseases, Wuhan, China
- Central Laboratory, Renmin Hospital of Wuhan University, Wuhan, China
- Bo Shen
- Department of Cardiology, Renmin Hospital of Wuhan University, Wuhan, China
- Hubei Key Laboratory of Metabolic and Chronic Diseases, Wuhan, China
- Qizhu Tang
- Department of Cardiology, Renmin Hospital of Wuhan University, Wuhan, China
- Hubei Key Laboratory of Metabolic and Chronic Diseases, Wuhan, China
- Chengzhi Fang
- Department of Neonatology, Renmin Hospital of Wuhan University, Wuhan, China
- Lei Wang
- Department of Cardiology, HanChuan Hospital, Hanchuan, China
- Lili Xie
- Department of Neonatology, Renmin Hospital of Wuhan University, Wuhan, China
- Qi He
- Department of Neonatology, Renmin Hospital of Wuhan University, Wuhan, China
2
Zhang Y, Yu W, Zhou S, Xiao J, Zhang X, Yang H, Zhang J. Finding key genes (UBE2T, KIF4A, CDCA3, and CDCA5) co-expressed in hepatitis, cirrhosis and hepatocellular carcinoma based on multiple bioinformatics techniques. BMC Gastroenterol 2024; 24:205. [PMID: 38890649 PMCID: PMC11184838 DOI: 10.1186/s12876-024-03288-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Accepted: 06/07/2024] [Indexed: 06/20/2024] Open
Abstract
BACKGROUND Hepatocellular carcinoma (HCC) is one of the most common cancers worldwide. Hepatitis B virus (HBV) is one of the major causes of liver cirrhosis (LC) and HCC. Therefore, the discovery of common markers for hepatitis B or LC and HCC is crucial for the prevention of HCC. METHODS Expressed genes for chronic active hepatitis B (CAH-B), LC and HCC were obtained from the GEO and TCGA databases, and co-expressed genes were screened using protein-protein interaction (PPI) networks, least absolute shrinkage and selection operator (LASSO), random forest (RF) and support vector machine-recursive feature elimination (SVM-RFE). The prognostic value of genes was assessed using Kaplan-Meier (KM) survival curves. Nomograms (columnar line plots), calibration curves and receiver operating characteristic (ROC) curves of individual genes were used for evaluation. Validation was performed using GEO datasets. The association of these key genes with HCC clinical features was explored using the UALCAN database ( https://ualcan.path.uab.edu/index.html ). RESULTS Based on WGCNA analysis and the TCGA database, 565 co-expressed genes were screened. Moreover, the five algorithms of MCODE (ClusteringCoefficient, MCC, Degree, MNC, and DMNC) were used to select one of the most important and most closely linked clusters (the top 50 genes ranked). Using the LASSO regression, RF and SVM-RFE models, four key genes (UBE2T, KIF4A, CDCA3, and CDCA5) were identified for subsequent analysis. These four genes were highly expressed and associated with poor prognosis and clinical features in HCC patients. CONCLUSION These four key genes (UBE2T, KIF4A, CDCA3, and CDCA5) may be common biomarkers for CAH-B and HCC or LC and HCC, promising to advance our understanding of the molecular basis of CAH-B/LC/HCC progression.
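The Kaplan-Meier (KM) survival curves mentioned in this abstract rest on the product-limit estimator, which a short Python sketch can make concrete. This is a generic textbook version and not the authors' pipeline; the function name `kaplan_meier` and the follow-up times are hypothetical. In practice one such curve would be computed for each patient group (for example high versus low expression of a gene) and the curves compared.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    `times` are follow-up times; `events` are 1 for an observed event
    and 0 for a censored observation. Returns a list of
    (time, survival_probability) pairs, one per distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects that share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            # Multiply by the conditional probability of surviving time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Events and censored subjects both leave the risk set.
        at_risk -= leaving
    return curve


# Made-up follow-up data: five subjects, two of them censored.
print(kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0]))
```

Censored subjects never lower the curve directly; they only shrink the risk set used at later event times.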
Affiliation(s)
- Yingai Zhang
- Central Laboratory, Affiliated Haikou Hospital of Xiangya Medical College, Central South University, No.43 Renmin Road, Haikou, Hainan, 570208, China
- School of Life Sciences, Hainan University, No.58 Renmin Road, Haikou, Hainan, 570228, China
- Weiling Yu
- Department of Chemotherapy, Affiliated Haikou Hospital of Xiangya Medical College, Central South University, No.43 Renmin Road, Haikou, Hainan, 570208, China
- Shuai Zhou
- Hepatobiliary surgery, Affiliated Haikou Hospital of Xiangya Medical College, Central South University, No.43 Renmin Road, Haikou, Hainan, 570208, China
- Jingchuan Xiao
- Central Laboratory, Affiliated Haikou Hospital of Xiangya Medical College, Central South University, No.43 Renmin Road, Haikou, Hainan, 570208, China
- Xiaoyu Zhang
- Hepatobiliary surgery, Affiliated Haikou Hospital of Xiangya Medical College, Central South University, No.43 Renmin Road, Haikou, Hainan, 570208, China
- Haoliang Yang
- Central Laboratory, Affiliated Haikou Hospital of Xiangya Medical College, Central South University, No.43 Renmin Road, Haikou, Hainan, 570208, China
- Jianquan Zhang
- Hepatobiliary surgery, Affiliated Haikou Hospital of Xiangya Medical College, Central South University, No.43 Renmin Road, Haikou, Hainan, 570208, China.
3
Wu H, Shi W, Wang MD. Developing a novel causal inference algorithm for personalized biomedical causal graph learning using meta machine learning. BMC Med Inform Decis Mak 2024; 24:137. [PMID: 38802809 PMCID: PMC11129385 DOI: 10.1186/s12911-024-02510-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Accepted: 04/15/2024] [Indexed: 05/29/2024] Open
Abstract
BACKGROUND Modeling causality through graphs, referred to as causal graph learning, offers an appropriate description of the dynamics of causality. The majority of current machine learning models in clinical decision support systems only predict associations between variables, whereas causal graph learning models causality dynamics through graphs. However, building personalized causal graphs for each individual is challenging due to the limited amount of data available for each patient. METHOD In this study, we present a new algorithmic framework using meta-learning for learning personalized causal graphs in biomedicine. Our framework extracts common patterns from multiple patient graphs and applies this information to develop individualized graphs. In multi-task causal graph learning, the proposed optimized initial guess of shared commonality enables the rapid adoption of knowledge to new tasks for efficient causal graph learning. RESULTS Experiments on one real-world biomedical causal graph learning benchmark dataset and four synthetic benchmarks show that our algorithm outperformed the baseline methods. Our algorithm can better understand the underlying patterns in the data, leading to more accurate predictions of the causal graph. Specifically, we reduce the structural Hamming distance by 50-75%, indicating an improvement in graph prediction accuracy. Additionally, the false discovery rate is decreased by 20-30%, demonstrating that our algorithm made fewer incorrect predictions compared to the baseline algorithms. CONCLUSION To the best of our knowledge, this is the first study to demonstrate the effectiveness of meta-learning in personalized causal graph learning and causal inference modeling for biomedicine. In addition, the proposed algorithm can also be generalized to transnational research areas where integrated analysis is necessary for various distributions of datasets, including different clinical institutions.
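The two evaluation metrics named in this abstract, structural Hamming distance (SHD) and false discovery rate (FDR), can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the authors' evaluation code: graphs are directed and free of self-loops, a reversed edge counts as a single SHD error, and a reversed edge counts as a false discovery. The function names and the toy graphs are hypothetical.

```python
def structural_hamming_distance(true_edges, pred_edges):
    """Count node pairs whose edge state differs between two directed graphs.

    Edges are (source, target) tuples. A missing, extra, or reversed edge
    each contributes 1 (assumption: reversal is a single error).
    Self-loops are assumed absent.
    """
    true_edges, pred_edges = set(true_edges), set(pred_edges)
    pairs = {frozenset(edge) for edge in true_edges | pred_edges}
    distance = 0
    for pair in pairs:
        u, v = tuple(pair)
        true_state = ((u, v) in true_edges, (v, u) in true_edges)
        pred_state = ((u, v) in pred_edges, (v, u) in pred_edges)
        if true_state != pred_state:
            distance += 1
    return distance


def false_discovery_rate(true_edges, pred_edges):
    """Fraction of predicted directed edges that are not in the true graph."""
    pred_edges = set(pred_edges)
    if not pred_edges:
        return 0.0
    false_positives = pred_edges - set(true_edges)
    return len(false_positives) / len(pred_edges)


# Toy graphs: one correct edge, one reversed, one missing, one extra.
truth = [("A", "B"), ("B", "C"), ("C", "D")]
prediction = [("A", "B"), ("C", "B"), ("A", "D")]
print(structural_hamming_distance(truth, prediction))
print(false_discovery_rate(truth, prediction))
```

Lower values are better for both metrics, which is why the abstract reports its gains as reductions.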
Affiliation(s)
- Hang Wu
- Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, USA
- Wenqi Shi
- Department of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, USA
- May D Wang
- Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, USA.
4
Ravichandran P, Parsana P, Keener R, Hansen KD, Battle A. Aggregation of recount3 RNA-seq data improves inference of consensus and tissue-specific gene co-expression networks. bioRxiv 2024:2024.01.20.576447. [PMID: 38328080 PMCID: PMC10849507 DOI: 10.1101/2024.01.20.576447] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/09/2024]
Abstract
Background Gene co-expression networks (GCNs) describe relationships among expressed genes key to maintaining cellular identity and homeostasis. However, the sample size of typical RNA-seq experiments, which is several orders of magnitude smaller than the number of genes, is too low to infer GCNs reliably. recount3, a publicly available dataset comprising 316,443 uniformly processed human RNA-seq samples, provides an opportunity to improve power for accurate network reconstruction and obtain biological insight from the resulting networks. Results We compared alternate aggregation strategies to identify an optimal workflow for GCN inference by data aggregation and inferred three consensus networks: a universal network, a non-cancer network, and a cancer network in addition to 27 tissue context-specific networks. Central network genes from our consensus networks were enriched for evolutionarily constrained genes and ubiquitous biological pathways, whereas central context-specific network genes included tissue-specific transcription factors, and factorization based on the hubs led to clustering of related tissue contexts. We discovered that annotations corresponding to context-specific networks inferred from aggregated data were enriched for trait heritability beyond known functional genomic annotations and were significantly more enriched when we aggregated over a larger number of samples. Conclusion This study outlines best practices for GCN inference and evaluation by data aggregation. We recommend estimating and regressing confounders in each data set before aggregation and prioritizing large sample size studies for GCN reconstruction. Increased statistical power in inferring context-specific networks enabled the derivation of variant annotations that were enriched for concordant trait heritability independent of functional genomic annotations that are context-agnostic. While we observed strictly increasing held-out log-likelihood with data aggregation, we noted diminishing marginal improvements. Future directions aimed at alternate methods for estimating confounders and integrating orthogonal information from modalities such as Hi-C and ChIP-seq can further improve GCN inference.
Affiliation(s)
- Princy Parsana
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Rebecca Keener
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Kaspar D Hansen
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Department of Biostatistics, Johns Hopkins School of Public Health, Baltimore, MD, USA
- Department of Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA
- Alexis Battle
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Department of Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA
- Malone Center for Engineering in Healthcare, Johns Hopkins University, Baltimore, MD, USA
- Data Science and AI Institute, Johns Hopkins University, Baltimore, MD, USA
5
Pavel AB, Garrison C, Luo L, Liu G, Taub D, Xiao J, Juan-Guardela B, Tedrow J, Alekseyev YO, Yang IV, Geraci MW, Sciurba F, Schwartz DA, Kaminski N, Beane J, Spira A, Lenburg ME, Campbell JD. Integrative genetic and genomic networks identify microRNA associated with COPD and ILD. Sci Rep 2023; 13:13076. [PMID: 37567908 PMCID: PMC10421936 DOI: 10.1038/s41598-023-39751-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2022] [Accepted: 07/30/2023] [Indexed: 08/13/2023] Open
Abstract
Chronic obstructive pulmonary disease (COPD) and interstitial lung disease (ILD) are clinically and molecularly heterogeneous diseases. We utilized clustering and integrative network analyses to elucidate roles for microRNAs (miRNAs) and miRNA isoforms (isomiRs) in COPD and ILD pathogenesis. Short RNA sequencing was performed on 351 lung tissue samples of COPD (n = 145), ILD (n = 144) and controls (n = 64). Five distinct subclusters of samples were identified including 1 COPD-predominant cluster and 2 ILD-predominant clusters which associated with different clinical measurements of disease severity. Utilizing 262 samples with gene expression and SNP microarrays, we built disease-specific genetic and expression networks to predict key miRNA regulators of gene expression. Members of miR-449/34 family, known to promote airway differentiation by repressing the Notch pathway, were among the top connected miRNAs in both COPD and ILD networks. Genes associated with miR-449/34 members in the disease networks were enriched among genes that increase in expression with airway differentiation at an air-liquid interface. A highly expressed isomiR containing a novel seed sequence was identified at the miR-34c-5p locus. 47% of the anticorrelated predicted targets for this isomiR were distinct from the canonical seed sequence for miR-34c-5p. Overexpression of the canonical miR-34c-5p and the miR-34c-5p isomiR with an alternative seed sequence down-regulated NOTCH1 and NOTCH4. However, only overexpression of the isomiR down-regulated genes involved in Ras signaling such as CRKL and GRB2. Overall, these findings elucidate molecular heterogeneity inherent across COPD and ILD patients and further suggest roles for miR-34c in regulating disease-associated gene-expression.
Affiliation(s)
- Ana B Pavel
- Department of Medicine, Boston University School of Medicine, 72 East Concord St, Boston, MA, 02118, USA.
- Bioinformatics Graduate Program, Boston University, Boston, MA, USA.
- Carly Garrison
- Department of Medicine, Boston University School of Medicine, 72 East Concord St, Boston, MA, 02118, USA
- Lingqi Luo
- Department of Medicine, Boston University School of Medicine, 72 East Concord St, Boston, MA, 02118, USA
- Gang Liu
- Department of Medicine, Boston University School of Medicine, 72 East Concord St, Boston, MA, 02118, USA
- Daniel Taub
- Department of Medicine, Boston University School of Medicine, 72 East Concord St, Boston, MA, 02118, USA
- Ji Xiao
- Department of Medicine, Boston University School of Medicine, 72 East Concord St, Boston, MA, 02118, USA
- Brenda Juan-Guardela
- Department of Medicine, University of Pittsburgh Medical Center, Pittsburgh, PA, USA
- John Tedrow
- Department of Medicine, University of Pittsburgh Medical Center, Pittsburgh, PA, USA
- Norman Regional Medical Center, Norman, Oklahoma, USA
- Yuriy O Alekseyev
- Department of Pathology and Laboratory Medicine, Boston University School of Medicine, Boston, MA, USA
- Ivana V Yang
- Department of Medicine, University of Colorado, Aurora, CO, USA
- Mark W Geraci
- Department of Medicine, University of Colorado, Aurora, CO, USA
- Department of Medicine, University of Pittsburgh Medical Center, Pittsburgh, PA, USA
- Frank Sciurba
- Department of Medicine, University of Pittsburgh Medical Center, Pittsburgh, PA, USA
- David A Schwartz
- Department of Pathology and Laboratory Medicine, Boston University School of Medicine, Boston, MA, USA
- Naftali Kaminski
- Department of Medicine, University of Pittsburgh Medical Center, Pittsburgh, PA, USA
- Department of Medicine, Yale School of Medicine, New Haven, CT, USA
- Jennifer Beane
- Department of Medicine, Boston University School of Medicine, 72 East Concord St, Boston, MA, 02118, USA
- Bioinformatics Graduate Program, Boston University, Boston, MA, USA
- Avrum Spira
- Department of Medicine, Boston University School of Medicine, 72 East Concord St, Boston, MA, 02118, USA
- Bioinformatics Graduate Program, Boston University, Boston, MA, USA
- Marc E Lenburg
- Department of Medicine, Boston University School of Medicine, 72 East Concord St, Boston, MA, 02118, USA
- Bioinformatics Graduate Program, Boston University, Boston, MA, USA
- Department of Pathology and Laboratory Medicine, Boston University School of Medicine, Boston, MA, USA
- Joshua D Campbell
- Department of Medicine, Boston University School of Medicine, 72 East Concord St, Boston, MA, 02118, USA.
- Bioinformatics Graduate Program, Boston University, Boston, MA, USA.
6
Li R, Rozum JC, Quail MM, Qasim MN, Sindi SS, Nobile CJ, Albert R, Hernday AD. Inferring gene regulatory networks using transcriptional profiles as dynamical attractors. PLoS Comput Biol 2023; 19:e1010991. [PMID: 37607190 PMCID: PMC10473541 DOI: 10.1371/journal.pcbi.1010991] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2023] [Revised: 09/01/2023] [Accepted: 07/19/2023] [Indexed: 08/24/2023] Open
Abstract
Genetic regulatory networks (GRNs) regulate the flow of genetic information from the genome to expressed messenger RNAs (mRNAs) and thus are critical to controlling the phenotypic characteristics of cells. Numerous methods exist for profiling mRNA transcript levels and identifying protein-DNA binding interactions at the genome-wide scale. These enable researchers to determine the structure and output of transcriptional regulatory networks, but uncovering the complete structure and regulatory logic of GRNs remains a challenge. The field of GRN inference aims to meet this challenge using computational modeling to derive the structure and logic of GRNs from experimental data and to encode this knowledge in Boolean networks, Bayesian networks, ordinary differential equation (ODE) models, or other modeling frameworks. However, most existing models do not incorporate dynamic transcriptional data, because such data have historically been less widely available than "static" transcriptional data. We report the development of an evolutionary algorithm-based ODE modeling approach (named EA) that integrates kinetic transcription data and the theory of attractor matching to infer GRN architecture and regulatory logic. Our method outperformed six leading GRN inference methods, none of which incorporate kinetic transcriptional data, in predicting regulatory connections among TFs when applied to a small-scale engineered synthetic GRN in Saccharomyces cerevisiae. Moreover, we demonstrate the potential of our method to predict unknown transcriptional profiles that would be produced upon genetic perturbation of the GRN governing a two-state cellular phenotypic switch in Candida albicans. We established an iterative refinement strategy to facilitate candidate selection for experimentation; the experimental results in turn provide validation or improvement for the model. In this way, our GRN inference approach can expedite the development of a sophisticated mathematical model that can accurately describe the structure and dynamics of the in vivo GRN.
Affiliation(s)
- Ruihao Li
- Quantitative and Systems Biology Graduate Program, University of California, Merced, Merced, California, United States of America
- Jordan C. Rozum
- Department of Systems Science and Industrial Engineering, Binghamton University (State University of New York), Binghamton, New York, United States of America
- Morgan M. Quail
- Quantitative and Systems Biology Graduate Program, University of California, Merced, Merced, California, United States of America
- Mohammad N. Qasim
- Quantitative and Systems Biology Graduate Program, University of California, Merced, Merced, California, United States of America
- Suzanne S. Sindi
- Department of Applied Mathematics, University of California, Merced, Merced, California, United States of America
- Clarissa J. Nobile
- Department of Molecular Cell Biology, University of California, Merced, Merced, California, United States of America
- Health Sciences Research Institute, University of California, Merced, Merced, California, United States of America
- Réka Albert
- Department of Physics, Pennsylvania State University, University Park, University Park, Pennsylvania, United States of America
- Department of Biology, Pennsylvania State University, University Park, University Park, Pennsylvania, United States of America
- Aaron D. Hernday
- Department of Molecular Cell Biology, University of California, Merced, Merced, California, United States of America
- Health Sciences Research Institute, University of California, Merced, Merced, California, United States of America
7
Mbebi AJ, Nikoloski Z. Gene regulatory network inference using mixed-norms regularized multivariate model with covariance selection. PLoS Comput Biol 2023; 19:e1010832. [PMID: 37523414 PMCID: PMC10414675 DOI: 10.1371/journal.pcbi.1010832] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2022] [Revised: 08/10/2023] [Accepted: 07/11/2023] [Indexed: 08/02/2023] Open
Abstract
Despite extensive research efforts, reconstruction of gene regulatory networks (GRNs) from transcriptomics data remains a pressing challenge in systems biology. While non-linear approaches for reconstruction of GRNs show improved performance over simpler alternatives, it is not yet understood whether joint modelling of multiple target genes may improve performance, even under linearity assumptions. To address this problem, we propose two novel approaches that cast the GRN reconstruction problem as a blend between regularized multivariate regression and graphical models that combine the L2,1-norm with classical regularization techniques. We used data and networks from the DREAM5 challenge to show that the proposed models provide consistently good performance in comparison to contenders whose performance varies with data sets from simulation and experiments from the model unicellular organisms Escherichia coli and Saccharomyces cerevisiae. Since the models' formulation facilitates the prediction of master regulators, we also used the resulting findings to identify master regulators over all data sets as well as their plasticity across different environments. Our results demonstrate that the identified master regulators are in line with experimental evidence from the model bacterium E. coli. Together, our study demonstrates that simultaneous modelling of several target genes results in improved inference of GRNs and can be used as an alternative in different applications.
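The L2,1-norm named in this abstract has a compact definition that a Python sketch can show: take the Euclidean (L2) norm of each row of a matrix and sum the results (L1 across rows). The row-wise convention used here is a common one but not the only one, and reading the rows as candidate regulators is an assumption made for illustration; the abstract does not give the exact formulation. The function name `l21_norm` and the example matrix are hypothetical.

```python
import math


def l21_norm(matrix):
    """Sum of the Euclidean norms of the rows of `matrix`.

    Assumption: rows are the grouped dimension (for example one row of
    regression coefficients per candidate regulator). Penalizing this
    norm pushes whole rows to zero rather than isolated entries.
    """
    return sum(math.sqrt(sum(value * value for value in row)) for row in matrix)


# Hypothetical coefficient matrix: 3 regulators (rows) x 2 target genes (columns).
coefficients = [
    [3.0, 4.0],    # row norm 5
    [0.0, 0.0],    # row norm 0: this regulator is dropped for every target
    [5.0, 12.0],   # row norm 13
]
print(l21_norm(coefficients))
```

Because a row contributes nothing only when all of its entries are zero, such a penalty selects regulators jointly across target genes, which fits the abstract's remark that the formulation helps identify master regulators.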
Affiliation(s)
- Alain J. Mbebi
- Bioinformatics Department, Institute of Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, Germany
- Systems Biology and Mathematical Modeling Group, Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, Germany
- Zoran Nikoloski
- Bioinformatics Department, Institute of Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, Germany
- Systems Biology and Mathematical Modeling Group, Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, Germany
8
Federico A, Kern J, Varelas X, Monti S. Structure Learning for Gene Regulatory Networks. PLoS Comput Biol 2023; 19:e1011118. [PMID: 37200395 DOI: 10.1371/journal.pcbi.1011118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2022] [Revised: 05/31/2023] [Accepted: 04/20/2023] [Indexed: 05/20/2023] Open
Abstract
Inference of biological network structures is often performed on high-dimensional data, yet is hindered by the limited sample size of high throughput "omics" data typically available. To overcome this challenge, often referred to as the "small n, large p problem," we exploit known organizing principles of biological networks that are sparse, modular, and likely share a large portion of their underlying architecture. We present SHINE (Structure Learning for Hierarchical Networks), a framework for defining data-driven structural constraints and incorporating a shared learning paradigm for efficiently learning multiple Markov networks from high-dimensional data at large p/n ratios not previously feasible. We evaluated SHINE on Pan-Cancer data comprising 23 tumor types, and found that learned tumor-specific networks exhibit expected graph properties of real biological networks, recapture previously validated interactions, and recapitulate findings in literature. Application of SHINE to the analysis of subtype-specific breast cancer networks identified key genes and biological processes for tumor maintenance and survival as well as potential therapeutic targets for modulating known breast cancer disease genes.
Affiliation(s)
- Anthony Federico
- Section of Computational Biomedicine, Boston University School of Medicine, Boston, Massachusetts, United States of America
- Bioinformatics Program, Boston University, Boston, Massachusetts, United States of America
- Joseph Kern
- Department of Biochemistry, Boston University School of Medicine, Boston, Massachusetts, United States of America
- Xaralabos Varelas
- Department of Biochemistry, Boston University School of Medicine, Boston, Massachusetts, United States of America
- Stefano Monti
- Section of Computational Biomedicine, Boston University School of Medicine, Boston, Massachusetts, United States of America
- Bioinformatics Program, Boston University, Boston, Massachusetts, United States of America
9
Arend M, Yuan Y, Ruiz-Sola MÁ, Omranian N, Nikoloski Z, Petroutsos D. Widening the landscape of transcriptional regulation of green algal photoprotection. Nat Commun 2023; 14:2687. [PMID: 37164999 PMCID: PMC10172295 DOI: 10.1038/s41467-023-38183-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/05/2022] [Accepted: 04/17/2023] [Indexed: 05/12/2023] Open
Abstract
Availability of light and CO2, substrates of microalgae photosynthesis, is frequently far from optimal. Microalgae activate photoprotection under strong light, to prevent oxidative damage, and the CO2 Concentrating Mechanism (CCM) under low CO2, to raise intracellular CO2 levels. The two processes are interconnected; yet, the underlying transcriptional regulators remain largely unknown. Employing a large transcriptomic data compendium of Chlamydomonas reinhardtii's responses to different light and carbon supply, we reconstruct a consensus genome-scale gene regulatory network from complementary inference approaches and use it to elucidate transcriptional regulators of photoprotection. We show that the CCM regulator LCR1 also controls photoprotection, and that QER7, a Squamosa Binding Protein, suppresses photoprotection- and CCM-gene expression under the control of the blue light photoreceptor Phototropin. By demonstrating the existence of regulatory hubs that channel light- and CO2-mediated signals into a common response, our study provides an accessible resource to dissect gene expression regulation in this microalga.
Affiliation(s)
- Marius Arend
- Bioinformatics Group, Institute of Biochemistry and Biology, University of Potsdam, 14476, Potsdam, Germany
- Systems Biology and Mathematical Modeling Group, Max-Planck-Institute of Molecular Plant Physiology, 14476, Potsdam, Germany
- Bioinformatics and Mathematical Modeling Department, Center of Plant Systems Biology and Biotechnology, 4000, Plovdiv, Bulgaria
- Yizhong Yuan
- University of Grenoble Alpes, CNRS, CEA, INRAE, IRIG-LPCV, 38000, Grenoble, France
- M Águila Ruiz-Sola
- University of Grenoble Alpes, CNRS, CEA, INRAE, IRIG-LPCV, 38000, Grenoble, France
- Instituto de Bioquímica Vegetal y Fotosíntesis, Universidad de Sevilla-CSIC, 41092, Sevilla, Spain
- Nooshin Omranian
- Bioinformatics Group, Institute of Biochemistry and Biology, University of Potsdam, 14476, Potsdam, Germany
- Systems Biology and Mathematical Modeling Group, Max-Planck-Institute of Molecular Plant Physiology, 14476, Potsdam, Germany
- Bioinformatics and Mathematical Modeling Department, Center of Plant Systems Biology and Biotechnology, 4000, Plovdiv, Bulgaria
- Zoran Nikoloski
- Bioinformatics Group, Institute of Biochemistry and Biology, University of Potsdam, 14476, Potsdam, Germany.
- Systems Biology and Mathematical Modeling Group, Max-Planck-Institute of Molecular Plant Physiology, 14476, Potsdam, Germany.
- Bioinformatics and Mathematical Modeling Department, Center of Plant Systems Biology and Biotechnology, 4000, Plovdiv, Bulgaria.
| | - Dimitris Petroutsos
- University of Grenoble Alpes, CNRS, CEA, INRAE, IRIG-LPCV, 38000, Grenoble, France.
| |
|
10
|
Fabbri L, Garlantézec R, Audouze K, Bustamante M, Carracedo Á, Chatzi L, Ramón González J, Gražulevičienė R, Keun H, Lau CHE, Sabidó E, Siskos AP, Slama R, Thomsen C, Wright J, Lun Yuan W, Casas M, Vrijheid M, Maitre L. Childhood exposure to non-persistent endocrine disrupting chemicals and multi-omic profiles: A panel study. ENVIRONMENT INTERNATIONAL 2023; 173:107856. [PMID: 36867994 DOI: 10.1016/j.envint.2023.107856] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2022] [Revised: 02/23/2023] [Accepted: 02/23/2023] [Indexed: 06/18/2023]
Abstract
BACKGROUND Individuals are exposed to environmental pollutants with endocrine disrupting activity (endocrine disruptors, EDCs) and the early stages of life are particularly susceptible to these exposures. Previous studies have focused on identifying molecular signatures associated with EDCs, but none have used a repeated sampling strategy and integrated multiple omics. We aimed to identify multi-omic signatures associated with childhood exposure to non-persistent EDCs. METHODS We used data from the HELIX Child Panel Study, which included 156 children aged 6 to 11. Children were followed for one week, in two time periods. Twenty-two non-persistent EDCs (10 phthalate, 7 phenol, and 5 organophosphate pesticide metabolites) were measured in two weekly pools of 15 urine samples each. Multi-omic profiles (methylome, serum and urinary metabolome, proteome) were measured in blood and in a pool of urine samples. We developed visit-specific Gaussian Graphical Models based on pairwise partial correlations. The visit-specific networks were then merged to identify reproducible associations. Independent biological evidence was systematically sought to confirm some of these associations and assess their potential health implications. RESULTS A total of 950 reproducible associations were found, among which 23 were direct associations between EDCs and omics. For 9 of them, we were able to find corroborating evidence from previous literature: DEP - serotonin, OXBE - cg27466129, OXBE - dimethylamine, triclosan - leptin, triclosan - serotonin, MBzP - Neu5AC, MEHP - cg20080548, oh-MiNP - kynurenine, oxo-MiNP - 5-oxoproline. We used these associations to explore possible mechanisms between EDCs and health outcomes, and found links to health outcomes for 3 analytes: serotonin and kynurenine in relation to neuro-behavioural development, and leptin in relation to obesity and insulin resistance.
CONCLUSIONS This multi-omics network analysis at two time points identified biologically relevant molecular signatures related to non-persistent EDC exposure in childhood, suggesting pathways related to neurological and metabolic outcomes.
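The partial correlations behind a Gaussian Graphical Model can be illustrated with a minimal three-variable sketch. It is not the study's pipeline: the data are invented toy values, and the names `pearson` and `partial_corr` are hypothetical helpers. With only three variables, the first-order partial correlation used here coincides with the full-order one that defines an edge in a Gaussian Graphical Model.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Toy data (assumption): x and y both follow z plus unrelated noise,
# so they look associated only because they share the driver z.
z = [1, 2, 3, 4, 5, 6, 7, 8]
x = [2, 1, 2, 5, 6, 5, 6, 9]
y = [2, 1, 4, 3, 4, 7, 6, 9]

marginal = pearson(x, y)            # strong marginal correlation (about 0.84)
conditional = partial_corr(x, y, z)  # close to 0 once z is accounted for
```

In this toy case the marginal correlation is high while the partial correlation vanishes, which is why a network built on partial correlations would keep the x-z and y-z edges but drop the indirect x-y edge.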
Affiliation(s)
- Lorenzo Fabbri
- ISGlobal, Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain
| | - Ronan Garlantézec
- Univ Rennes, CHU Rennes, Inserm, EHESP, Irset (Institut de recherche en santé environnement et travail), UMR_S 1085, Rennes, France
| | - Karine Audouze
- Université Paris Cité, T3S, INSERM UMR-S 1124, 45 rue des Saints Pères, Paris, France
| | - Mariona Bustamante
- ISGlobal, Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain; Center for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona, Spain; CIBER Epidemiología y Salud Pública (CIBERESP), Madrid, Spain
| | - Ángel Carracedo
- Medicine Genomics Group, Centro de Investigación Biomédica en Red Enfermedades Raras (CIBERER), University of Santiago de Compostela, CEGEN-PRB3, Santiago de Compostela, Spain; Galician Foundation of Genomic Medicine, Instituto de Investigación Sanitaria de Santiago de Compostela (IDIS), Servicio Gallego de Salud (SERGAS), Santiago de Compostela, Spain
| | - Leda Chatzi
- Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, USA
| | - Juan Ramón González
- ISGlobal, Barcelona, Spain; CIBER Epidemiología y Salud Pública (CIBERESP), Madrid, Spain; Department of Mathematics, Universitat Autònoma de Barcelona, Bellaterra, Spain
| | | | - Hector Keun
- Cancer Metabolism & Systems Toxicology Group, Division of Cancer, Department of Surgery and Cancer & Division of Systems Medicine, Department of Metabolism, Digestion & Reproduction, Imperial College London, Hammersmith Hospital Campus, London, UK
| | - Chung-Ho E Lau
- MRC Centre for Environment and Health, School of Public Health, Imperial College London, London, UK; Division of Systems Medicine, Department of Metabolism, Digestion & Reproduction, Imperial College, South Kensington, London, UK
| | - Eduard Sabidó
- Universitat Pompeu Fabra (UPF), Barcelona, Spain; Center for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
| | - Alexandros P Siskos
- Cancer Metabolism & Systems Toxicology Group, Division of Cancer, Department of Surgery and Cancer & Division of Systems Medicine, Department of Metabolism, Digestion & Reproduction, Imperial College London, Hammersmith Hospital Campus, London, UK
| | - Rémy Slama
- Team of Environmental Epidemiology Applied to Reproduction and Respiratory Health, Institute for Advanced Biosciences (IAB), Inserm, CNRS, Université Grenoble Alpes, Grenoble, France
| | - Cathrine Thomsen
- Department of Environmental Health, Norwegian Institute of Public Health, Oslo, Norway
| | - John Wright
- Bradford Institute for Health Research, Bradford Teaching Hospitals NHS Foundation Trust, Bradford, UK
| | - Wen Lun Yuan
- Université de Paris, Centre for Research in Epidemiology and Statistics (CRESS), INSERM, INRAE, Paris, France; Singapore Institute for Clinical Sciences (SICS), Agency for Science, Technology, and Research (A*STAR), Singapore, Singapore
| | - Maribel Casas
- ISGlobal, Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain; CIBER Epidemiología y Salud Pública (CIBERESP), Madrid, Spain
| | - Martine Vrijheid
- ISGlobal, Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain; CIBER Epidemiología y Salud Pública (CIBERESP), Madrid, Spain
| | - Léa Maitre
- ISGlobal, Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain; CIBER Epidemiología y Salud Pública (CIBERESP), Madrid, Spain.
| |
|
11
|
Song Q, Ruffalo M, Bar-Joseph Z. Using single cell atlas data to reconstruct regulatory networks. Nucleic Acids Res 2023; 51:e38. [PMID: 36762475 PMCID: PMC10123116 DOI: 10.1093/nar/gkad053] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 09/10/2022] [Revised: 12/16/2022] [Accepted: 01/19/2023] [Indexed: 02/11/2023] Open
Abstract
Inference of global gene regulatory networks from omics data is a long-term goal of systems biology. Most methods developed for inferring transcription factor (TF)-gene interactions either relied on a small dataset or used snapshot data, which are not suitable for inferring a process that is inherently temporal. Here, we developed a new computational method that combines neural networks and multi-task learning to predict RNA velocity rather than gene expression values. This allows our method to overcome many of the problems faced by prior methods, leading to a more accurate and more comprehensive set of identified regulatory interactions. Application of our method to atlas-scale single cell data from 6 HuBMAP tissues led to several validated and novel predictions and greatly improved on prior methods proposed for this task.
Affiliation(s)
- Qi Song
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Matthew Ruffalo
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Ziv Bar-Joseph
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA; Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| |
|
12
|
Decoding transcriptional regulation via a human gene expression predictor. J Genet Genomics 2023; 50:305-317. [PMID: 36693565 DOI: 10.1016/j.jgg.2023.01.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2022] [Revised: 01/04/2023] [Accepted: 01/10/2023] [Indexed: 01/22/2023]
Abstract
Transcription factors (TFs) regulate cellular activities by controlling gene expression, but a predictive model describing how TFs quantitatively modulate human transcriptomes is lacking. We construct a universal human gene expression predictor and utilize it to decode transcriptional regulation. Using the expression of 1613 TFs, the predictor reconstitutes highly accurate transcriptomes for samples derived from a wide range of tissues and conditions. The broad applicability of the predictor indicates that it recapitulates the quantitative relationships between TFs and target genes ubiquitous across tissues. Significant interacting TF-target gene pairs are extracted from the predictor and enable downstream inference of TF regulators for diverse pathways involved in development, immunity, metabolism, and stress response. A detailed analysis of the hematopoiesis process reveals an atlas of key TFs regulating the development of different hematopoietic cell lineages, and a portion of these TFs are conserved between humans and mice. The results demonstrate that our method is capable of delineating the TFs responsible for fate determination. Compared to other existing tools, our approach shows better performance in recovering the correct TF regulators. Thus, we present a novel approach that can be used to study human transcriptional regulation in general.
|
13
|
Liu Q, Li J, Dong M, Liu M, Chai Y. Identification of Gene Regulatory Networks Using Variational Bayesian Inference in the Presence of Missing Data. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:399-409. [PMID: 35061589 DOI: 10.1109/tcbb.2022.3144418] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
The identification of gene regulatory networks (GRNs) from gene expression time series data is a challenging open problem in systems biology. This paper considers the structure inference of GRNs from incomplete and noisy gene expression data, an issue that has not been well studied for GRN inference. In this paper, the dynamical behavior of the gene expression process is described by a stochastic nonlinear state-space model with unknown noise information. A variational Bayesian (VB) framework is proposed to estimate the parameters and gene expression levels simultaneously. One of the advantages of this method is that it can easily handle missing observations by generating prediction values. Considering the sparsity of GRNs, the smoothed gene data are modeled by the extreme gradient boosting tree, and the regulatory interactions among genes are identified by the importance scores based on the tree model. The proposed method is tested on the artificial DREAM4 datasets and one real gene expression dataset of yeast. The comparative results show that the proposed method can effectively recover the regulatory interactions of GRNs in the presence of missing observations and outperforms the existing methods for GRN identification.
|
14
|
Galindez G, Sadegh S, Baumbach J, Kacprowski T, List M. Network-based approaches for modeling disease regulation and progression. Comput Struct Biotechnol J 2022; 21:780-795. [PMID: 36698974 PMCID: PMC9841310 DOI: 10.1016/j.csbj.2022.12.022] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 08/15/2022] [Revised: 12/14/2022] [Accepted: 12/14/2022] [Indexed: 12/23/2022] Open
Abstract
Molecular interaction networks lay the foundation for studying how biological functions are controlled by the complex interplay of genes and proteins. Investigating perturbed processes using biological networks has been instrumental in uncovering mechanisms that underlie complex disease phenotypes. Rapid advances in omics technologies have prompted the generation of high-throughput datasets, enabling large-scale, network-based analyses. Consequently, various modeling techniques, including network enrichment, differential network extraction, and network inference, have proven to be useful for gaining new mechanistic insights. We provide an overview of recent network-based methods and their core ideas to facilitate the discovery of disease modules or candidate mechanisms. Knowledge generated from these computational efforts will benefit biomedical research, especially drug development and precision medicine. We further discuss current challenges and provide perspectives in the field, highlighting the need for more integrative and dynamic network approaches to model disease development and progression.
Affiliation(s)
- Gihanna Galindez
- Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics of Technische Universität Braunschweig and Hannover Medical School, Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), TU Braunschweig, Braunschweig, Germany
| | - Sepideh Sadegh
- Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
| | - Jan Baumbach
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
| | - Tim Kacprowski
- Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics of Technische Universität Braunschweig and Hannover Medical School, Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), TU Braunschweig, Braunschweig, Germany
| | - Markus List
- Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
| |
|
15
|
Zhang H, Chen J, Tian T. Bayesian Inference of Stochastic Dynamic Models Using Early-Rejection Methods Based on Sequential Stochastic Simulations. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:1484-1494. [PMID: 33216717 DOI: 10.1109/tcbb.2020.3039490] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
Stochastic modelling is an important method to investigate the functions of noise in a wide range of biological systems. However, the parameter inference for stochastic models is still a challenging problem partially due to the large computing time required for stochastic simulations. To address this issue, we propose a novel early-rejection method by using sequential stochastic simulations. We first show that a large number of stochastic simulations are required to obtain reliable inference results. Instead of generating a large number of simulations for each parameter sample, we propose to generate these simulations in a number of stages. The simulation process will go to the next stage only if the accuracy of simulations at the current stage satisfies a given error criterion. We propose a formula to determine the error criterion and use a stochastic differential equation model to examine the effects of different criteria. Three biochemical network models are used to evaluate the efficiency and accuracy of the proposed method. Numerical results suggest the proposed early-rejection method achieves substantial improvement in the efficiency for the inference of stochastic models.
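The staged idea described in this abstract can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' algorithm: the simulator is a Gaussian stand-in, the stage sizes and tolerances are arbitrary values chosen for the example rather than derived from the paper's error-criterion formula, and `simulate` and `staged_accept` are hypothetical names.

```python
import random

def simulate(theta, rng):
    """Toy stochastic simulator (assumption): one noisy observation around theta."""
    return rng.gauss(theta, 1.0)

def staged_accept(theta, observed, stage_sizes, tolerances, rng):
    """Simulate stage by stage and stop as soon as a stage fails its tolerance.

    Returns (accepted, number_of_simulations_used).
    """
    sims = []
    for size, tol in zip(stage_sizes, tolerances):
        sims.extend(simulate(theta, rng) for _ in range(size))
        error = abs(sum(sims) / len(sims) - observed)
        if error > tol:
            # Early rejection: skip the remaining, more expensive stages.
            return False, len(sims)
    return True, len(sims)

rng = random.Random(0)
stage_sizes = [10, 40, 150]    # cumulative totals of 10, 50 and 200 simulations
tolerances = [2.0, 1.0, 0.5]   # looser early, stricter later (assumed values)

good = staged_accept(5.0, 5.0, stage_sizes, tolerances, rng)
bad = staged_accept(15.0, 5.0, stage_sizes, tolerances, rng)
```

A parameter sample far from the data, such as `theta = 15.0` here, is discarded after the first 10 simulations, while a plausible sample runs through all 200. The saving comes from not spending the full simulation budget on samples that are already clearly poor.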
|
16
|
Ray A. Machine learning in postgenomic biology and personalized medicine. WILEY INTERDISCIPLINARY REVIEWS. DATA MINING AND KNOWLEDGE DISCOVERY 2022; 12:e1451. [PMID: 35966173 PMCID: PMC9371441 DOI: 10.1002/widm.1451] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2020] [Accepted: 12/22/2021] [Indexed: 06/15/2023]
Abstract
In recent years, Artificial Intelligence in the form of machine learning has been revolutionizing biology, biomedical sciences, and gene-based agricultural technology capabilities. Massive data generated in biological sciences by rapid and deep gene sequencing and protein or other molecular structure determination, on the one hand, require data analysis capabilities using machine learning that are distinctly different from classical statistical methods; on the other, these large datasets are enabling the adoption of novel data-intensive machine learning algorithms for the solution of biological problems that until recently had relied on mechanistic model-based approaches that are computationally expensive. This review provides a bird's eye view of the applications of machine learning in post-genomic biology. An attempt is also made to indicate as far as possible the areas of research that are poised to make further impacts in these areas, including the importance of explainable artificial intelligence (XAI) in human health. Further contributions of machine learning are expected to transform medicine, public health, and agricultural technology, as well as to provide invaluable gene-based guidance for the management of complex environments in this age of global warming.
Affiliation(s)
- Animesh Ray
- Riggs School of Applied Life Sciences, Keck Graduate Institute, 535 Watson Drive, Claremont, CA 91711, USA
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California, USA
| |
|
17
|
Emerging Machine Learning Techniques for Modelling Cellular Complex Systems in Alzheimer's Disease. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2022; 1338:199-208. [PMID: 34973026 DOI: 10.1007/978-3-030-78775-2_24] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 10/19/2022]
Abstract
We live in the big data era in the biomedical field, where machine learning makes a very important contribution to the interpretation of complex biological processes and diseases, since it has the potential to create predictive models from multidimensional data sets. Part of the application of machine learning in biomedical science is to study and model complex cellular systems such as biological networks. In this context, the study of complex diseases, such as Alzheimer's disease (AD), benefits from established methodologies of network science and machine learning, as they offer algorithmic tools and techniques that can address the limitations and challenges of modeling and studying cellular AD-related networks. In this paper we analyze the opportunities and challenges at the intersection of machine learning and network biology and whether this can affect the biological interpretation and clarification of diseases. Specifically, we focus on GRN techniques, which through omics data and the use of machine learning techniques can construct a network that captures all the information at the molecular level for the disease under study. We record the emerging machine learning techniques that focus on ensemble tree-based techniques in the area of classification and regression. Their potential for unraveling the complexity of model cellular systems in complex diseases, such as AD, offers the opportunity for novel machine learning methodologies to decipher the mechanisms of the various AD processes.
|
18
|
Biswas S, Acharyya S. Multi-objective Simulated Annealing Variants to Infer Gene Regulatory Network: A Comparative Study. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:2612-2623. [PMID: 32386161 DOI: 10.1109/tcbb.2020.2992304] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/11/2023]
Abstract
A Gene Regulatory Network (GRN) is formed by mutual transcriptional regulation within a set of protein-coding genes in the cellular context of an organism. Computational inference of GRNs is important to understand the behavior of each gene in terms of change in its protein production rate (expression level). As the Recurrent Neural Network (RNN) is efficient in GRN modeling, a bi-objective RNN formulation has been applied here. Based on Archived Multi Objective Simulated Annealing (AMOSA), four algorithms, namely, AMOSA Revised (AMOSAR), Modified Freezing based AMOSA (AMOFSA), Tabu based AMOSA (AMOTSA) and Modified Freezing and Tabu based AMOSA (AMOFTSA), have been proposed and applied to RNN (treated as GRN) for parameter learning using four gene expression time series datasets. Comparative studies on the performance of the algorithms (based on each dataset) have been made in terms of the number of GRNs obtained in the final non-dominated front and the performance metrics, namely, recall, precision and f1 score. Two proposed variants, namely, AMOFSA and AMOTSA, have been found competitive in performance. Experimental observations and statistical analysis show that the modified algorithms are better than AMOSAR and the state-of-the-art algorithms with respect to the above-mentioned metrics.
|
19
|
Geng H, Wang M, Gong J, Xu Y, Ma S. An Arabidopsis expression predictor enables inference of transcriptional regulators for gene modules. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2021; 107:597-612. [PMID: 33974299 DOI: 10.1111/tpj.15315] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/22/2020] [Revised: 03/08/2021] [Accepted: 05/05/2021] [Indexed: 06/12/2023]
Abstract
The regulation of gene expression by transcription factors (TFs) has been studied for a long time, but no model that can accurately predict transcriptome profiles based on TF activities currently exists. Here, we developed a computational approach, named EXPLICIT (Expression Prediction via Log-linear Combination of Transcription Factors), to construct a universal predictor for Arabidopsis to predict the expression of 29 182 non-TF genes using 1678 TFs. When applied to RNA-Seq samples from diverse tissues, EXPLICIT generated accurate predicted transcriptomes correlating well with actual expression, with an average correlation coefficient of 0.986. After recapitulating the quantitative relationships between TFs and their target genes, EXPLICIT enabled downstream inference of TF regulators for genes and gene modules functioning in diverse plant pathways, including those involved in suberin, flavonoid, glucosinolate metabolism, lateral root, xylem, secondary cell wall development or endoplasmic reticulum stress response. Our approach showed a better ability to recover the correct TF regulators when compared with existing plant tools, and provides an innovative way to study transcriptional regulation.
Affiliation(s)
- Haiying Geng
- School of Life Sciences and Hefei National Laboratory for Physical Sciences at the Microscale, University of Science and Technology of China, Innovation Academy for Seed Design, Chinese Academy of Sciences, Hefei, China
| | - Meng Wang
- School of Life Sciences and Hefei National Laboratory for Physical Sciences at the Microscale, University of Science and Technology of China, Innovation Academy for Seed Design, Chinese Academy of Sciences, Hefei, China
| | - Jiazhen Gong
- School of Life Sciences and Hefei National Laboratory for Physical Sciences at the Microscale, University of Science and Technology of China, Innovation Academy for Seed Design, Chinese Academy of Sciences, Hefei, China
| | - Yupu Xu
- School of Life Sciences and Hefei National Laboratory for Physical Sciences at the Microscale, University of Science and Technology of China, Innovation Academy for Seed Design, Chinese Academy of Sciences, Hefei, China
| | - Shisong Ma
- School of Life Sciences and Hefei National Laboratory for Physical Sciences at the Microscale, University of Science and Technology of China, Innovation Academy for Seed Design, Chinese Academy of Sciences, Hefei, China
- School of Data Science, University of Science and Technology of China, Hefei, China
| |
|
20
|
Park Y, Heider D, Hauschild AC. Integrative Analysis of Next-Generation Sequencing for Next-Generation Cancer Research toward Artificial Intelligence. Cancers (Basel) 2021; 13:3148. [PMID: 34202427 PMCID: PMC8269018 DOI: 10.3390/cancers13133148] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 05/20/2021] [Revised: 06/16/2021] [Accepted: 06/21/2021] [Indexed: 12/18/2022] Open
Abstract
The rapid improvement of next-generation sequencing (NGS) technologies and their application in large-scale cohorts in cancer research led to common challenges of big data. It opened a new research area incorporating systems biology and machine learning. As large-scale NGS data accumulated, sophisticated data analysis methods became indispensable. In addition, NGS data have been integrated with systems biology to build better predictive models to determine the characteristics of tumors and tumor subtypes. Therefore, various machine learning algorithms were introduced to identify underlying biological mechanisms. In this work, we review novel technologies developed for NGS data analysis, and we describe how these computational methodologies integrate systems biology and omics data. Subsequently, we discuss how deep neural networks outperform other approaches, the potential of graph neural networks (GNN) in systems biology, and the limitations in NGS biomedical research. To reflect on the various challenges and corresponding computational solutions, we will discuss the following three topics: (i) molecular characteristics, (ii) tumor heterogeneity, and (iii) drug discovery. We conclude that machine learning and network-based approaches can add valuable insights and build highly accurate models. However, a well-informed choice of learning algorithm and biological network information is crucial for the success of each specific research question.
Affiliation(s)
- Youngjun Park
- Department of Mathematics and Computer Science, Philipps-University of Marburg, 35032 Marburg, Germany
| | - Dominik Heider
- Department of Mathematics and Computer Science, Philipps-University of Marburg, 35032 Marburg, Germany
| | - Anne-Christin Hauschild
- Department of Mathematics and Computer Science, Philipps-University of Marburg, 35032 Marburg, Germany
- Department of Medical Informatics, University Medical Center Göttingen, 37075 Göttingen, Germany
| |
|
21
|
Integrated Inference of Asymmetric Protein Interaction Networks Using Dynamic Model and Individual Patient Proteomics Data. Symmetry (Basel) 2021. [DOI: 10.3390/sym13061097] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/23/2023] Open
Abstract
Recent advances in experimental biology studies have produced a large amount of molecular activity data. In particular, individual patient data provide non-time series information for the molecular activities in disease conditions. The challenge is how to design effective algorithms to infer regulatory networks using the individual patient datasets and consequently address the issue of network symmetry. This work is aimed at developing an efficient pipeline to reverse-engineer regulatory networks based on the individual patient proteomic data. The first step uses the SCOUT algorithm to infer the pseudo-time trajectory of individual patients. Then the path-consistent method with part mutual information is used to construct a static network that contains the potential protein interactions. To address the issue of network symmetry, in which the static network is undirected and therefore symmetric, a dynamic model of ordinary differential equations is used to further remove false interactions and derive asymmetric networks. In this work a dataset from triple-negative breast cancer patients is used to develop a protein-protein interaction network with 15 proteins.
|
22
|
Inferring and analyzing gene regulatory networks from multi-factorial expression data: a complete and interactive suite. BMC Genomics 2021; 22:387. [PMID: 34039282 PMCID: PMC8152307 DOI: 10.1186/s12864-021-07659-2] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 11/17/2020] [Accepted: 04/28/2021] [Indexed: 11/29/2022] Open
Abstract
Background High-throughput transcriptomic datasets are often examined to discover new actors and regulators of a biological response. To this end, graphical interfaces have been developed that allow a broad range of users to conduct standard analyses from RNA-seq data, even with little programming experience. Although existing solutions usually provide adequate procedures for normalization, exploration or differential expression, more advanced features, such as gene clustering or regulatory network inference, are often missing or do not reflect current state-of-the-art methodologies. Results We developed here a user interface called DIANE (Dashboard for the Inference and Analysis of Networks from Expression data) designed to harness the potential of multi-factorial expression datasets from any organism through a precise set of methods. DIANE's interactive workflow provides normalization, dimensionality reduction, differential expression and ontology enrichment. Gene clustering can be performed and explored via configurable Mixture Models, and Random Forests are used to infer gene regulatory networks. DIANE also includes a novel procedure to assess the statistical significance of regulator-target influence measures based on permutations for Random Forest importance metrics. All along the pipeline, session reports and results can be downloaded to ensure clear and reproducible analyses. Conclusions We demonstrate the value and the benefits of DIANE using a recently published data set describing the transcriptional response of Arabidopsis thaliana under the combination of temperature, drought and salinity perturbations. We show that DIANE can intuitively carry out informative exploration and statistical procedures with RNA-Seq data, perform model-based clustering of gene expression profiles and go further into gene network reconstruction, providing relevant candidate genes or signalling pathways to explore.
DIANE is available as a web service (https://diane.bpmp.inrae.fr), or can be installed and locally launched as a complete R package. Supplementary Information The online version contains supplementary material available at (10.1186/s12864-021-07659-2).
|
23
|
Kontio JAJ, Pyhäjärvi T, Sillanpää MJ. Model guided trait-specific co-expression network estimation as a new perspective for identifying molecular interactions and pathways. PLoS Comput Biol 2021; 17:e1008960. [PMID: 33939702 PMCID: PMC8118548 DOI: 10.1371/journal.pcbi.1008960] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/20/2020] [Revised: 05/13/2021] [Accepted: 04/13/2021] [Indexed: 11/19/2022] Open
Abstract
A wide variety of 1) parametric regression models and 2) co-expression networks have been developed for finding gene-by-gene interactions underlying complex traits from expression data. While both methodological schemes have their own well-known benefits, little is known about their synergistic potential. Our study introduces their methodological fusion that cross-exploits the strengths of individual approaches via a built-in information-sharing mechanism. This fusion is theoretically based on certain trait-conditioned dependency patterns between two genes depending on their role in the underlying parametric model. The resulting trait-specific co-expression network estimation method 1) serves to enhance the interpretation of biological networks in a parametric sense, and 2) exploits the underlying parametric model itself in the estimation process. To also account for the substantial amount of intrinsic noise and collinearities often entailed by expression data, a tailored co-expression measure is introduced along with this framework to alleviate related computational problems. A remarkable advance over the reference methods in simulated scenarios substantiates the method's high efficiency. As a proof of concept, this synergistic approach is successfully applied in survival analysis, with acute myeloid leukemia data, further highlighting the framework's versatility and broad practical relevance.
Affiliation(s)
- Juho A. J. Kontio
  - Research Unit of Mathematical Sciences, University of Oulu, Oulu, Finland
- Tanja Pyhäjärvi
  - Department of Ecology and Genetics, University of Oulu, Oulu, Finland
  - Department of Forest Sciences, University of Helsinki, Helsinki, Finland
- Mikko J. Sillanpää
  - Research Unit of Mathematical Sciences, University of Oulu, Oulu, Finland
|
24
|
Qian Y, Expert P, Panzarasa P, Barahona M. Geometric graphs from data to aid classification tasks with Graph Convolutional Networks. PATTERNS 2021; 2:100237. [PMID: 33982027 PMCID: PMC8085612 DOI: 10.1016/j.patter.2021.100237] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/12/2020] [Revised: 01/11/2021] [Accepted: 03/12/2021] [Indexed: 12/02/2022]
Abstract
Traditional classification tasks learn to assign samples to given classes based solely on sample features. This paradigm is evolving to include other sources of information, such as known relations between samples. Here, we show that, even if additional relational information is not available in the dataset, one can improve classification by constructing geometric graphs from the features themselves, and using them within a Graph Convolutional Network. The improvement in classification accuracy is maximized by graphs that capture sample similarity with relatively low edge density. We show that such feature-derived graphs increase the alignment of the data to the ground truth while improving class separation. We also demonstrate that the graphs can be made more efficient using spectral sparsification, which reduces the number of edges while still improving classification performance. We illustrate our findings using synthetic and real-world datasets from various scientific domains.
- Geometric graphs from data can be used in deep learning to improve classification
- Optimized graphs align the data to the class labels and enhance class separability
- Sparsifying the optimized graph can potentially improve classification performance
- Extensive experiments are performed on datasets from various scientific domains
Supervised classification assigns unseen samples to classes based on their features by learning from examples with known class labels. We show that classification can be improved by using the sample features not only as the basis for classification, but also as a means to construct geometric graphs that encapsulate the closeness between samples. Such feature-derived graphs can be used within graph-based deep-learning models to improve classification. To understand the benefits of these graphs, we show that they align the data to the class labels and enhance class separability. We also demonstrate how to make the graphs sparser, and hence more efficient, while still potentially improving their performance. Our findings are timely given the increasing interest in combining graphs with classification and learning tasks.
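As a rough illustration of what "constructing geometric graphs from the features themselves" can look like, the sketch below builds a symmetric k-nearest-neighbour graph from a feature matrix and reports its edge density. The choice of a kNN graph, the function name `knn_graph`, and the toy data are assumptions made for this example; the abstract does not specify which graph constructions the authors used.

```python
import numpy as np

def knn_graph(X, k):
    """Symmetric k-nearest-neighbour adjacency matrix (hypothetical helper).

    X is an (n_samples, n_features) array; k must be smaller than n_samples.
    """
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    # Squared Euclidean distances between all pairs of samples.
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)            # a sample is not its own neighbour
    nbrs = np.argsort(d2, axis=1)[:, :k]    # indices of the k closest samples
    A = np.zeros((n, n), dtype=int)
    A[np.repeat(np.arange(n), k), nbrs.ravel()] = 1
    return np.maximum(A, A.T)               # keep an edge if either end chose it

# Toy data (assumed): two well-separated groups of three samples each.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
A = knn_graph(X, k=2)
n = A.shape[0]
edge_density = A.sum() / (n * (n - 1))
```

The resulting adjacency matrix could then be passed, together with the original features, to a graph-based classifier; smaller values of `k` give sparser graphs.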
Affiliation(s)
- Yifan Qian
  - School of Business and Management, Queen Mary University of London, London, UK
- Paul Expert
  - Global Digital Health Unit, School of Public Health, Imperial College London, London, UK
  - World Research Hub Initiative, Tokyo Institute of Technology, Tokyo, Japan
- Pietro Panzarasa
  - School of Business and Management, Queen Mary University of London, London, UK
|
25
|
Li X, Zhang W, Zhang J, Li G. ModularBoost: an efficient network inference algorithm based on module decomposition. BMC Bioinformatics 2021; 22:153. [PMID: 33761871 PMCID: PMC7992795 DOI: 10.1186/s12859-021-04074-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/14/2020] [Accepted: 03/11/2021] [Indexed: 11/15/2022] Open
Abstract
Background Given expression data, gene regulatory network (GRN) inference approaches try to determine regulatory relations. However, current inference methods ignore the inherent topological characteristics of GRNs to some extent, leading to structures that lack clear biological explanation. To increase the biophysical meaning of inferred networks, this study performed data-driven module detection before network inference. Gene modules were identified by decomposition-based methods. Results ICA-decomposition based module detection methods have been used to detect functional modules directly from transcriptomic data. Experiments on time-series expression, curated and scRNA-seq datasets suggested advantages of the proposed ModularBoost method over established methods, especially in efficiency and accuracy. For scRNA-seq datasets, the ModularBoost method outperformed other candidate inference algorithms. Conclusions As a complicated task, GRN inference can be decomposed into several tasks of reduced complexity. Using identified gene modules as topological constraints, the initial inference problem can be accomplished by inferring intra-modular and inter-modular interactions respectively. Experimental outcomes suggest that the proposed ModularBoost method can improve the accuracy and efficiency of inference algorithms by introducing topological constraints.
Affiliation(s)
- Xinyu Li
  - State Key Laboratory of Industrial Control Technology, Institute of Cyber-Systems and Control, Zhejiang University, Zheda Road, 310027, Hangzhou, China
- Wei Zhang
  - State Key Laboratory of Industrial Control Technology, Institute of Cyber-Systems and Control, Zhejiang University, Zheda Road, 310027, Hangzhou, China
- Jianming Zhang
  - State Key Laboratory of Industrial Control Technology, Institute of Cyber-Systems and Control, Zhejiang University, Zheda Road, 310027, Hangzhou, China
- Guang Li
  - State Key Laboratory of Industrial Control Technology, Institute of Cyber-Systems and Control, Zhejiang University, Zheda Road, 310027, Hangzhou, China
|
26
|
Zhang Y, Chang X, Liu X. Inference of gene regulatory networks using pseudo-time series data. Bioinformatics 2021; 37:2423-2431. [PMID: 33576787 DOI: 10.1093/bioinformatics/btab099] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/04/2020] [Revised: 01/18/2021] [Accepted: 02/10/2021] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION Inferring gene regulatory networks (GRNs) from high-throughput data is an important and challenging problem in systems biology. Although numerous GRN methods have been developed, most have focused on verification with a specific data set. However, it is difficult to establish directed topological networks that are suitable for both time-series and non-time-series datasets due to the complexity and diversity of biological networks. RESULTS Here, we proposed a novel method, GNIPLR (Gene networks inference based on projection and lagged regression), to infer GRNs from time-series or non-time-series gene expression data. GNIPLR projected gene data twice using the LASSO projection (LSP) algorithm and the linear projection (LP) approximation to produce a linear and monotonic pseudo-time series, and then determined the direction of regulation in combination with lagged regression analyses. The proposed algorithm was validated using simulated and real biological data. Moreover, we also applied the GNIPLR algorithm to the liver hepatocellular carcinoma (LIHC) and bladder urothelial carcinoma (BLCA) cancer expression datasets. These analyses revealed significantly higher accuracy and AUC values than other popular methods. AVAILABILITY The GNIPLR tool is freely available at https://github.com/zyllluck/GNIPLR. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yuelei Zhang
  - Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, 310012, China
  - Institute of Statistics and Applied Mathematics, Anhui University of Finance and Economics, Bengbu, 233030, China
  - School of Mathematics and Statistics, Shandong University, Weihai, Shandong, 264209, China
- Xiao Chang
  - Institute of Statistics and Applied Mathematics, Anhui University of Finance and Economics, Bengbu, 233030, China
- Xiaoping Liu
  - Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, 310012, China
  - School of Mathematics and Statistics, Shandong University, Weihai, Shandong, 264209, China
|
27
|
Wang YXR, Li L, Li JJ, Huang H. Network Modeling in Biology: Statistical Methods for Gene and Brain Networks. Stat Sci 2021; 36:89-108. [PMID: 34305304 PMCID: PMC8296984 DOI: 10.1214/20-sts792] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]
Abstract
The rise of network data in many different domains has offered researchers new insight into the problem of modeling complex systems and propelled the development of numerous innovative statistical methodologies and computational tools. In this paper, we primarily focus on two types of biological networks, gene networks and brain networks, where statistical network modeling has found both fruitful and challenging applications. Unlike other network examples such as social networks where network edges can be directly observed, both gene and brain networks require careful estimation of edges using covariates as a first step. We provide a discussion on existing statistical and computational methods for edge estimation and subsequent statistical inference problems in these two types of biological networks.
Affiliation(s)
- Y X Rachel Wang
  - School of Mathematics and Statistics, University of Sydney, Australia
- Lexin Li
  - Department of Biostatistics and Epidemiology, School of Public Health, University of California, Berkeley
- Haiyan Huang
  - Department of Statistics, University of California, Berkeley
|
28
|
Network Analysis of Local Gene Regulators in Arabidopsis thaliana under Spaceflight Stress. COMPUTERS 2021. [DOI: 10.3390/computers10020018] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/02/2023]
Abstract
Spaceflight microgravity affects normal plant growth in several ways. The transcriptional dataset of the plant model organism Arabidopsis thaliana grown in the International Space Station is mined using graph-theoretic network analysis approaches to identify significant gene transcriptions in microgravity essential for the plant’s survival and growth in altered environments. The photosynthesis process is critical for the survival of the plants in spaceflight under different environmentally stressful conditions such as lower levels of gravity, lesser oxygen availability, low atmospheric pressure, and the presence of cosmic radiation. The Lasso regression method is used for gene regulatory network inference from gene expressions of four different ecotypes of Arabidopsis in spaceflight microgravity related to the photosynthetic process. The individual behavior of hub-genes and stress response genes in the photosynthetic process and their impact on the whole network is analyzed. Logistic regression on centrality measures computed from the networks, including average shortest path, betweenness centrality, closeness centrality, and eccentricity, and the HITS algorithm are used to rank genes and identify interactor or target genes from the networks. Through the hub and authority gene interactions, several biological processes associated with photosynthesis and carbon fixation genes are identified. The altered conditions in spaceflight have made all the ecotypes of Arabidopsis sensitive to dehydration-and-salt stress. The oxidative and heat-shock stress-response genes regulate the photosynthesis genes that are involved in the oxidation-reduction process in spaceflight microgravity, enabling the plant to adapt successfully to the spaceflight environment.
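To make the hub and authority ranking mentioned above more concrete, here is a minimal sketch of the standard HITS power iteration on a directed adjacency matrix. The four-node toy network, the function name `hits`, and the stopping parameters are assumptions for illustration only; they are not taken from the paper.

```python
import numpy as np

def hits(A, n_iter=100, tol=1e-10):
    """Hub and authority scores for a directed graph (hypothetical helper).

    A[i, j] = 1 means an edge from node i (e.g. a regulator) to node j (a target).
    """
    n = A.shape[0]
    hubs = np.ones(n)
    auth = np.ones(n)
    for _ in range(n_iter):
        # A node is a good authority if good hubs point to it.
        new_auth = A.T @ hubs
        norm = np.linalg.norm(new_auth)
        if norm > 0:
            new_auth = new_auth / norm
        # A node is a good hub if it points to good authorities.
        new_hubs = A @ new_auth
        norm = np.linalg.norm(new_hubs)
        if norm > 0:
            new_hubs = new_hubs / norm
        converged = (np.linalg.norm(new_hubs - hubs) < tol
                     and np.linalg.norm(new_auth - auth) < tol)
        hubs, auth = new_hubs, new_auth
        if converged:
            break
    return hubs, auth

# Toy network (assumed): gene 0 regulates genes 1, 2 and 3; gene 1 regulates gene 2.
A = np.zeros((4, 4))
A[0, [1, 2, 3]] = 1
A[1, 2] = 1
hubs, auth = hits(A)
top_hub = int(np.argmax(hubs))        # gene 0, which has the most targets
top_authority = int(np.argmax(auth))  # gene 2, targeted by both regulators
```

In a gene network setting, high hub scores point to candidate regulators and high authority scores to frequently targeted genes.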
|
29
|
Zheng R, Li M, Chen X, Zhao S, Wu FX, Pan Y, Wang J. An Ensemble Method to Reconstruct Gene Regulatory Networks Based on Multivariate Adaptive Regression Splines. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:347-354. [PMID: 30794516 DOI: 10.1109/tcbb.2019.2900614] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
Gene regulatory networks (GRNs) play a key role in biological processes. However, GRNs are diverse under different biological conditions. Reconstructing gene regulatory networks (GRNs) from gene expression has become an important opportunity and challenge in the past decades. Although there are a lot of existing methods to infer the topology of GRNs, such as mutual information, random forest, and partial least squares, the accuracy is still low due to the noise and high dimension of the expression data. In this paper, we introduce an ensemble Multivariate Adaptive Regression Splines (MARS) based method to reconstruct the directed GRNs from multifactorial gene expression data, called PBMarsNet. PBMarsNet incorporates part mutual information (PMI) to pre-weight the candidate regulatory genes and then uses MARS to detect the nonlinear regulatory links. Moreover, we apply bootstrap to run the MARS multiple times and average the outputs of each MARS as the final score of regulatory links. The results on DREAM4 challenge and DREAM5 challenge datasets show PBMarsNet has a superior performance and generalization over other state-of-the-art methods.
|
30
|
Zaborowski AB, Walther D. Determinants of correlated expression of transcription factors and their target genes. Nucleic Acids Res 2020; 48:11347-11369. [PMID: 33104784 PMCID: PMC7672440 DOI: 10.1093/nar/gkaa927] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/07/2020] [Revised: 10/01/2020] [Accepted: 10/06/2020] [Indexed: 11/14/2022] Open
Abstract
While transcription factors (TFs) are known to regulate the expression of their target genes (TGs), only a weak correlation of expression between TFs and their TGs has generally been observed. As lack of correlation could be caused by additional layers of regulation, the overall correlation distribution may hide the presence of a subset of regulatory TF-TG pairs with tight expression coupling. Using reported regulatory pairs in the plant Arabidopsis thaliana along with comprehensive gene expression information and testing a wide array of molecular features, we aimed to discern the molecular determinants of high expression correlation of TFs and their TGs. TF-family assignment, stress-response process involvement, short genomic distances of the TF-binding sites to the transcription start site of their TGs, few required protein-protein-interaction connections to establish physical interactions between the TF and polymerase-II, unambiguous TF-binding motifs, increased numbers of miRNA target-sites in TF-mRNAs, and a young evolutionary age of TGs were found particularly indicative of high TF-TG correlation. The modulating roles of post-transcriptional, post-translational processes, and epigenetic factors have been characterized as well. Our study reveals that regulatory pairs with high expression coupling are associated with specific molecular determinants.
Affiliation(s)
- Adam B Zaborowski
  - Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476 Potsdam-Golm, Germany
- Dirk Walther
  - Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476 Potsdam-Golm, Germany
|
31
|
Ma B, Fang M, Jiao X. Inference of gene regulatory networks based on nonlinear ordinary differential equations. Bioinformatics 2020; 36:4885-4893. [DOI: 10.1093/bioinformatics/btaa032] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/10/2019] [Revised: 12/30/2019] [Accepted: 01/15/2020] [Indexed: 01/05/2023] Open
Abstract
Motivation
Gene regulatory networks (GRNs) capture the regulatory interactions between genes, resulting from the fundamental biological process of transcription and translation. In some cases, the topology of GRNs is not known, and has to be inferred from gene expression data. Most of the existing GRNs reconstruction algorithms are either applied to time-series data or steady-state data. Although time-series data include more information about the system dynamics, steady-state data imply stability of the underlying regulatory networks.
Results
In this article, we propose a method for inferring GRNs from time-series and steady-state data jointly. We make use of a non-linear ordinary differential equations framework to model dynamic gene regulation and an importance measurement strategy to infer all putative regulatory links efficiently. The proposed method is evaluated extensively on the artificial DREAM4 dataset and two real gene expression datasets of yeast and Escherichia coli. Based on public benchmark datasets, the proposed method outperforms other popular inference algorithms in terms of overall score. By comparing the performance on the datasets with different scales, the results show that our method still keeps good robustness and accuracy at a low computational complexity.
Availability and implementation
The proposed method is written in the Python language, and is available at: https://github.com/lab319/GRNs_nonlinear_ODEs
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Baoshan Ma
  - College of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
- Mingkun Fang
  - College of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
- Xiangtian Jiao
  - College of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
|
32
|
Razaghi-Moghadam Z, Nikoloski Z. Supervised Learning of Gene Regulatory Networks. Curr Protoc Plant Biol 2020; 5:e20106. [PMID: 32207875 DOI: 10.1002/cppb.20106] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/28/2023]
Abstract
Identifying the entirety of gene regulatory interactions in a biological system offers the possibility to determine the key molecular factors that affect important traits on the level of cells, tissues, and whole organisms. Despite the development of experimental approaches and technologies for identification of direct binding of transcription factors (TFs) to promoter regions of downstream target genes, computational approaches that utilize large compendia of transcriptomics data are still the predominant methods used to predict direct downstream targets of TFs, and thus reconstruct genome-wide gene-regulatory networks (GRNs). These approaches can broadly be categorized into unsupervised and supervised, based on whether data about known, experimentally verified gene-regulatory interactions are used in the process of reconstructing the underlying GRN. Here, we first describe the generic steps of supervised approaches for GRN reconstruction, since they have been recently shown to result in improved accuracy of the resulting networks. We also illustrate how they can be used with data from model organisms to obtain more accurate prediction of gene regulatory interactions. © 2020 The Authors.
Basic Protocol 1: Construction of features used in supervised learning of gene regulatory interactions
Basic Protocol 2: Learning the non-interacting TF-gene pairs
Basic Protocol 3: Learning a classifier for gene regulatory interactions
Affiliation(s)
- Zahra Razaghi-Moghadam
  - Systems Biology and Mathematical Modelling Group, Max Planck Institute of Molecular Plant Physiology, Potsdam, Germany
- Zoran Nikoloski
  - Systems Biology and Mathematical Modelling Group, Max Planck Institute of Molecular Plant Physiology, Potsdam, Germany
  - Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, Potsdam, Germany
|
33
|
Bernal V, Bischoff R, Guryev V, Grzegorczyk M, Horvatovich P. Exact hypothesis testing for shrinkage-based Gaussian graphical models. Bioinformatics 2020; 35:5011-5017. [PMID: 31077287 PMCID: PMC6901079 DOI: 10.1093/bioinformatics/btz357] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2018] [Revised: 03/08/2019] [Accepted: 04/26/2019] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION One of the main goals in systems biology is to learn molecular regulatory networks from quantitative profile data. In particular, Gaussian graphical models (GGMs) are widely used network models in bioinformatics where variables (e.g. transcripts, metabolites or proteins) are represented by nodes, and pairs of nodes are connected with an edge according to their partial correlation. Reconstructing a GGM from data is a challenging task when the sample size is smaller than the number of variables. The main problem consists in finding the inverse of the covariance estimator, which is ill-conditioned in this case. Shrinkage-based covariance estimators are a popular approach, producing an invertible 'shrunk' covariance. However, a proper significance test for the 'shrunk' partial correlation (i.e. the GGM edges) is an open challenge as a probability density including the shrinkage is unknown. In this article, we present (i) a geometric reformulation of the shrinkage-based GGM, and (ii) a probability density that naturally includes the shrinkage parameter. RESULTS Our results show that the inference using this new 'shrunk' probability density is as accurate as Monte Carlo estimation (an unbiased non-parametric method) for any shrinkage value, while being computationally more efficient. We show on synthetic data how the novel test for significance allows an accurate control of the Type I error and outperforms the network reconstruction obtained by the widely used R package GeneNet. This is further highlighted in two gene expression datasets from stress response in Escherichia coli, and the effect of influenza infection in Mus musculus. AVAILABILITY AND IMPLEMENTATION https://github.com/V-Bernal/GGM-Shrinkage. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
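The 'shrunk' partial correlations that the abstract refers to can be sketched as follows: the sample correlation matrix is shrunk towards the identity so that it becomes invertible even with fewer samples than variables, and partial correlations are read off the inverse. This is only an illustration of the general idea. The fixed shrinkage value `lam`, the function name, and the simulated data are assumptions, and the significance test proposed in the paper is not reproduced here.

```python
import numpy as np

def shrunk_partial_correlations(X, lam):
    """Partial correlations from a shrunk correlation matrix (hypothetical helper).

    X is an (n_samples, n_variables) array; lam in (0, 1] is the shrinkage
    intensity, assumed here to be chosen by the user.
    """
    R = np.corrcoef(X, rowvar=False)                 # sample correlation matrix
    p = R.shape[0]
    R_shrunk = (1.0 - lam) * R + lam * np.eye(p)     # shrink towards the identity
    P = np.linalg.inv(R_shrunk)                      # precision matrix
    d = np.sqrt(np.diag(P))
    pcor = -P / np.outer(d, d)                       # standardise off-diagonal entries
    np.fill_diagonal(pcor, 1.0)
    return pcor

# Simulated data (assumed): 10 samples and 30 variables, so the unshrunk
# correlation matrix is singular and could not be inverted directly.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 30))
pcor = shrunk_partial_correlations(X, lam=0.3)
```

Each off-diagonal entry of `pcor` is a candidate GGM edge weight; deciding which of them are significant is the problem the paper addresses.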
Affiliation(s)
- Victor Bernal
  - Bernoulli Institute, University of Groningen, Groningen AG, The Netherlands
  - Department of Analytical Biochemistry, Groningen Research Institute of Pharmacy
- Rainer Bischoff
  - Department of Analytical Biochemistry, Groningen Research Institute of Pharmacy
- Victor Guryev
  - European Research Institute for the Biology of Ageing, University Medical Center Groningen, University of Groningen, Groningen AV, The Netherlands
- Marco Grzegorczyk
  - Bernoulli Institute, University of Groningen, Groningen AG, The Netherlands
- Peter Horvatovich
  - Department of Analytical Biochemistry, Groningen Research Institute of Pharmacy
|
34
|
Razaghi-Moghadam Z, Nikoloski Z. Supervised learning of gene-regulatory networks based on graph distance profiles of transcriptomics data. NPJ Syst Biol Appl 2020; 6:21. [PMID: 32606380 PMCID: PMC7327016 DOI: 10.1038/s41540-020-0140-1] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/08/2019] [Accepted: 06/09/2020] [Indexed: 02/07/2023] Open
Abstract
Characterisation of gene-regulatory network (GRN) interactions provides a stepping stone to understanding how genes affect cellular phenotypes. Yet, despite advances in profiling technologies, GRN reconstruction from gene expression data remains a pressing problem in systems biology. Here, we devise a supervised learning approach, GRADIS, which utilises support vector machines to reconstruct GRNs based on distance profiles obtained from a graph representation of transcriptomics data. By employing the data from Escherichia coli and Saccharomyces cerevisiae as well as synthetic networks from the DREAM4 and DREAM5 network inference challenges, we demonstrate that our GRADIS approach outperforms the state-of-the-art supervised and unsupervised approaches. This holds when predictions about target genes for individual transcription factors as well as for the entire network are considered. We employ experimentally verified GRNs from E. coli and S. cerevisiae to validate the predictions and obtain further insights into the performance of the proposed approach. Our GRADIS approach offers the possibility for usage of other network-based representations of large-scale data, and can be readily extended to help the characterisation of other cellular networks, including protein–protein and protein–metabolite interactions.
Affiliation(s)
- Zahra Razaghi-Moghadam
  - Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, 14476, Potsdam, Germany
  - Systems Biology and Mathematical Modeling group, Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476, Potsdam, Germany
- Zoran Nikoloski
  - Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, 14476, Potsdam, Germany
  - Systems Biology and Mathematical Modeling group, Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476, Potsdam, Germany
|
35
|
Hu X, Hu Y, Wu F, Leung RWT, Qin J. Integration of single-cell multi-omics for gene regulatory network inference. Comput Struct Biotechnol J 2020; 18:1925-1938. [PMID: 32774787 PMCID: PMC7385034 DOI: 10.1016/j.csbj.2020.06.033] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/01/2020] [Revised: 06/17/2020] [Accepted: 06/20/2020] [Indexed: 12/20/2022] Open
Abstract
The advancement of single-cell sequencing technology in recent years has provided an opportunity to reconstruct gene regulatory networks (GRNs) with the data from thousands of single cells in one sample. This uncovers regulatory interactions in cells and speeds up the discoveries of regulatory mechanisms in diseases and biological processes. Therefore, more methods have been proposed to reconstruct GRNs using single-cell sequencing data. In this review, we introduce technologies for sequencing the single-cell genome, transcriptome, and epigenome. At the same time, we present an overview of current GRN reconstruction strategies utilizing different single-cell sequencing data. Bioinformatics tools are grouped by their input data type and mathematical principles for the reader's convenience, and the fundamental mathematics inherent in each group is discussed. Furthermore, the adaptabilities and limitations of these different methods are summarized and compared, in the hope of helping researchers identify the most suitable tools for them.
Affiliation(s)
- Xinlin Hu
  - Shenzhen Key Laboratory of Advanced Machine Learning and Applications, College of Mathematics and Statistics, Shenzhen University, Shenzhen 518060, China
- Yaohua Hu
  - Shenzhen Key Laboratory of Advanced Machine Learning and Applications, College of Mathematics and Statistics, Shenzhen University, Shenzhen 518060, China
- Fanjie Wu
  - School of Pharmaceutical Sciences (Shenzhen), Sun Yat-sen University, Shenzhen 518107, China
- Ricky Wai Tak Leung
  - School of Pharmaceutical Sciences (Shenzhen), Sun Yat-sen University, Shenzhen 518107, China
- Jing Qin
  - School of Pharmaceutical Sciences (Shenzhen), Sun Yat-sen University, Shenzhen 518107, China
|
36
|
Song Q, Lee J, Akter S, Rogers M, Grene R, Li S. Prediction of condition-specific regulatory genes using machine learning. Nucleic Acids Res 2020; 48:e62. [PMID: 32329779 PMCID: PMC7293043 DOI: 10.1093/nar/gkaa264] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/06/2019] [Revised: 02/19/2020] [Accepted: 04/20/2020] [Indexed: 12/31/2022] Open
Abstract
Recent advances in genomic technologies have generated data on large-scale protein-DNA interactions and open chromatin regions for many eukaryotic species. How to identify condition-specific functions of transcription factors using these data has become a major challenge in genomic research. To solve this problem, we have developed a method called ConSReg, which provides a novel approach to integrate regulatory genomic data into predictive machine learning models of key regulatory genes. Using Arabidopsis as a model system, we tested our approach to identify regulatory genes in data sets from single cell gene expression and from abiotic stress treatments. Our results showed that ConSReg accurately predicted transcription factors that regulate differentially expressed genes with an average auROC of 0.84, which is 23.5-25% better than enrichment-based approaches. To further validate the performance of ConSReg, we analyzed an independent data set related to plant nitrogen responses. ConSReg provided better rankings of the correct transcription factors in 61.7% of cases, which is three times better than other plant tools. We applied ConSReg to Arabidopsis single cell RNA-seq data, successfully identifying candidate regulatory genes that control cell wall formation. Our methods provide a new approach to define candidate regulatory genes using integrated genomic data in plants.
Affiliation(s)
- Qi Song
- Graduate program in Genetics, Bioinformatics and Computational Biology. Virginia Tech., Blacksburg, VA 24061, USA
- Jiyoung Lee
- Graduate program in Genetics, Bioinformatics and Computational Biology. Virginia Tech., Blacksburg, VA 24061, USA
- Shamima Akter
- School of Plant and Environmental Sciences. Virginia Tech., Blacksburg, VA 24061, USA
- Matthew Rogers
- Department of Statistics. Virginia Tech., Blacksburg, VA 24061, USA
- Ruth Grene
- Graduate program in Genetics, Bioinformatics and Computational Biology. Virginia Tech., Blacksburg, VA 24061, USA
- School of Plant and Environmental Sciences. Virginia Tech., Blacksburg, VA 24061, USA
- Song Li
- Graduate program in Genetics, Bioinformatics and Computational Biology. Virginia Tech., Blacksburg, VA 24061, USA
- School of Plant and Environmental Sciences. Virginia Tech., Blacksburg, VA 24061, USA
37
Clemente-Moreno MJ, Omranian N, Sáez PL, Figueroa CM, Del-Saz N, Elso M, Poblete L, Orf I, Cuadros-Inostroza A, Cavieres LA, Bravo L, Fernie AR, Ribas-Carbó M, Flexas J, Nikoloski Z, Brotman Y, Gago J. Low-temperature tolerance of the Antarctic species Deschampsia antarctica: A complex metabolic response associated with nutrient remobilization. Plant, Cell & Environment 2020; 43:1376-1393. [PMID: 32012308 DOI: 10.1111/pce.13737] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 07/30/2019] [Revised: 01/19/2020] [Accepted: 01/21/2020] [Indexed: 06/10/2023]
Abstract
The species Deschampsia antarctica (DA) is one of the only two native vascular species that live in Antarctica. We performed ecophysiological, biochemical, and metabolomic studies to investigate the responses of DA to low temperature. In parallel, we assessed the responses in a non-Antarctic reference species (Triticum aestivum [TA]) from the same family (Poaceae). At low temperature (4°C), both species showed lower photosynthetic rates (reductions were 70% and 80% for DA and TA, respectively) and symptoms of oxidative stress but opposite responses of antioxidant enzymes (peroxidases and catalase). We employed fused least absolute shrinkage and selection operator statistical modelling to associate the species-dependent physiological and antioxidant responses to primary metabolism. Model results for DA indicated associations with osmoprotection, cell wall remodelling, membrane stabilization, and antioxidant secondary metabolism (synthesis of flavonols and phenylpropanoids), coordinated with nutrient mobilization from source to sink tissues (confirmed by elemental analysis), which were not observed in TA. The metabolic behaviour of DA, with significant changes in particular metabolites, was compared with a newly compiled multispecies dataset showing a general accumulation of metabolites in response to low temperatures. Altogether, the responses displayed by DA suggest a compromise between catabolism and maintenance of leaf functionality.
Affiliation(s)
- María José Clemente-Moreno
- Research Group on Plant Biology under Mediterranean Conditions, Universitat de les Illes Balears (UIB)-Instituto de Agroecología y Economía del Agua (INAGEA), Palma de Mallorca, Spain
- Nooshin Omranian
- Systems Biology and Mathematical Modeling Group, Max-Planck-Institut für Molekulare Pflanzenphysiologie, 14476 Potsdam, Germany
- Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, 14476 Potsdam, Germany
- Patricia L Sáez
- Laboratorio Cultivo de Tejidos Vegetales, Centro de Biotecnología, Departamento de Silvicultura, Facultad de Ciencias Forestales, Universidad de Concepción, Concepción, Chile
- Néstor Del-Saz
- Laboratorio de Fisiología Vegetal, Departamento de Botánica, Facultad de Ciencias Naturales y Oceanográficas, Universidad de Concepción, Concepción, Chile
- Mhartyn Elso
- Laboratorio Cultivo de Tejidos Vegetales, Centro de Biotecnología, Departamento de Silvicultura, Facultad de Ciencias Forestales, Universidad de Concepción, Concepción, Chile
- Leticia Poblete
- Laboratorio Cultivo de Tejidos Vegetales, Centro de Biotecnología, Departamento de Silvicultura, Facultad de Ciencias Forestales, Universidad de Concepción, Concepción, Chile
- Isabel Orf
- Department of Life Sciences, Ben Gurion University of the Negev, Beersheva, Israel
- Lohengrin A Cavieres
- ECOBIOSIS, Departamento de Botánica, Facultad de Ciencias Naturales y Oceanográficas, Universidad de Concepción and Instituto de Ecología y Biodiversidad-IEB, Concepción, Chile
- León Bravo
- Lab. de Fisiología y Biología Molecular Vegetal, Dpt. de Cs. Agronómicas y Recursos Naturales, Facultad de Cs. Agropecuarias y Forestales, Instituto de Agroindustria, & Center of Plant, Soil Interaction and Natural Resources Biotechnology, Scientific and Technological Bioresource Nucleus, Universidad de La Frontera, Temuco, Chile
- Alisdair R Fernie
- Central Metabolism Group, Molecular Physiology Department, Max-Planck-Institut für Molekulare Pflanzenphysiologie, Potsdam, Germany
- Miquel Ribas-Carbó
- Research Group on Plant Biology under Mediterranean Conditions, Universitat de les Illes Balears (UIB)-Instituto de Agroecología y Economía del Agua (INAGEA), Palma de Mallorca, Spain
- Jaume Flexas
- Research Group on Plant Biology under Mediterranean Conditions, Universitat de les Illes Balears (UIB)-Instituto de Agroecología y Economía del Agua (INAGEA), Palma de Mallorca, Spain
- Zoran Nikoloski
- Systems Biology and Mathematical Modeling Group, Max-Planck-Institut für Molekulare Pflanzenphysiologie, 14476 Potsdam, Germany
- Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, 14476 Potsdam, Germany
- Center of Plant System Biology and Biotechnology (CPSBB), Plovdiv, Bulgaria
- Yariv Brotman
- Department of Life Sciences, Ben Gurion University of the Negev, Beersheva, Israel
- Jorge Gago
- Research Group on Plant Biology under Mediterranean Conditions, Universitat de les Illes Balears (UIB)-Instituto de Agroecología y Economía del Agua (INAGEA), Palma de Mallorca, Spain
38
Shi J, Zhao J, Liu X, Chen L, Li T. Quantifying Direct Dependencies in Biological Networks by Multiscale Association Analysis. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:449-458. [PMID: 29994264 DOI: 10.1109/tcbb.2018.2846648] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 06/08/2023]
Abstract
Partial correlation (PC) or conditional mutual information (CMI) is widely used in detecting direct dependencies between the observed variables in biological networks by eliminating indirect correlations/associations, but it fails whenever there are some strong correlations in a network. In this paper, we theoretically develop a multiscale association analysis to overcome this flaw. We propose a new measure, partial association (PA), based on the multiscale conditional mutual information. We show that linear PA and nonlinear PA have clear advantages over PC and CMI from both theoretical and computational aspects. Both simulated models and real omics datasets demonstrate that PA is superior to PC and CMI in terms of accuracy, and is a powerful tool to identify the direct associations or reconstruct molecular networks based on the observed data. Survival and functional analyses of the hub genes in the gene networks reconstructed from TCGA data for different cancers also validated the effectiveness of our method.
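As a point of reference for the baseline that the abstract above compares against, the following is a minimal sketch of ordinary partial correlation (PC) computed from the inverse covariance matrix. It is not the authors' partial association (PA) measure; the function name `partial_correlation` and the simulated chain x → y → z are illustrative assumptions.

```python
import numpy as np


def partial_correlation(data):
    """Partial correlation of each pair of columns given all other columns.

    Uses the standard identity pcorr_ij = -P_ij / sqrt(P_ii * P_jj),
    where P is the inverse of the sample covariance matrix.
    """
    precision = np.linalg.inv(np.cov(data, rowvar=False))
    scale = np.sqrt(np.diag(precision))
    pcorr = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcorr, 1.0)
    return pcorr


# Hypothetical chain: x influences y, and y influences z (no direct x -> z edge).
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
y = x + 0.5 * rng.normal(size=n)
z = y + 0.5 * rng.normal(size=n)
data = np.column_stack([x, y, z])

corr = np.corrcoef(data, rowvar=False)   # marginal correlations
pcorr = partial_correlation(data)        # correlations given the third variable
```

In this simulated chain, x and z are strongly correlated marginally, while their partial correlation given y should be close to zero; this is the sense in which PC removes indirect associations.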
39
Li Y, Liu D, Li T, Zhu Y. Bayesian differential analysis of gene regulatory networks exploiting genetic perturbations. BMC Bioinformatics 2020; 21:12. [PMID: 31918656 PMCID: PMC6953167 DOI: 10.1186/s12859-019-3314-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2019] [Accepted: 12/12/2019] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Gene regulatory networks (GRNs) can be inferred from both gene expression data and genetic perturbations. Under different conditions, the gene data of the same gene set may differ, which results in different GRNs. Detecting structural differences between GRNs under different conditions is of great significance for understanding gene functions and biological mechanisms. RESULTS In this paper, we propose a Bayesian Fused algorithm to jointly infer differential structures of GRNs under two different conditions. The algorithm is developed for GRNs modeled with structural equation models (SEMs), which makes it possible to incorporate genetic perturbations into models to improve the inference accuracy, so we name it BFDSEM. Unlike naive approaches that separately infer pair-wise GRNs and identify the difference from the inferred GRNs, we first re-parameterize the two SEMs to form an integrated model that takes full advantage of the two groups of gene data, and then solve the re-parameterized model by developing a novel Bayesian fused prior following the criterion that the separate GRNs and the differential GRN are both sparse. CONCLUSIONS Computer simulations are run on synthetic data to compare BFDSEM to two state-of-the-art joint inference algorithms: FSSEM and ReDNet. The results demonstrate that the performance of BFDSEM is comparable to FSSEM, and is generally better than ReDNet. The BFDSEM algorithm is also applied to a real data set of lung cancer and adjacent normal tissues; the resulting normal GRN and differential GRN are consistent with results reported in the previous literature. An open-source program implementing BFDSEM is freely available in Additional file 1.
Affiliation(s)
- Yan Li
- College of Computer Science and Technology, Jilin University, Changchun, 130012 China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, 130012 China
- Dayou Liu
- College of Computer Science and Technology, Jilin University, Changchun, 130012 China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, 130012 China
- Tengfei Li
- College of Computer Science and Technology, Jilin University, Changchun, 130012 China
- Yungang Zhu
- College of Computer Science and Technology, Jilin University, Changchun, 130012 China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, 130012 China
40
Buldum G, Tsipa A, Mantalaris A. Linking Engineered Gene Circuit Kinetic Modeling to Cellulose Biosynthesis Prediction in Escherichia coli: Toward Bioprocessing of Microbial Cell Factories. Ind Eng Chem Res 2020. [DOI: 10.1021/acs.iecr.9b05847] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/14/2022]
Affiliation(s)
- Gizem Buldum
- Biological Systems Engineering Laboratory (BSEL), Department of Chemical Engineering, Imperial College London, London SW7 2AZ, United Kingdom
- Argyro Tsipa
- Biological Systems Engineering Laboratory (BSEL), Department of Chemical Engineering, Imperial College London, London SW7 2AZ, United Kingdom
- Athanasios Mantalaris
- Biological Systems Engineering Laboratory (BSEL), Department of Chemical Engineering, Imperial College London, London SW7 2AZ, United Kingdom
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, Georgia 30322, United States
41
Clemente-Moreno MJ, Omranian N, Sáez P, Figueroa CM, Del-Saz N, Elso M, Poblete L, Orf I, Cuadros-Inostroza A, Cavieres L, Bravo L, Fernie A, Ribas-Carbó M, Flexas J, Nikoloski Z, Brotman Y, Gago J. Cytochrome respiration pathway and sulphur metabolism sustain stress tolerance to low temperature in the Antarctic species Colobanthus quitensis. The New Phytologist 2020; 225:754-768. [PMID: 31489634 DOI: 10.1111/nph.16167] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 05/10/2019] [Accepted: 08/22/2019] [Indexed: 05/28/2023]
Abstract
Understanding the strategies employed by plant species that live in extreme environments offers the possibility to discover stress tolerance mechanisms. We studied the physiological, antioxidant and metabolic responses to three temperature conditions (4, 15, and 23°C) of Colobanthus quitensis (CQ), one of the only two native vascular species in Antarctica. We also employed Dianthus chinensis (DC) to assess the effects of the treatments in a non-Antarctic species from the same family. Using fused LASSO modelling, we associated physiological and biochemical antioxidant responses with primary metabolism. This approach allowed us to highlight the metabolic pathways driving the response specific to CQ. Low temperature imposed dramatic reductions in photosynthesis (up to 88%) but not in respiration (sustaining rates of 3.0-4.2 μmol CO2 m⁻² s⁻¹) in CQ, and no change in the physiological stress parameters was found. Its notable antioxidant capacity and mitochondrial cytochrome respiratory activity (20 and two times higher than DC, respectively), which ensure ATP production even at low temperature, were significantly associated with sulphur-containing metabolites and polyamines. Our findings potentially open new biotechnological opportunities regarding the role of antioxidant compounds and respiratory mechanisms associated with sulphur metabolism in stress tolerance strategies to low temperature.
Affiliation(s)
- María José Clemente-Moreno
- Research Group on Plant Biology under Mediterranean Conditions, Instituto de Agroecología y Economía del Agua (INAGEA), Universitat de les Illes Balears (UIB), cta. Valldemossa km 7,5, 07122, Palma de Mallorca, Spain
- Nooshin Omranian
- Systems Biology and Mathematical Modeling Group, Max-Planck-Institut für Molekulare Pflanzenphysiologie, 14476, Potsdam-Golm, Germany
- Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, 14476, Potsdam, Germany
- Patricia Sáez
- Laboratorio Cultivo de Tejidos Vegetales, Centro de Biotecnología, Departamento de Silvicultura, Facultad de Ciencias Forestales, Universidad de Concepción, 4030000, Concepción, Chile
- Carlos María Figueroa
- Instituto de Agrobiotecnología del Litoral, UNL, CONICET, FBCB, 3000, Santa Fe, Argentina
- Néstor Del-Saz
- Laboratorio de Fisiología Vegetal, Departamento de Botánica, Facultad de Ciencias Naturales y Oceanográficas, Universidad de Concepción, 4030000, Concepción, Chile
- Mhartyn Elso
- Laboratorio Cultivo de Tejidos Vegetales, Centro de Biotecnología, Departamento de Silvicultura, Facultad de Ciencias Forestales, Universidad de Concepción, 4030000, Concepción, Chile
- Leticia Poblete
- Laboratorio Cultivo de Tejidos Vegetales, Centro de Biotecnología, Departamento de Silvicultura, Facultad de Ciencias Forestales, Universidad de Concepción, 4030000, Concepción, Chile
- Isabel Orf
- Department of Life Sciences, Ben Gurion University of the Negev, 8410501, Beer Sheva, Israel
- Lohengrin Cavieres
- ECOBIOSIS, Departamento de Botánica, Facultad de Ciencias Naturales y Oceanográficas, Universidad de Concepción, 4030000, Concepción, Chile
- León Bravo
- Laboratorio de Fisiología y Biología Molecular Vegetal, Departamento de Cs. Agronómicas y Recursos Naturales, Facultad de Ciencias Agropecuarias y Forestales, Instituto de Agroindustria, Universidad de La Frontera, Temuco, Chile
- Center of Plant, Soil Interaction and Natural Resources Biotechnology, Scientific and Technological Bioresource Nucleus, Universidad de La Frontera, 4811230, Temuco, Chile
- Alisdair Fernie
- Central Metabolism Group, Molecular Physiology Department, Max-Planck-Institut für Molekulare Pflanzenphysiologie, 14476, Golm, Germany
- Miquel Ribas-Carbó
- Research Group on Plant Biology under Mediterranean Conditions, Instituto de Agroecología y Economía del Agua (INAGEA), Universitat de les Illes Balears (UIB), cta. Valldemossa km 7,5, 07122, Palma de Mallorca, Spain
- Jaume Flexas
- Research Group on Plant Biology under Mediterranean Conditions, Instituto de Agroecología y Economía del Agua (INAGEA), Universitat de les Illes Balears (UIB), cta. Valldemossa km 7,5, 07122, Palma de Mallorca, Spain
- Zoran Nikoloski
- Systems Biology and Mathematical Modeling Group, Max-Planck-Institut für Molekulare Pflanzenphysiologie, 14476, Potsdam-Golm, Germany
- Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, 14476, Potsdam, Germany
- Center of Plant System Biology and Biotechnology (CPSBB), 4000, Plovdiv, Bulgaria
- Yariv Brotman
- Department of Life Sciences, Ben Gurion University of the Negev, 8410501, Beer Sheva, Israel
- Jorge Gago
- Research Group on Plant Biology under Mediterranean Conditions, Instituto de Agroecología y Economía del Agua (INAGEA), Universitat de les Illes Balears (UIB), cta. Valldemossa km 7,5, 07122, Palma de Mallorca, Spain
42
43
Sun J, Wang J, Yuan X, Wu X, Sui T, Wu A, Cheng G, Jiang T. Regulation of Early Host Immune Responses Shapes the Pathogenicity of Avian Influenza A Virus. Front Microbiol 2019; 10:2007. [PMID: 31572308 PMCID: PMC6749051 DOI: 10.3389/fmicb.2019.02007] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 06/14/2019] [Accepted: 08/15/2019] [Indexed: 01/16/2023] Open
Abstract
Avian influenza A viruses (IAV) can cross the species barrier and cause disease in humans. Understanding the pathogenesis of avian IAV remains a challenge. Interferon-mediated antiviral responses and the production of multiple cytokines are important components of host cellular antiviral immunity against IAV infection. To elucidate the pathogenicity of avian IAV, a systems approach was adopted to investigate dysregulation of these two host cellular antiviral immune responses in contrast with human IAV. As a result, we revealed that avian IAV not only disrupted normal early host cellular interferon-mediated antiviral responses, but also caused abnormal cytokine production through different pathways. For avian IAV infection, dysregulation of STAT2 was mainly responsible for abnormal cellular interferon-mediated antiviral responses, and IRF5 and NFKB1 played crucial roles in unusual cytokine production. In contrast, for human IAV infection, IRF1, IRF7, and STAT1 contributed to cellular cytokine production. Furthermore, differential activation of pattern recognition receptors (PRRs) likely led to avian IAV-related abnormal early host cellular antiviral immunity, where TLR7 and RIG-I were activated by avian and human IAV, respectively. Finally, a pathogenesis model was proposed in which the combination of early host cellular interferon-mediated antiviral responses and cytokine production could partly explain the pathogenicity of avian IAV. In conclusion, our study provides a new perspective on the pathogenesis of avian IAV, which will be helpful in preventing their infections in the future.
Affiliation(s)
- Jiya Sun
- Suzhou Institute of Systems Medicine, Suzhou, China; Center for Systems Medicine, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Jingfeng Wang
- Suzhou Institute of Systems Medicine, Suzhou, China; Center for Systems Medicine, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Xuye Yuan
- Suzhou Institute of Systems Medicine, Suzhou, China; Center for Systems Medicine, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Xiangwei Wu
- Suzhou Institute of Systems Medicine, Suzhou, China; Center for Systems Medicine, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Tianqi Sui
- Suzhou Institute of Systems Medicine, Suzhou, China; Center for Systems Medicine, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Aiping Wu
- Suzhou Institute of Systems Medicine, Suzhou, China; Center for Systems Medicine, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Genhong Cheng
- Suzhou Institute of Systems Medicine, Suzhou, China; Center for Systems Medicine, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China; Department of Microbiology, Immunology and Molecular Genetics, University of California, Los Angeles, Los Angeles, CA, United States
- Taijiao Jiang
- Suzhou Institute of Systems Medicine, Suzhou, China; Center for Systems Medicine, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
44
Wani N, Raza K. Integrative approaches to reconstruct regulatory networks from multi-omics data: A review of state-of-the-art methods. Comput Biol Chem 2019; 83:107120. [PMID: 31499298 DOI: 10.1016/j.compbiolchem.2019.107120] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 08/03/2018] [Revised: 02/22/2019] [Accepted: 08/27/2019] [Indexed: 02/06/2023]
Abstract
Data generation using high throughput technologies has led to the accumulation of diverse types of molecular data. These data have different types (discrete, real, string, etc.) and occur in various formats and sizes. Datasets including gene expression, miRNA expression, protein-DNA binding data (ChIP-Seq/ChIP-ChIP), mutation data (copy number variation, single nucleotide polymorphisms), annotations, interactions, and association data are some of the commonly used biological datasets to study various cellular mechanisms of living organisms. Each of them provides a unique, complementary and partly independent view of the genome and hence embeds essential information about the regulatory mechanisms of genes and their products. Therefore, integrating these data and inferring regulatory interactions from them offers a system level of biological insight in predicting gene functions and their phenotypic outcomes. To study genome functionality through regulatory networks, different methods have been proposed for collective mining of information from an integrated dataset. We survey here integration methods that reconstruct regulatory networks using state-of-the-art techniques to handle multi-omics (i.e., genomic, transcriptomic, proteomic) and other biological datasets.
Affiliation(s)
- Nisar Wani
- Govt. Degree College Baramulla, J & K, India; Department of Computer Science, Jamia Millia Islamia, New Delhi, India
- Khalid Raza
- Department of Computer Science, Jamia Millia Islamia, New Delhi, India
45
Young WC, Yeung KY, Raftery AE. Identifying Dynamical Time Series Model Parameters from Equilibrium Samples, with Application to Gene Regulatory Networks. Stat Model 2019; 19:444-465. [PMID: 33824624 DOI: 10.1177/1471082x18776577] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/16/2022]
Abstract
Gene regulatory network reconstruction is an essential task of genomics in order to further our understanding of how genes interact dynamically with each other. The most readily available data, however, are from steady state observations. These data are not as informative about the relational dynamics between genes as knockout or over-expression experiments, which attempt to control the expression of individual genes. We develop a new framework for network inference using samples from the equilibrium distribution of a vector autoregressive (VAR) time-series model which can be applied to steady state gene expression data. We explore the theoretical aspects of our method and apply the method to synthetic gene expression data generated using GeneNetWeaver.
Affiliation(s)
- William Chad Young
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Ka Yee Yeung
- Institute of Technology, University of Washington, Tacoma, WA, USA
- Adrian E Raftery
- Department of Statistics, University of Washington, Seattle, WA, USA
46
Ahn H, Jo K, Jeong D, Pak M, Hur J, Jung W, Kim S. PropaNet: Time-Varying Condition-Specific Transcriptional Network Construction by Network Propagation. Frontiers in Plant Science 2019; 10:698. [PMID: 31258543 PMCID: PMC6587906 DOI: 10.3389/fpls.2019.00698] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 01/04/2019] [Accepted: 05/09/2019] [Indexed: 06/09/2023]
Abstract
Transcription factors (TFs) have a significant influence on the state of a cell by regulating multiple downstream genes. Thus, experimental and computational biologists have made great efforts to construct TF gene networks for regulatory interactions between TFs and their target genes. Now, an important research question is how to utilize TF networks to investigate the response of a plant to stress at the transcription control level using time-series transcriptome data. In this article, we present a new computational network, PropaNet, to investigate dynamics of TF networks from time-series transcriptome data using two state-of-the-art network analysis techniques, influence maximization and network propagation. PropaNet uses the influence maximization technique to produce a ranked list of TFs, ordered by how well each TF explains differentially expressed genes (DEGs) at each time point. Then, a network propagation technique is used to select a group of TFs that explains DEGs best as a whole. For the analysis of Arabidopsis time series datasets from AtGenExpress, we used PlantRegMap as a template TF network and performed PropaNet analysis to investigate transcriptional dynamics of Arabidopsis under cold and heat stress. The time-varying TF networks showed that Arabidopsis responded to cold and heat stress quite differently. For cold stress, bHLH and bZIP type TFs were the first responding TFs and the cold signal influenced histone variants, various genes involved in cell architecture, osmosis and restructuring of cells. However, the consequences of heat stress were up-regulation of genes related to accelerating differentiation and starting re-differentiation. In terms of energy metabolism, plants under heat stress showed elevated metabolic processes, resulting in an exhausted status. We believe that PropaNet will be useful for the construction of condition-specific time-varying TF networks for time-series data analysis in response to stress.
PropaNet is available at http://biohealth.snu.ac.kr/software/PropaNet.
Affiliation(s)
- Hongryul Ahn
- Bioinformatics Institute, Seoul National University, Seoul, South Korea
- Kyuri Jo
- Bioinformatics Institute, Seoul National University, Seoul, South Korea
- Dabin Jeong
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, South Korea
- Minwoo Pak
- Department of Computer Science and Engineering, Seoul National University, Seoul, South Korea
- Jihye Hur
- Department of Crop Science, Konkuk University, Seoul, South Korea
- Woosuk Jung
- Department of Crop Science, Konkuk University, Seoul, South Korea
- Sun Kim
- Bioinformatics Institute, Seoul National University, Seoul, South Korea
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, South Korea
- Department of Computer Science and Engineering, Seoul National University, Seoul, South Korea
47
Glymour C, Zhang K, Spirtes P. Review of Causal Discovery Methods Based on Graphical Models. Front Genet 2019; 10:524. [PMID: 31214249 PMCID: PMC6558187 DOI: 10.3389/fgene.2019.00524] [Citation(s) in RCA: 143] [Impact Index Per Article: 28.6] [Received: 08/07/2018] [Accepted: 05/13/2019] [Indexed: 12/11/2022] Open
Abstract
A fundamental task in various disciplines of science, including biology, is to find underlying causal relations and make use of them. Causal relations can be seen if interventions are properly applied; however, in many cases they are difficult or even impossible to conduct. It is then necessary to discover causal relations by analyzing statistical properties of purely observational data, which is known as causal discovery or causal structure search. This paper aims to give an introduction to and a brief review of the computational methods for causal discovery that were developed in the past three decades, including constraint-based and score-based methods and those based on functional causal models, supplemented by some illustrations and applications.
Affiliation(s)
- Clark Glymour
- Department of Philosophy, Carnegie Mellon University, Pittsburgh, PA, United States
- Kun Zhang
- Department of Philosophy, Carnegie Mellon University, Pittsburgh, PA, United States
- Peter Spirtes
- Department of Philosophy, Carnegie Mellon University, Pittsburgh, PA, United States
48
Masnadi-Shirazi M, Maurya MR, Pao G, Ke E, Verma IM, Subramaniam S. Time varying causal network reconstruction of a mouse cell cycle. BMC Bioinformatics 2019; 20:294. [PMID: 31142274 PMCID: PMC6542064 DOI: 10.1186/s12859-019-2895-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 05/01/2019] [Accepted: 05/13/2019] [Indexed: 12/21/2022] Open
Abstract
Background Biochemical networks are often described through static or time-averaged measurements of the component macromolecules. Temporal variation in these components plays an important role both in describing the dynamical nature of the network and in providing insights into causal mechanisms. Few methods exist, specifically for systems with many variables, for analyzing time series data to identify distinct temporal regimes and the corresponding time-varying causal networks and mechanisms. Results In this study, we use well-constructed temporal transcriptional measurements in a mammalian cell during a cell cycle to identify dynamical networks and mechanisms describing the cell cycle. The methods we have used, and in part developed, combine Granger causality, Vector Autoregression, Estimation Stability with Cross Validation, and a nonparametric change-point detection algorithm; together they enable the estimation of temporally evolving directed networks that provide a comprehensive picture of the crosstalk among different molecular components. We applied our approach to RNA-seq time-course data spanning nearly two cell cycles from Mouse Embryonic Fibroblast (MEF) primary cells. The change-point detection algorithm is able to extract precise information on the duration and timing of cell cycle phases. Using Least Absolute Shrinkage and Selection Operator (LASSO) and Estimation Stability with Cross Validation (ES-CV), we were able to extract, without any prior biological knowledge, information on the phase-specific causal interaction of cell cycle genes, as well as temporal interdependencies of biological mechanisms through a complete cell cycle. Conclusions The temporal dependence of cellular components we provide in our model goes beyond what is known in the literature. Furthermore, our inference of the dynamic interplay of multiple intracellular mechanisms and their temporal dependence on one another can be used to predict time-varying cellular responses, and provides insight into the design of precise experiments for modulating the regulation of the cell cycle. Electronic supplementary material The online version of this article (10.1186/s12859-019-2895-1) contains supplementary material, which is available to authorized users.
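The combination of vector autoregression, LASSO, and Granger causality named in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes a single lag, zero-mean series, and a hand-picked penalty `lam` (the abstract selects it with ES-CV), and it omits change-point detection entirely. The function names and the synthetic three-variable series are hypothetical.

```python
import random

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the lasso shrinkage step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_sweeps=200):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    residual = list(y)  # equals y - X b while b is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual after adding b_j back in.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                b[j] = new_bj
    return b

def granger_edges(series, lam):
    """Fit a lag-1 VAR one target at a time; a nonzero coefficient on a
    different variable is read as a candidate edge (source, target)."""
    T, p = len(series), len(series[0])
    past = series[:-1]
    edges = []
    for target in range(p):
        future = [series[t + 1][target] for t in range(T - 1)]
        coeffs = lasso_coordinate_descent(past, future, lam)
        edges += [(source, target) for source in range(p)
                  if source != target and coeffs[source] != 0.0]
    return edges

# Hypothetical synthetic data: variable 0 drives variable 1; variable 2 is independent.
random.seed(0)
series = [[0.0, 0.0, 0.0]]
for _ in range(300):
    x0, x1, x2 = series[-1]
    series.append([
        0.5 * x0 + random.gauss(0, 1),
        0.8 * x0 + 0.2 * x1 + random.gauss(0, 1),
        0.5 * x2 + random.gauss(0, 1),
    ])

edges = granger_edges(series, lam=0.2)
print(edges)
```

In this reading, an edge means only that the past of one variable helps predict another under the sparse linear model; which edges survive depends on `lam`, which is why a data-driven selection rule matters in practice.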
Affiliation(s)
- Maryam Masnadi-Shirazi
- Department of Electrical and Computer Engineering and Bioengineering, University of California San Diego, 9500 Gilman Dr, La Jolla, CA, 92093, USA
- Mano R Maurya
- Department of Bioengineering and San Diego Supercomputer Center, University of California San Diego, 9500 Gilman Dr, La Jolla, CA, 92093, USA
- Gerald Pao
- Salk Institute for Biological Studies, 10010 N Torrey Pines Rd, La Jolla, CA, 92037, USA
- Eugene Ke
- Salk Institute for Biological Studies, 10010 N Torrey Pines Rd, La Jolla, CA, 92037, USA
- Inder M Verma
- Salk Institute for Biological Studies, 10010 N Torrey Pines Rd, La Jolla, CA, 92037, USA
- Shankar Subramaniam
- Department of Bioengineering, Departments of Computer Science and Engineering, Cellular and Molecular Medicine, and the Graduate Program in Bioinformatics, University of California San Diego, 9500 Gilman Dr, La Jolla, CA, 92093, USA
|
49
|
Wang C, Gao F, Giannakis GB, D'Urso G, Cai X. Efficient proximal gradient algorithm for inference of differential gene networks. BMC Bioinformatics 2019; 20:224. [PMID: 31046666 PMCID: PMC6498668 DOI: 10.1186/s12859-019-2749-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 12/02/2018] [Accepted: 03/18/2019] [Indexed: 02/07/2023] Open
Abstract
Background Gene networks in living cells can change depending on various conditions such as different environments, tissue types, disease states, and development stages. Identifying the differential changes in gene networks is very important for understanding the molecular basis of various biological processes. While existing algorithms can be used to infer two gene networks separately from gene expression data under two different conditions, and then to identify network changes, such an approach does not exploit the similarity between the two gene networks, and it is thus suboptimal. A more desirable approach would be to infer the two gene networks jointly, which can yield improved estimates of network changes. Results In this paper, we developed a proximal gradient algorithm for differential network (ProGAdNet) inference that jointly infers two gene networks under different conditions and then identifies changes in the network structure. Computer simulations demonstrated that ProGAdNet outperformed existing algorithms in terms of inference accuracy, and was much faster than a similar approach for joint inference of gene networks. Gene expression data of breast tumors and normal tissues in the TCGA database were analyzed with ProGAdNet, which revealed that 268 genes were involved in the changed network edges. Gene set enrichment analysis identified a significant number of gene sets related to breast cancer or other types of cancer that are enriched in this set of 268 genes. Network analysis of the kidney cancer data in the TCGA database with ProGAdNet also identified a set of genes involved in network changes, and the majority of the top genes identified have been reported in the literature to be implicated in kidney cancer. These results corroborated that the gene sets identified by ProGAdNet were very informative about the cancer disease status. A software package implementing ProGAdNet, the computer simulations, and the real data analysis is available as Additional file 1. Conclusion With its superior performance over existing algorithms, ProGAdNet provides a valuable tool for finding changes in gene networks, which may aid the discovery of gene-gene interactions changed under different conditions. Electronic supplementary material The online version of this article (10.1186/s12859-019-2749-x) contains supplementary material, which is available to authorized users.
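The abstract does not state ProGAdNet's joint objective or its proximal operator, so the sketch below shows only the generic proximal gradient idea on a simpler, assumed problem: an l1-penalized regression of one target gene on candidate regulators for a single condition. The function names, the penalty `lam`, and the synthetic data are hypothetical; a joint method would additionally couple the coefficients of the two conditions.

```python
import random

def soft_threshold(value, threshold):
    """Proximal operator of threshold*|.|: shrink value toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def proximal_gradient_lasso(X, y, lam, n_iters=500):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 with proximal gradient (ISTA)."""
    n, p = len(X), len(X[0])
    # The trace of X^T X / n bounds its largest eigenvalue, so 1/trace is a
    # safe (conservative) step size for the smooth least-squares term.
    lipschitz_bound = sum(X[i][j] ** 2 for i in range(n) for j in range(p)) / n
    step = 1.0 / lipschitz_bound
    b = [0.0] * p
    for _ in range(n_iters):
        # Gradient step on the smooth term ...
        residual = [sum(X[i][j] * b[j] for j in range(p)) - y[i] for i in range(n)]
        grad = [sum(X[i][j] * residual[i] for i in range(n)) / n for j in range(p)]
        # ... followed by the proximal step on the l1 penalty.
        b = [soft_threshold(b[j] - step * grad[j], step * lam) for j in range(p)]
    return b

# Hypothetical data: the target depends on regulators 0 and 2 only.
random.seed(1)
n, p = 100, 4
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [2.0 * row[0] - 1.5 * row[2] + random.gauss(0, 0.1) for row in X]

b_hat = proximal_gradient_lasso(X, y, lam=0.1)
print([round(v, 2) for v in b_hat])
```

Each iteration alternates a gradient step on the smooth loss with a closed-form shrinkage step on the penalty, which is what makes this family of algorithms cheap per iteration when the penalty has a simple proximal operator.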
Affiliation(s)
- Chen Wang
- Department of Electrical and Computer Engineering, University of Miami, 1251 Memorial Drive, Coral Gables, 33146, FL, USA
- Feng Gao
- Department of Electrical and Computer Engineering, University of Miami, 1251 Memorial Drive, Coral Gables, 33146, FL, USA
- Georgios B Giannakis
- Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, 55455, MN, USA
- Gennaro D'Urso
- Department of Molecular and Cellular Pharmacology, University of Miami, Miami, 33136, FL, USA
- Xiaodong Cai
- Department of Electrical and Computer Engineering, University of Miami, 1251 Memorial Drive, Coral Gables, 33146, FL, USA; Sylvester Comprehensive Cancer Center, University of Miami, Miami, 33136, FL, USA
|
50
|
Haque S, Ahmad JS, Clark NM, Williams CM, Sozzani R. Computational prediction of gene regulatory networks in plant growth and development. Current Opinion in Plant Biology 2019; 47:96-105. [PMID: 30445315 DOI: 10.1016/j.pbi.2018.10.005] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Received: 08/15/2018] [Revised: 10/05/2018] [Accepted: 10/18/2018] [Indexed: 05/22/2023]
Abstract
Plants integrate a wide range of cellular, developmental, and environmental signals to regulate complex patterns of gene expression. Recent advances in genomic technologies enable differential gene expression analysis at a systems level, allowing for improved inference of the network of regulatory interactions between genes. These gene regulatory networks, or GRNs, are used to visualize the causal regulatory relationships between regulators and their downstream target genes. Accordingly, these GRNs can represent spatial, temporal, and/or environmental regulation and can identify functional genes. This review summarizes recent computational approaches applied to different types of gene expression data to infer GRNs in the context of plant growth and development. Three stages of GRN inference are described: first, data collection and analysis based on the dataset type; second, network inference application based on data availability and proposed hypotheses; and third, validation based on in silico, in vivo, and in planta methods. In addition, this review relates data collection strategies to biological questions, organizes inference algorithms based on statistical methods and data types, discusses experimental design considerations, and provides guidelines for GRN inference with an emphasis on the benefits of integrative approaches, especially when a priori information is limited. Finally, this review concludes that computational frameworks integrating large-scale heterogeneous datasets are needed for a more accurate (e.g. fewer false interactions), detailed (e.g. discrimination between direct and indirect interactions), and comprehensive (e.g. genetic regulation under various conditions and spatial locations) inference of GRNs.
Affiliation(s)
- Samiul Haque
- Electrical and Computer Engineering, North Carolina State University, Raleigh, USA
- Jabeen S Ahmad
- Plant and Microbial Biology, North Carolina State University, Raleigh, USA
- Natalie M Clark
- Plant and Microbial Biology, North Carolina State University, Raleigh, USA
- Cranos M Williams
- Electrical and Computer Engineering, North Carolina State University, Raleigh, USA
- Rosangela Sozzani
- Plant and Microbial Biology, North Carolina State University, Raleigh, USA
|