1
Zhang Y, Zhao J, Sun X, Zheng Y, Chen T, Wang Z. Leveraging independent component analysis to unravel transcriptional regulatory networks: A critical review and future directions. Biotechnol Adv 2025; 78:108479. [PMID: 39577573 DOI: 10.1016/j.biotechadv.2024.108479] [Received: 08/23/2024] [Revised: 11/11/2024] [Accepted: 11/14/2024] [Indexed: 11/24/2024]
Abstract
Transcriptional regulatory networks (TRNs) play a crucial role in exploring microbial life activities and complex regulatory mechanisms. The comprehensive reconstruction of TRNs requires the integration of large-scale experimental data, which poses significant challenges due to the complexity of regulatory relationships. The application of machine learning tools, such as clustering analysis, has been employed to investigate TRNs, but these methods have limitations in capturing both global and local co-expression effects. In contrast, Independent Component Analysis (ICA) has emerged as a powerful analysis algorithm for modularizing independently regulated gene sets in TRNs, allowing it to account for both global and local co-expression effects. In this review, we comprehensively summarize the application of ICA in unraveling TRNs and highlight the research progress in three key aspects: (1) extending TRNs with iModulon analysis; (2) elucidating the regulatory mechanisms triggered by environmental perturbation; and (3) exploring the mechanisms of transcriptional regulation triggered by changes in microbial physiological state. At the end of this review, we also address the challenges facing ICA in TRN analysis and outline future research directions to promote the advancement of ICA-based transcriptomics analysis in biotechnology and related fields.
Affiliation(s)
- Yuhan Zhang
- Frontier Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin 300072, China; SynBio Research Platform, Collaborative Innovation Center of Chemical Science and Engineering (Tianjin), School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China
- Jianxiao Zhao
- Frontier Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin 300072, China; SynBio Research Platform, Collaborative Innovation Center of Chemical Science and Engineering (Tianjin), School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China
- Xi Sun
- Frontier Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin 300072, China; SynBio Research Platform, Collaborative Innovation Center of Chemical Science and Engineering (Tianjin), School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China; School of Life Science, Ningxia University, Yinchuan 750021, China
- Yangyang Zheng
- Frontier Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin 300072, China; SynBio Research Platform, Collaborative Innovation Center of Chemical Science and Engineering (Tianjin), School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China
- Tao Chen
- Frontier Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin 300072, China; SynBio Research Platform, Collaborative Innovation Center of Chemical Science and Engineering (Tianjin), School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China
- Zhiwen Wang
- Frontier Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin 300072, China; SynBio Research Platform, Collaborative Innovation Center of Chemical Science and Engineering (Tianjin), School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China; School of Life Science, Ningxia University, Yinchuan 750021, China.
2
Zhang L, Fang Y, Shi M, Ren K, Guan X, Younas W, Cheng Y, Zhang W, Wang Y, Xia XQ. Gonadal expression profiles reveal the underlying mechanisms of temperature effects on sex determination in the large-scale loach (Paramisgurnus dabryanus). Anim Reprod Sci 2025; 272:107661. [PMID: 39644765 DOI: 10.1016/j.anireprosci.2024.107661] [Received: 10/11/2024] [Revised: 11/24/2024] [Accepted: 11/30/2024] [Indexed: 12/09/2024]
Abstract
The sex determination mechanism in the large-scale loach (Paramisgurnus dabryanus) follows a ZZ/ZW system, with sexual differentiation regulated by both genotypic factors and temperature effects (GSD+TSD), where elevated temperatures result in a higher proportion of males. Currently, research on the sex determination mechanisms in the large-scale loach is limited, and the specific gene expression profiles and the role of temperature in influencing sex remain largely unknown. This study investigated the impact of temperature on the sex ratio in cultured populations of the large-scale loach and then identified a female-specific genetic marker by whole-genome sequencing, making it possible to distinguish females, males, and pseudo-males within this population. Transcriptomic analysis was subsequently performed on these groups, and the data revealed a similar expression pattern between pseudo-males and true males. The research combined differential expression analysis with WGCNA to construct a regulatory network of nine sex differentiation-related genes (SDGs) (map3k4, trpv4, hsd17b12a, wt1, ar, dmrt1, bcar1, sox9a, cyp17a1), indicating that sex differentiation in the large-scale loach is probably driven by the regulation of male-related genes. The transcriptomic analysis suggested that temperature significantly modified the expression of SDGs in the ovaries, whereas in the testes it predominantly affected metabolism-related pathways. We established a temperature-sensitive gene network in females, based on the correlation between gene expression and temperature, as well as the number of co-regulated genes in female data. We propose that, with increasing temperature, wt1 serves as a central regulator, leading to the down-regulation of foxl2a, cyp19a1a, and the cholesterol biosynthesis-related gene sqlea, ultimately resulting in the development of pseudo-males.
Affiliation(s)
- Lei Zhang
- Key Laboratory of Breeding Biotechnology and Sustainable Aquaculture (CAS), Hubei Hongshan Laboratory, Key Laboratory of Aquaculture Disease Control, Ministry of Agriculture and Rural Affairs, The Innovation Academy of Seed Design, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China; College of Advanced Agricultural Sciences, University of Chinese Academy of Sciences, Beijing, China
- Yutong Fang
- Key Laboratory of Breeding Biotechnology and Sustainable Aquaculture (CAS), Hubei Hongshan Laboratory, Key Laboratory of Aquaculture Disease Control, Ministry of Agriculture and Rural Affairs, The Innovation Academy of Seed Design, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China; College of Fisheries and Life Science, Dalian Ocean University, Dalian 116023, China
- Mijuan Shi
- Key Laboratory of Breeding Biotechnology and Sustainable Aquaculture (CAS), Hubei Hongshan Laboratory, Key Laboratory of Aquaculture Disease Control, Ministry of Agriculture and Rural Affairs, The Innovation Academy of Seed Design, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China; College of Advanced Agricultural Sciences, University of Chinese Academy of Sciences, Beijing, China.
- Keyi Ren
- Key Laboratory of Breeding Biotechnology and Sustainable Aquaculture (CAS), Hubei Hongshan Laboratory, Key Laboratory of Aquaculture Disease Control, Ministry of Agriculture and Rural Affairs, The Innovation Academy of Seed Design, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China; College of Fisheries and Life Science, Dalian Ocean University, Dalian 116023, China
- Xin Guan
- Key Laboratory of Breeding Biotechnology and Sustainable Aquaculture (CAS), Hubei Hongshan Laboratory, Key Laboratory of Aquaculture Disease Control, Ministry of Agriculture and Rural Affairs, The Innovation Academy of Seed Design, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China; College of Fisheries and Life Science, Dalian Ocean University, Dalian 116023, China
- Waqar Younas
- Key Laboratory of Breeding Biotechnology and Sustainable Aquaculture (CAS), Hubei Hongshan Laboratory, Key Laboratory of Aquaculture Disease Control, Ministry of Agriculture and Rural Affairs, The Innovation Academy of Seed Design, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China; College of Advanced Agricultural Sciences, University of Chinese Academy of Sciences, Beijing, China
- Yingyin Cheng
- Key Laboratory of Breeding Biotechnology and Sustainable Aquaculture (CAS), Hubei Hongshan Laboratory, Key Laboratory of Aquaculture Disease Control, Ministry of Agriculture and Rural Affairs, The Innovation Academy of Seed Design, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
- Wanting Zhang
- Key Laboratory of Breeding Biotechnology and Sustainable Aquaculture (CAS), Hubei Hongshan Laboratory, Key Laboratory of Aquaculture Disease Control, Ministry of Agriculture and Rural Affairs, The Innovation Academy of Seed Design, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
- Yaping Wang
- Key Laboratory of Breeding Biotechnology and Sustainable Aquaculture (CAS), Hubei Hongshan Laboratory, Key Laboratory of Aquaculture Disease Control, Ministry of Agriculture and Rural Affairs, The Innovation Academy of Seed Design, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China; College of Advanced Agricultural Sciences, University of Chinese Academy of Sciences, Beijing, China
- Xiao-Qin Xia
- Key Laboratory of Breeding Biotechnology and Sustainable Aquaculture (CAS), Hubei Hongshan Laboratory, Key Laboratory of Aquaculture Disease Control, Ministry of Agriculture and Rural Affairs, The Innovation Academy of Seed Design, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China; College of Advanced Agricultural Sciences, University of Chinese Academy of Sciences, Beijing, China.
3
Jin H, Kim W, Yuan M, Li X, Yang H, Li M, Shi M, Turkez H, Uhlen M, Zhang C, Mardinoglu A. Identification of SPP1+ macrophages as an immune suppressor in hepatocellular carcinoma using single-cell and bulk transcriptomics. Front Immunol 2024; 15:1446453. [PMID: 39691723 PMCID: PMC11649653 DOI: 10.3389/fimmu.2024.1446453] [Received: 06/09/2024] [Accepted: 11/19/2024] [Indexed: 12/19/2024]
Abstract
Introduction: Macrophages and T cells play crucial roles in liver physiology, but their functional diversity in hepatocellular carcinoma (HCC) remains largely unknown.
Methods: Two bulk RNA-sequencing (RNA-seq) cohorts for HCC were analyzed using gene co-expression network analysis. Key gene modules and networks were mapped to single-cell RNA-sequencing (scRNA-seq) data of HCC. Cell type fractions in the bulk RNA-seq data were estimated by a deconvolution approach using scRNA-seq data as a reference. Survival analysis was carried out to estimate the prognostic value of different immune cell types in the bulk RNA-seq cohorts. Cell-cell interaction analysis was performed to identify potential links between immune cell types in HCC.
Results: In this study, we analyzed RNA-seq data from two large-scale HCC cohorts, revealing a major and consensus gene co-expression cluster with significant implications for immunosuppression. Notably, these genes exhibited higher enrichment in liver macrophages than in T cells, as confirmed by scRNA-seq data from HCC patients. Integrative analysis of bulk and single-cell RNA-seq data pinpointed SPP1+ macrophages as an unfavorable cell type, while VCAN+ macrophages, C1QA+ macrophages, and CD8+ T cells were associated with a more favorable prognosis for HCC patients. Subsequent scRNA-seq investigations and in vitro experiments elucidated that SPP1, predominantly secreted by SPP1+ macrophages, inhibits CD8+ T cell proliferation. Finally, targeting SPP1 in tumor-associated macrophages through inhibition led to a shift towards a favorable phenotype.
Discussion: This study underpins the potential of SPP1 as a translational target in immunotherapy for HCC.
Affiliation(s)
- Han Jin
- Central Laboratory, Tianjin Medical University General Hospital, Tianjin, China
- Science for Life Laboratory, KTH – Royal Institute of Technology, Stockholm, Sweden
- Woonghee Kim
- Science for Life Laboratory, KTH – Royal Institute of Technology, Stockholm, Sweden
- Meng Yuan
- Science for Life Laboratory, KTH – Royal Institute of Technology, Stockholm, Sweden
- Xiangyu Li
- Science for Life Laboratory, KTH – Royal Institute of Technology, Stockholm, Sweden
- Hong Yang
- Science for Life Laboratory, KTH – Royal Institute of Technology, Stockholm, Sweden
- Mengzhen Li
- Science for Life Laboratory, KTH – Royal Institute of Technology, Stockholm, Sweden
- Mengnan Shi
- Science for Life Laboratory, KTH – Royal Institute of Technology, Stockholm, Sweden
- Hasan Turkez
- Department of Medical Biology, Faculty of Medicine, Atatürk University, Erzurum, Türkiye
- Mathias Uhlen
- Science for Life Laboratory, KTH – Royal Institute of Technology, Stockholm, Sweden
- Cheng Zhang
- Science for Life Laboratory, KTH – Royal Institute of Technology, Stockholm, Sweden
- Adil Mardinoglu
- Science for Life Laboratory, KTH – Royal Institute of Technology, Stockholm, Sweden
- Centre for Host-Microbiome Interactions, Faculty of Dentistry, Oral & Craniofacial Sciences, King’s College London, London, United Kingdom
4
Huang Y, Huang S, Zhang XF, Ou-Yang L, Liu C. NJGCG: A node-based joint Gaussian copula graphical model for gene networks inference across multiple states. Comput Struct Biotechnol J 2024; 23:3199-3210. [PMID: 39263209 PMCID: PMC11388165 DOI: 10.1016/j.csbj.2024.08.010] [Received: 04/14/2024] [Revised: 08/05/2024] [Accepted: 08/11/2024] [Indexed: 09/13/2024]
Abstract
Inferring the interactions between genes is essential for understanding the mechanisms underlying biological processes. Gene networks change as the environment and state change. The accumulation of gene expression data from multiple states makes it possible to estimate the gene networks in various states using computational methods. However, most existing gene network inference methods focus on estimating a gene network from a single state, ignoring the similarities between networks in different but related states. Moreover, in addition to individual edges, similarities and differences between networks may also be driven by hub genes, yet existing network inference methods rarely consider hub genes, which affects the accuracy of network estimation. In this paper, we propose a novel node-based joint Gaussian copula graphical (NJGCG) model to jointly infer multiple gene networks from gene expression data containing heterogeneous samples. Our model can handle various gene expression data with missing values. Furthermore, a tree-structured group lasso penalty is designed to identify the common and specific hub genes in different gene networks. Simulation studies show that our proposed method outperforms other compared methods in all cases. We also apply NJGCG to infer the gene networks for different stages of differentiation in mouse embryonic stem cells and different subtypes of breast cancer, and explore changes in gene networks across different stages of differentiation or different subtypes of breast cancer. The common and specific hub genes in the estimated gene networks are closely related to stem cell differentiation processes and heterogeneity within breast cancers.
Affiliation(s)
- Yun Huang
- Department of Geriatrics, The First Affiliated Hospital of Fujian Medical University, Fuzhou 350005, China
- Clinical Research Center for Geriatric Hypertension Disease of Fujian province, The First Affiliated Hospital of Fujian Medical University, Fuzhou 350005, China
- Sen Huang
- Guangdong Key Laboratory of Intelligent Information Processing, College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China
- Xiao-Fei Zhang
- School of Mathematics and Statistics & Hubei Key Laboratory of Mathematical Sciences, Central China Normal University, Wuhan, China
- Le Ou-Yang
- Guangdong Key Laboratory of Intelligent Information Processing, College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China
- Chen Liu
- Department of Oncology, Molecular Oncology Research Institute, The First Affiliated Hospital of Fujian Medical University, Fuzhou 350005, China
- Department of Oncology, National Regional Medical Center, Binhai Campus of The First Affiliated Hospital, Fujian Medical University, Fuzhou 350212, China
- Fujian Key Laboratory of Precision Medicine for Cancer, The First Affiliated Hospital of Fujian Medical University, Fuzhou 350005, China
5
Grützmann K, Kraft T, Meinhardt M, Meier F, Westphal D, Seifert M. Network-based analysis of heterogeneous patient-matched brain and extracranial melanoma metastasis pairs reveals three homogeneous subgroups. Comput Struct Biotechnol J 2024; 23:1036-1050. [PMID: 38464935 PMCID: PMC10920107 DOI: 10.1016/j.csbj.2024.02.013] [Received: 11/06/2023] [Revised: 02/15/2024] [Accepted: 02/15/2024] [Indexed: 03/12/2024]
Abstract
Melanoma, the deadliest form of skin cancer, can metastasize to different organs. Molecular differences between brain and extracranial melanoma metastases are poorly understood. Here, promoter methylation and gene expression of 11 heterogeneous patient-matched pairs of brain and extracranial metastases were analyzed using melanoma-specific gene regulatory networks learned from public transcriptome and methylome data, followed by network-based impact propagation of patient-specific alterations. This data analysis strategy allowed the prediction of potential impacts of patient-specific driver candidate genes on other genes and pathways. The patient-matched metastasis pairs clustered into three robust subgroups with specific downstream targets with known roles in cancer, including melanoma (SG1: RBM38, BCL11B; SG2: GATA3, FES; SG3: SLAMF6, PYCARD). Patient subgroups and the ranking of target gene candidates were confirmed in a validation cohort. In summary, computational network-based impact analyses of heterogeneous metastasis pairs predicted individual regulatory differences in melanoma brain metastases, culminating in three consistent subgroups with specific downstream target genes.
Affiliation(s)
- Konrad Grützmann
- Institute for Medical Informatics and Biometry, Faculty of Medicine, TU Dresden, 01307 Dresden, Germany
- Theresa Kraft
- Institute for Medical Informatics and Biometry, Faculty of Medicine, TU Dresden, 01307 Dresden, Germany
- Matthias Meinhardt
- Department of Pathology, University Hospital Carl Gustav Carus Dresden, TU Dresden, 01307 Dresden, Germany
- Friedegund Meier
- Department of Dermatology, University Hospital Carl Gustav Carus Dresden, TU Dresden, 01307 Dresden, Germany
- National Center for Tumor Diseases (NCT), D-01307 Dresden, Germany
- Dana Westphal
- Department of Dermatology, University Hospital Carl Gustav Carus Dresden, TU Dresden, 01307 Dresden, Germany
- National Center for Tumor Diseases (NCT), D-01307 Dresden, Germany
- Michael Seifert
- Institute for Medical Informatics and Biometry, Faculty of Medicine, TU Dresden, 01307 Dresden, Germany
- National Center for Tumor Diseases (NCT), D-01307 Dresden, Germany
6
Peng D, Cahan P. OneSC: a computational platform for recapitulating cell state transitions. Bioinformatics 2024; 40:btae703. [PMID: 39570626 PMCID: PMC11630913 DOI: 10.1093/bioinformatics/btae703] [Received: 05/31/2024] [Revised: 11/13/2024] [Accepted: 11/19/2024] [Indexed: 11/22/2024]
Abstract
Motivation: Computational modeling of cell state transitions has been of great interest to many in the fields of developmental biology, cancer biology, and cell fate engineering because it enables perturbation experiments to be performed in silico more rapidly and cheaply than could be achieved in a lab. Recent advancements in single-cell RNA-sequencing (scRNA-seq) allow the capture of high-resolution snapshots of cell states as they transition along temporal trajectories. Using these high-throughput datasets, we can train computational models to generate in silico "synthetic" cells that faithfully mimic the temporal trajectories.
Results: Here we present OneSC, a platform that can simulate cell state transitions using systems of stochastic differential equations governed by a regulatory network of core transcription factors (TFs). Unlike many current network inference methods, OneSC prioritizes generating a Boolean network that produces faithful cell state transitions and terminal cell states that mimic real biological systems. Applying OneSC to real data, we inferred a core TF network using a mouse myeloid progenitor scRNA-seq dataset and showed that the dynamical simulations of that network generate synthetic single-cell expression profiles that faithfully recapitulate the four myeloid differentiation trajectories going into differentiated cell states (erythrocytes, megakaryocytes, granulocytes, and monocytes). Finally, through in silico perturbations of the mouse myeloid progenitor core network, we showed that OneSC can accurately predict cell fate decision biases of TF perturbations that closely match previous experimental observations.
Availability and implementation: OneSC is implemented as a Python package on GitHub (https://github.com/CahanLab/oneSC) and on Zenodo (https://zenodo.org/records/14052421).
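To illustrate what simulating state transitions and in silico perturbations on a Boolean TF network can look like, here is a minimal sketch. It is not OneSC's API, and it leaves out the stochastic differential equations mentioned in the abstract. The two-gene toggle switch, the gene names `GeneA` and `GeneB`, and the functions `simulate` and `steady_states` are all hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical two-gene toggle switch: each TF is ON only when the other is OFF.
RULES = {
    "GeneA": lambda state: not state["GeneB"],
    "GeneB": lambda state: not state["GeneA"],
}


def simulate(initial, order, clamp=None, max_steps=20):
    """Update genes one at a time in a fixed order until the state stops changing.

    `clamp` pins genes to fixed values, mimicking a knockout (False)
    or forced overexpression (True).
    """
    clamp = clamp or {}
    state = {**initial, **clamp}
    for _ in range(max_steps):
        previous = dict(state)
        for gene in order:
            if gene not in clamp:
                state[gene] = RULES[gene](state)
        if state == previous:
            break
    return state


def steady_states():
    """Enumerate the states in which every gene already satisfies its rule."""
    genes = sorted(RULES)
    found = []
    for values in product([False, True], repeat=len(genes)):
        state = dict(zip(genes, values))
        if all(RULES[gene](state) == state[gene] for gene in genes):
            found.append(state)
    return found


# Unperturbed run from a state where both genes are ON.
wild_type = simulate({"GeneA": True, "GeneB": True}, ["GeneA", "GeneB"])
# In silico knockout of GeneB, starting from a GeneB-high state.
knockout = simulate({"GeneA": False, "GeneB": True}, ["GeneA", "GeneB"],
                    clamp={"GeneB": False})
```

In this toy network the two steady states play the role of terminal cell states, and clamping `GeneB` off biases the outcome toward the `GeneA`-high state. This mirrors, in a much simplified form, the kind of fate-bias prediction described in the abstract.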
Affiliation(s)
- Da Peng
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21205, United States
- Patrick Cahan
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21205, United States
- Institute for Cell Engineering, Johns Hopkins University, Baltimore, MD 21205, United States
- Department of Molecular Biology and Genetics, Johns Hopkins University, Baltimore, MD 21205, United States
7
Tucci A, Flores-Vergara MA, Franks RG. Machine Learning Inference of Gene Regulatory Networks in Developing Mimulus Seeds. Plants (Basel) 2024; 13:3297. [PMID: 39683091 DOI: 10.3390/plants13233297] [Received: 09/26/2024] [Revised: 11/07/2024] [Accepted: 11/19/2024] [Indexed: 12/18/2024]
Abstract
The angiosperm seed represents a critical evolutionary breakthrough that has been shown to propel the reproductive success and radiation of flowering plants. Seeds promote the rapid diversification of angiosperms by establishing postzygotic reproductive barriers, such as hybrid seed inviability. While prezygotic barriers to reproduction tend to be transient, postzygotic barriers are often permanent and therefore can play a pivotal role in facilitating speciation. This property of the angiosperm seed is exemplified in the Mimulus genus. In order to further the understanding of the gene regulatory mechanisms important in the Mimulus seed, we performed gene regulatory network (GRN) inference analysis by using time-series RNA-seq data from developing hybrid seeds from a viable cross between Mimulus guttatus and Mimulus pardalis. GRN inference has the capacity to identify active regulatory mechanisms in a sample and highlight genes of potential biological importance. In our case, GRN inference also provided the opportunity to uncover active regulatory relationships and generate a reference set of putative gene regulations. We deployed two GRN inference algorithms (RTP-STAR and KBoost) on three different subsets of our transcriptomic dataset. While the two algorithms yielded GRNs with different regulations and topologies when working with the same data subset, there was still significant overlap in the specific gene regulations they inferred, and they both identified potential novel regulatory mechanisms that warrant further investigation.
Affiliation(s)
- Albert Tucci
- Department of Plant and Microbial Biology, North Carolina State University, Raleigh, NC 27695, USA
- Miguel A Flores-Vergara
- Department of Plant and Microbial Biology, North Carolina State University, Raleigh, NC 27695, USA
- Robert G Franks
- Department of Plant and Microbial Biology, North Carolina State University, Raleigh, NC 27695, USA
8
Karamveer, Uzun Y. Approaches for Benchmarking Single-Cell Gene Regulatory Network Methods. Bioinform Biol Insights 2024; 18:11779322241287120. [PMID: 39502448 PMCID: PMC11536393 DOI: 10.1177/11779322241287120] [Received: 05/17/2024] [Accepted: 09/10/2024] [Indexed: 11/08/2024]
Abstract
Gene regulatory networks are powerful tools for modeling genetic interactions that control the expression of genes driving cell differentiation, and single-cell sequencing offers a unique opportunity to build these networks with high-resolution genomic data. Many computational methods have been proposed to build these networks using single-cell data, and different approaches are used to benchmark these methods. However, a comprehensive discussion focusing specifically on benchmarking approaches is missing. In this article, we lay out the GRN terminology, present an overview of common gold-standard studies and data sets, and define the performance metrics for benchmarking network construction methodologies. We also point out the advantages and limitations of different benchmarking approaches, suggest alternative ground truth data sets that can be used for benchmarking, and specify additional considerations in this context.
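The abstract refers to performance metrics without naming them. As a rough illustration of how a ranked list of predicted regulatory edges can be scored against a gold standard, the sketch below computes precision and recall at a cutoff and average precision. These metrics are an assumption about what such a benchmark might use, not necessarily the ones the article defines. The edge names and the functions `precision_recall_at_k` and `average_precision` are hypothetical.

```python
def precision_recall_at_k(ranked_edges, gold_edges, k):
    """Precision and recall of the top-k predicted edges against a gold standard."""
    top = ranked_edges[:k]
    hits = sum(1 for edge in top if edge in gold_edges)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(gold_edges) if gold_edges else 0.0
    return precision, recall


def average_precision(ranked_edges, gold_edges):
    """Mean of the precision values taken at the rank of each recovered gold edge."""
    hits = 0
    total = 0.0
    for rank, edge in enumerate(ranked_edges, start=1):
        if edge in gold_edges:
            hits += 1
            total += hits / rank
    return total / len(gold_edges) if gold_edges else 0.0


# Hypothetical gold-standard edges (regulator, target) and a method's ranked predictions.
gold = {("TF1", "G1"), ("TF1", "G2"), ("TF2", "G3")}
ranked = [("TF1", "G1"), ("TF2", "G1"), ("TF1", "G2"), ("TF2", "G3"), ("TF2", "G2")]

precision_at_2, recall_at_2 = precision_recall_at_k(ranked, gold, k=2)
ap = average_precision(ranked, gold)
```

Scores like these depend heavily on how complete the gold standard is: a true edge missing from the reference set is counted as a false positive, which is one of the limitations of benchmarking that the abstract alludes to.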
Affiliation(s)
- Karamveer
- Department of Pediatrics, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Yasin Uzun
- Department of Pediatrics, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Penn State Cancer Institute, The Pennsylvania State University College of Medicine, Hershey, PA, USA
9
Wang Y, Zheng P, Cheng YC, Wang Z, Aravkin A. WENDY: Covariance dynamics based gene regulatory network inference. Math Biosci 2024; 377:109284. [PMID: 39168402 DOI: 10.1016/j.mbs.2024.109284] [Received: 03/25/2024] [Revised: 06/25/2024] [Accepted: 08/16/2024] [Indexed: 08/23/2024]
Abstract
Determining gene regulatory network (GRN) structure is a central problem in biology, with a variety of inference methods available for different types of data. For a widely prevalent and challenging use case, namely single-cell gene expression data measured after intervention at multiple time points with unknown joint distributions, there is only one known specifically developed method, which does not fully utilize the rich information contained in this data type. We develop an inference method for the GRN in this case, netWork infErence by covariaNce DYnamics, dubbed WENDY. The core idea of WENDY is to model the dynamics of the covariance matrix and to solve these dynamics as an optimization problem to determine the regulatory relationships. To evaluate its effectiveness, we compare WENDY with other inference methods using synthetic data and experimental data. Our results demonstrate that WENDY performs well across different data sets.
Affiliation(s)
- Yue Wang
- Irving Institute for Cancer Dynamics and Department of Statistics, Columbia University, New York, 10027, NY, USA.
- Peng Zheng
- Institute for Health Metrics and Evaluation, Seattle, 98195, WA, USA; Department of Health Metrics Sciences, University of Washington, Seattle, 98195, WA, USA
- Yu-Chen Cheng
- Department of Data Science, Dana-Farber Cancer Institute, Boston, 02215, MA, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, 02115, MA, USA; Center for Cancer Evolution, Dana-Farber Cancer Institute, Boston, 02215, MA, USA; Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, 02138, MA, USA
- Zikun Wang
- Laboratory of Genetics, The Rockefeller University, New York, 10065, NY, USA
- Aleksandr Aravkin
- Department of Applied Mathematics, University of Washington, Seattle, 98195, WA, USA
10
Morin A, Chu C, Pavlidis P. Identifying Reproducible Transcription Regulator Coexpression Patterns with Single Cell Transcriptomics. bioRxiv 2024:2024.02.15.580581. [PMID: 38559016 PMCID: PMC10979919 DOI: 10.1101/2024.02.15.580581] [Indexed: 04/04/2024]
Abstract
The proliferation of single cell transcriptomics has potentiated our ability to unveil patterns that reflect dynamic cellular processes, rather than cell type compositional effects that emerge from bulk tissue samples. In this study, we leverage a broad collection of single cell RNA-seq data to identify the gene partners whose expression is most coordinated with each human and mouse transcription regulator (TR). We assembled 120 human and 103 mouse scRNA-seq datasets from the literature (>28 million cells), constructing a single cell coexpression network for each. We aimed to understand the consistency of TR coexpression profiles across a broad sampling of biological contexts, rather than examine the preservation of context-specific signals. Our workflow therefore explicitly prioritizes the patterns that are most reproducible across cell types. Towards this goal, we characterize the similarity of each TR's coexpression within and across species. We create single cell coexpression rankings for each TR, demonstrating that this aggregated information recovers literature curated targets on par with ChIP-seq data. We then combine the coexpression and ChIP-seq information to identify candidate regulatory interactions supported across methods and species. Finally, we highlight interactions for the important neural TR ASCL1 to demonstrate how our compiled information can be adopted for community use.
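One simple way to picture the aggregation of per-dataset coexpression into a single ranking for a transcription regulator is sketched below. It assumes Pearson correlation within each dataset followed by averaging of ranks across datasets. The abstract does not spell out the authors' actual procedure, so this is only an assumption for illustration. The toy expression values, the gene names, and the functions `rank_partners` and `aggregate_ranks` are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_partners(dataset, regulator):
    """Rank genes by correlation with the regulator; rank 1 is the most coexpressed."""
    scores = {gene: pearson(dataset[regulator], values)
              for gene, values in dataset.items() if gene != regulator}
    ordered = sorted(scores, key=lambda gene: (-scores[gene], gene))
    return {gene: rank for rank, gene in enumerate(ordered, start=1)}


def aggregate_ranks(datasets, regulator):
    """Order genes by their mean rank across all datasets that measured them."""
    per_dataset = [rank_partners(dataset, regulator) for dataset in datasets]
    shared = set.intersection(*(set(ranks) for ranks in per_dataset))
    mean_rank = {gene: sum(ranks[gene] for ranks in per_dataset) / len(per_dataset)
                 for gene in shared}
    return sorted(mean_rank, key=lambda gene: (mean_rank[gene], gene))


# Two hypothetical datasets: gene -> expression across four cells.
dataset_1 = {"TR": [1, 2, 3, 4], "GeneX": [2, 4, 6, 8],
             "GeneY": [4, 3, 2, 1], "GeneZ": [1, 3, 2, 4]}
dataset_2 = {"TR": [5, 1, 3, 2], "GeneX": [10, 2, 6, 4],
             "GeneY": [1, 2, 3, 4], "GeneZ": [4, 1, 3, 1]}

consensus = aggregate_ranks([dataset_1, dataset_2], "TR")
```

Averaging ranks rather than raw correlations keeps any single dataset with unusually strong correlations from dominating the consensus, which fits the abstract's emphasis on patterns that are reproducible across contexts.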
Affiliation(s)
- Alexander Morin
- Michael Smith Laboratories, University of British Columbia, Vancouver, BC, Canada
- Department of Psychiatry, University of British Columbia, Vancouver, BC, Canada
- Graduate Program in Bioinformatics, University of British Columbia, Vancouver, BC, Canada
- Chingpan Chu
- Michael Smith Laboratories, University of British Columbia, Vancouver, BC, Canada
- Department of Psychiatry, University of British Columbia, Vancouver, BC, Canada
- Graduate Program in Bioinformatics, University of British Columbia, Vancouver, BC, Canada
- Paul Pavlidis
- Michael Smith Laboratories, University of British Columbia, Vancouver, BC, Canada
- Department of Psychiatry, University of British Columbia, Vancouver, BC, Canada

11
Schrod S, Lück N, Lohmayer R, Solbrig S, Völkl D, Wipfler T, Shutta KH, Ben Guebila M, Schäfer A, Beißbarth T, Zacharias HU, Oefner PJ, Quackenbush J, Altenbuchinger M. Spatial Cellular Networks from omics data with SpaCeNet. Genome Res 2024; 34:1371-1383. [PMID: 39231609 PMCID: PMC11529864 DOI: 10.1101/gr.279125.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2024] [Accepted: 08/27/2024] [Indexed: 09/06/2024]
Abstract
Advances in omics technologies have allowed spatially resolved molecular profiling of single cells, providing a window not only into the diversity and distribution of cell types within a tissue, but also into the effects of interactions between cells in shaping the transcriptional landscape. Cells send chemical and mechanical signals which are received by other cells, where they can subsequently initiate context-specific gene regulatory responses. These interactions and their responses shape the individual molecular phenotype of a cell in a given microenvironment. RNAs or proteins measured in individual cells, together with the cells' spatial distribution, provide invaluable information about these mechanisms and the regulation of genes beyond processes occurring independently in each individual cell. "SpaCeNet" is a method designed to elucidate both the intracellular molecular networks (how molecular variables affect each other within the cell) and the intercellular molecular networks (how cells affect molecular variables in their neighbors). This is achieved by estimating conditional independence (CI) relations between captured variables within individual cells and by disentangling these from CI relations between variables of different cells.
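To make the notion of a conditional independence relation concrete, the following sketch uses a first-order partial correlation: two variables that are both driven by a third can be marginally correlated yet uncorrelated once the third is conditioned on. This is only an illustration of the concept under a linear, Gaussian-style assumption, not the SpaCeNet estimator; the function names and toy vectors are hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / ((1 - r_xz ** 2) * (1 - r_yz ** 2)) ** 0.5


# Toy data (invented): z drives both x and y; e1 and e2 are orthogonal noise terms.
z = [-1, -1, 1, 1, -1, -1, 1, 1]
e1 = [-1, 1, -1, 1, -1, 1, -1, 1]
e2 = [-1, -1, -1, -1, 1, 1, 1, 1]
x = [a + b for a, b in zip(z, e1)]
y = [a + b for a, b in zip(z, e2)]

marginal = pearson(x, y)                    # clearly non-zero: x and y share z
conditional = partial_correlation(x, y, z)  # essentially zero once z is accounted for
```

In a network reading, x and y would not be joined by an edge here, because their association is fully explained by z.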
Affiliation(s)
- Stefan Schrod
- Department of Medical Bioinformatics, University Medical Center Göttingen, 37077 Göttingen, Germany
- Niklas Lück
- Department of Medical Bioinformatics, University Medical Center Göttingen, 37077 Göttingen, Germany
- Robert Lohmayer
- Leibniz Institute for Immunotherapy, 93053 Regensburg, Germany
- Stefan Solbrig
- Institute of Theoretical Physics, University of Regensburg, 93053 Regensburg, Germany
- Dennis Völkl
- Institute of Theoretical Physics, University of Regensburg, 93053 Regensburg, Germany
- Tina Wipfler
- Institute of Theoretical Physics, University of Regensburg, 93053 Regensburg, Germany
- Katherine H Shutta
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts 02115, USA
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts 02115, USA
- Marouen Ben Guebila
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts 02115, USA
- Andreas Schäfer
- Institute of Theoretical Physics, University of Regensburg, 93053 Regensburg, Germany
- Tim Beißbarth
- Department of Medical Bioinformatics, University Medical Center Göttingen, 37077 Göttingen, Germany
- Campus Institute Data Science (CIDAS), University of Göttingen, 37077 Göttingen, Germany
- Helena U Zacharias
- Peter L. Reichertz Institute for Medical Informatics of TU Braunschweig and Hannover Medical School, Hannover Medical School, 30625 Hannover, Germany
- Peter J Oefner
- Institute of Functional Genomics, University of Regensburg, 93053 Regensburg, Germany
- John Quackenbush
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts 02115, USA
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts 02115, USA
- Michael Altenbuchinger
- Department of Medical Bioinformatics, University Medical Center Göttingen, 37077 Göttingen, Germany

12
Kernfeld E, Yang Y, Weinstock J, Battle A, Cahan P. A systematic comparison of computational methods for expression forecasting. bioRxiv 2024:2023.07.28.551039. [PMID: 37577640 PMCID: PMC10418073 DOI: 10.1101/2023.07.28.551039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/15/2023]
Abstract
Expression forecasting methods use machine learning models to predict how a cell will alter its transcriptome upon perturbation. Such methods are enticing because they promise to answer pressing questions in fields ranging from developmental genetics to cell fate engineering and because they are a fast, cheap, and accessible complement to the corresponding experiments. However, the absolute and relative accuracy of these methods is poorly characterized, limiting their informed use, their improvement, and the interpretation of their predictions. To address these issues, we created a benchmarking platform that combines a panel of 11 large-scale perturbation datasets with an expression forecasting software engine that encompasses or interfaces to a wide variety of methods. We used our platform to systematically assess methods, parameters, and sources of auxiliary data, finding that performance strongly depends on the choice of metric, and especially for simple metrics like mean squared error, it is uncommon for expression forecasting methods to out-perform simple baselines. Our platform will serve as a resource to improve methods and to identify contexts in which expression forecasting can succeed.
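The comparison against a simple baseline under mean squared error can be sketched as follows. This is a hedged illustration, not the benchmarking platform itself: it assumes that the "simple baseline" is the per-gene mean of the training profiles, and the function names and numbers are hypothetical.

```python
def mse(predicted, observed):
    """Mean squared error between two equal-length expression profiles."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


def mean_baseline(train_profiles):
    """Per-gene mean across training profiles, used as a perturbation-agnostic guess."""
    n = len(train_profiles)
    return [sum(gene_values) / n for gene_values in zip(*train_profiles)]


def beats_baseline(model_prediction, observed, train_profiles):
    """True if the model's MSE is strictly lower than the mean baseline's MSE."""
    baseline = mean_baseline(train_profiles)
    return mse(model_prediction, observed) < mse(baseline, observed)


# Toy data (invented): two training profiles over two genes, one held-out profile.
train = [[1.0, 2.0], [3.0, 4.0]]
held_out = [2.0, 3.5]
model_prediction = [2.5, 3.0]   # output of some hypothetical forecasting model

baseline_error = mse(mean_baseline(train), held_out)
model_error = mse(model_prediction, held_out)
model_wins = beats_baseline(model_prediction, held_out, train)
```

In this toy case the baseline has the lower error, which mirrors the kind of outcome the abstract describes as common for simple metrics.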

13
Zhang J, Liu L, Wei X, Zhao C, Luo Y, Li J, Le TD. Scanning sample-specific miRNA regulation from bulk and single-cell RNA-sequencing data. BMC Biol 2024; 22:218. [PMID: 39334271 PMCID: PMC11438147 DOI: 10.1186/s12915-024-02020-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2024] [Accepted: 09/24/2024] [Indexed: 09/30/2024]
Abstract
BACKGROUND RNA-sequencing technology provides an effective tool for understanding miRNA regulation in complex human diseases, including cancers. A large number of computational methods have been developed to make use of bulk and single-cell RNA-sequencing data to identify miRNA regulation at the resolution of multiple samples (i.e. groups of cells or tissues). However, due to the heterogeneity of individual samples, there is a strong need to infer miRNA regulation specific to individual samples to uncover miRNA regulation at the single-sample resolution level. RESULTS Here, we develop a framework, Scan, for scanning sample-specific miRNA regulation. Since a single network inference method or strategy cannot perform well for all types of new data, Scan incorporates 27 network inference methods and two strategies to infer tissue-specific or cell-specific miRNA regulation from bulk or single-cell RNA-sequencing data. Results on bulk and single-cell RNA-sequencing data demonstrate the effectiveness of Scan in inferring sample-specific miRNA regulation. Moreover, we have found that incorporating prior information on miRNA targets can generally improve the accuracy of miRNA target prediction. In addition, Scan can help construct cell/tissue correlation networks and recover aggregate miRNA regulatory networks. Finally, the comparison results have shown that the performance of network inference methods is likely to be data-specific, and selecting optimal network inference methods is required for more accurate prediction of miRNA targets. CONCLUSIONS Scan provides a useful method to help infer sample-specific miRNA regulation for new data, benchmark new network inference methods and deepen the understanding of miRNA regulation at the resolution of individual samples.
Affiliation(s)
- Junpeng Zhang
- School of Engineering, Dali University, Dali, 671003, Yunnan, China
- Lin Liu
- UniSA STEM, University of South Australia, Mawson Lakes, SA, 5095, Australia
- Xuemei Wei
- School of Engineering, Dali University, Dali, 671003, Yunnan, China
- Chunwen Zhao
- School of Engineering, Dali University, Dali, 671003, Yunnan, China
- Yanbi Luo
- School of Engineering, Dali University, Dali, 671003, Yunnan, China
- Jiuyong Li
- UniSA STEM, University of South Australia, Mawson Lakes, SA, 5095, Australia
- Thuc Duy Le
- UniSA STEM, University of South Australia, Mawson Lakes, SA, 5095, Australia

14
Bustad E, Petry E, Gu O, Griebel BT, Rustad TR, Sherman DR, Yang JH, Ma S. Predicting bacterial fitness in Mycobacterium tuberculosis with transcriptional regulatory network-informed interpretable machine learning. bioRxiv 2024:2024.09.23.614645. [PMID: 39386570 PMCID: PMC11463588 DOI: 10.1101/2024.09.23.614645] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/12/2024]
Abstract
Mycobacterium tuberculosis (Mtb) is the causative agent of tuberculosis disease, the greatest source of global mortality by a bacterial pathogen. Mtb adapts and responds to diverse stresses such as antibiotics by inducing transcriptional stress-response regulatory programs. Understanding how and when these mycobacterial regulatory programs are activated could enable novel treatment strategies for potentiating the efficacy of new and existing drugs. Here we sought to define and analyze Mtb regulatory programs that modulate bacterial fitness. We assembled a large Mtb RNA expression compendium and applied it to infer a comprehensive Mtb transcriptional regulatory network and compute condition-specific transcription factor activity profiles. We utilized transcriptomic and functional genomics data to train an interpretable machine learning model that can predict Mtb fitness from transcription factor activity profiles. We demonstrated that this transcription factor activity-based model can successfully predict Mtb growth arrest and growth resumption under hypoxia and reaeration using only RNA-seq expression data as a starting point. These integrative network modeling and machine learning analyses thus enable the prediction of mycobacterial fitness under different environmental and genetic contexts. We envision that these models can inform the future design of prognostic assays and therapeutic interventions that cripple Mtb growth and survival to cure tuberculosis disease.
Affiliation(s)
- Ethan Bustad
- Center for Global Infectious Disease Research, Seattle Children’s Research Institute, Seattle WA, USA
- Edson Petry
- Center for Emerging and Re-emerging Pathogens, Rutgers New Jersey Medical School, Newark NJ, USA
- Oliver Gu
- Center for Emerging and Re-emerging Pathogens, Rutgers New Jersey Medical School, Newark NJ, USA
- Braden T. Griebel
- Center for Global Infectious Disease Research, Seattle Children’s Research Institute, Seattle WA, USA
- Department of Chemical Engineering, University of Washington, Seattle WA, USA
- David R. Sherman
- Department of Microbiology, University of Washington, Seattle WA, USA
- Jason H. Yang
- Center for Emerging and Re-emerging Pathogens, Rutgers New Jersey Medical School, Newark NJ, USA
- Department of Microbiology, Biochemistry, & Molecular Genetics, Rutgers New Jersey Medical School, Newark NJ, USA
- Shuyi Ma
- Center for Global Infectious Disease Research, Seattle Children’s Research Institute, Seattle WA, USA
- Department of Chemical Engineering, University of Washington, Seattle WA, USA
- Department of Pediatrics, University of Washington, Seattle WA, USA
- Pathobiology Graduate Program, Department of Global Health, University of Washington, Seattle WA, USA

15
K Lodi M, Chernikov A, Ghosh P. COFFEE: consensus single cell-type specific inference for gene regulatory networks. Brief Bioinform 2024; 25:bbae457. [PMID: 39311699 PMCID: PMC11418232 DOI: 10.1093/bib/bbae457] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2024] [Revised: 07/22/2024] [Accepted: 09/02/2024] [Indexed: 09/26/2024]
Abstract
The inference of gene regulatory networks (GRNs) is crucial to understanding the regulatory mechanisms that govern biological processes. GRNs may be represented as edges in a graph, and hence, they have been inferred computationally for scRNA-seq data. A wisdom of crowds approach to integrate edges from several GRNs to create one composite GRN has demonstrated improved performance when compared with individual algorithm implementations on bulk RNA-seq and microarray data. In an effort to extend this approach to scRNA-seq data, we present COFFEE (COnsensus single cell-type speciFic inFerence for gEnE regulatory networks), a Borda voting-based consensus algorithm that integrates information from 10 established GRN inference methods. We conclude that COFFEE has improved performance across synthetic, curated, and experimental datasets when compared with baseline methods. Additionally, we show that a modified version of COFFEE can be leveraged to improve performance on newer cell-type specific GRN inference methods. Overall, our results demonstrate that consensus-based methods with pertinent modifications continue to be valuable for GRN inference at the single cell level. While COFFEE is benchmarked on 10 algorithms, it is a flexible strategy that can incorporate any set of GRN inference algorithms according to user preference. A Python implementation of COFFEE may be found on GitHub: https://github.com/lodimk2/coffee.
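A plain Borda count over ranked edge lists gives a feel for how voting-based consensus works. The sketch below is an assumption-laden illustration and not the COFFEE implementation: the scoring rule (top edge earns as many points as there are distinct edges, unreported edges earn zero), the function name, and the toy rankings are all hypothetical.

```python
from collections import defaultdict


def borda_consensus(rankings):
    """Merge ranked edge lists (best edge first) into one consensus ranking.

    Each method awards n points to its top edge, n - 1 to the next, and so on,
    where n is the number of distinct edges seen across all methods.
    Edges a method does not report receive no points from that method.
    """
    all_edges = {edge for ranking in rankings for edge in ranking}
    n = len(all_edges)
    scores = defaultdict(int)
    for ranking in rankings:
        for position, edge in enumerate(ranking):
            scores[edge] += n - position
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(all_edges, key=lambda e: (-scores[e], e))


# Toy data (invented): edges are (regulator, target) pairs from three methods.
method_1 = [("A", "B"), ("A", "C"), ("B", "C")]
method_2 = [("A", "C"), ("A", "B")]
method_3 = [("A", "B"), ("B", "C"), ("A", "C")]
consensus = borda_consensus([method_1, method_2, method_3])
```

Because the vote uses only rank positions, methods whose raw edge scores live on different scales can be combined without normalization.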
Affiliation(s)
- Musaddiq K Lodi
- Integrative Life Sciences, Virginia Commonwealth University, 1000 W Cary St, Richmond, VA 23284, United States
- Anna Chernikov
- Center for Biological Data Science, Virginia Commonwealth University, 1015 Floyd Ave, Richmond, VA 23284, United States
- Preetam Ghosh
- Department of Computer Science, Virginia Commonwealth University, 401 W Main St, Richmond, VA 23284, United States

16
Ji R, Geng Y, Quan X. Inferring gene regulatory networks with graph convolutional network based on causal feature reconstruction. Sci Rep 2024; 14:21342. [PMID: 39266676 PMCID: PMC11393083 DOI: 10.1038/s41598-024-71864-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2024] [Accepted: 09/02/2024] [Indexed: 09/14/2024]
Abstract
Inferring gene regulatory networks through deep learning and causal inference methods is a crucial task in the field of computational biology and bioinformatics. This study presents a novel approach that uses a Graph Convolutional Network (GCN) guided by causal information to infer Gene Regulatory Networks (GRNs). Transfer entropy and a reconstruction layer are utilized to achieve causal feature reconstruction, mitigating the information loss problem caused by multiple rounds of neighbor aggregation in GCN, resulting in a causal and integrated representation of node features. Separable features are extracted from gene expression data by the Gaussian-kernel Autoencoder to improve computational efficiency. Experimental results on the DREAM5 and mDC datasets demonstrate that our method exhibits superior performance compared to existing algorithms, as indicated by higher AUPRC values. Furthermore, the incorporation of causal feature reconstruction enhances the inferred GRNs, rendering them more reasonable, accurate, and reliable.
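Transfer entropy, the causal signal named above, measures how much the past of one series reduces uncertainty about the next value of another beyond what that series' own past already explains. The sketch below is a minimal plug-in estimator for discrete series with a history length of one; the discretization into 0/1 states, the function name, and the toy series are assumptions, and the paper's actual estimator may differ.

```python
from collections import Counter
from math import log2


def transfer_entropy(source, target):
    """Plug-in estimate (in bits) of transfer entropy source -> target, history 1.

    TE = sum p(y_next, y_now, x_now) * log2( p(y_next | y_now, x_now) / p(y_next | y_now) )
    """
    triples = Counter()    # (y_next, y_now, x_now)
    pairs_yx = Counter()   # (y_now, x_now)
    pairs_yy = Counter()   # (y_next, y_now)
    singles = Counter()    # y_now
    n = len(target) - 1
    for t in range(n):
        y_next, y_now, x_now = target[t + 1], target[t], source[t]
        triples[(y_next, y_now, x_now)] += 1
        pairs_yx[(y_now, x_now)] += 1
        pairs_yy[(y_next, y_now)] += 1
        singles[y_now] += 1
    te = 0.0
    for (y_next, y_now, x_now), count in triples.items():
        p_joint = count / n
        p_given_both = count / pairs_yx[(y_now, x_now)]
        p_given_self = pairs_yy[(y_next, y_now)] / singles[y_now]
        te += p_joint * log2(p_given_both / p_given_self)
    return te


# Toy data (invented): y copies x with a one-step delay, so x should inform y.
x = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0]
y = [0] + x[:-1]
flat = [0] * len(x)

te_x_to_y = transfer_entropy(x, y)        # large: x fully determines the next y
te_x_to_flat = transfer_entropy(x, flat)  # zero: a constant target has nothing to explain
```

Unlike correlation, the measure is directional, which is what lets it serve as a causal hint when weighting regulator-to-target edges.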
Affiliation(s)
- Ruirui Ji
- School of Automation and Information Engineering, Xi'an University of Technology, No.5, Jinhua South Road, Xi'an, 710048, Shaanxi, China
- Key Laboratory of Shaanxi Province for Complex System Control and Intelligent Information Processing, Xi'an, 710048, Shaanxi, China
- Yi Geng
- School of Automation and Information Engineering, Xi'an University of Technology, No.5, Jinhua South Road, Xi'an, 710048, Shaanxi, China
- Xin Quan
- School of Automation and Information Engineering, Xi'an University of Technology, No.5, Jinhua South Road, Xi'an, 710048, Shaanxi, China

17
Choquette EM, Forthman KL, Kirlic N, Stewart JL, Cannon MJ, Akeman E, McMillan N, Mesker M, Tarrasch M, Kuplicki R, Paulus MP, Aupperle RL. Impulsivity, trauma history, and interoceptive awareness contribute to completion of a criminal diversion substance use treatment program for women. Front Psychol 2024; 15:1390199. [PMID: 39295754 PMCID: PMC11408307 DOI: 10.3389/fpsyg.2024.1390199] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2024] [Accepted: 07/19/2024] [Indexed: 09/21/2024]
Abstract
Introduction In the US, women are one of the fastest-growing segments of the prison population and more than a quarter of women in state prison are incarcerated for drug offenses. Substance use criminal diversion programs can be effective. It may be beneficial to identify individuals who are most likely to complete the program versus terminate early as this can provide information regarding who may need additional or unique programming to improve the likelihood of successful program completion. Prior research investigating prediction of success in these programs has primarily focused on demographic factors in male samples. Methods The current study used machine learning (ML) to examine other non-demographic factors related to the likelihood of completing a substance use criminal diversion program for women. A total of 179 women who were enrolled in a criminal diversion program consented and completed neuropsychological, self-report symptom measures, criminal history and demographic surveys at baseline. Model one entered 145 variables into a machine learning (ML) ensemble model, using repeated, nested cross-validation, predicting subsequent graduation versus termination from the program. An identical ML analysis was conducted for model two, in which 34 variables were entered, including the Women's Risk/Needs Assessment (WRNA). Results ML models were unable to predict graduation at an individual level better than chance (AUC = 0.59 [SE = 0.08] and 0.54 [SE = 0.13]). Post-hoc analyses indicated measures of impulsivity, trauma history, interoceptive awareness, employment/financial risk, housing safety, antisocial friends, anger/hostility, and WRNA total score and risk scores exhibited medium to large effect sizes in predicting treatment completion (p < 0.05; ds = 0.29 to 0.81). 
Discussion Results point towards the complexity involved in attempting to predict treatment completion at the individual level but also provide potential targets to inform future research aiming to reduce recidivism.
Affiliation(s)
- Namik Kirlic
- Laureate Institute for Brain Research, Tulsa, OK, United States
- Department of Community Medicine, University of Tulsa, Tulsa, OK, United States
- Jennifer L Stewart
- Laureate Institute for Brain Research, Tulsa, OK, United States
- Department of Community Medicine, University of Tulsa, Tulsa, OK, United States
- Nick McMillan
- Women in Recovery, Family and Children's Services, Tulsa, OK, United States
- Micah Mesker
- Women in Recovery, Family and Children's Services, Tulsa, OK, United States
- Mimi Tarrasch
- Women in Recovery, Family and Children's Services, Tulsa, OK, United States
- Rayus Kuplicki
- Laureate Institute for Brain Research, Tulsa, OK, United States
- Martin P Paulus
- Laureate Institute for Brain Research, Tulsa, OK, United States
- Department of Community Medicine, University of Tulsa, Tulsa, OK, United States
- Robin L Aupperle
- Laureate Institute for Brain Research, Tulsa, OK, United States
- Department of Community Medicine, University of Tulsa, Tulsa, OK, United States

18
Segura-Ortiz A, García-Nieto J, Aldana-Montes JF, Navas-Delgado I. Multi-objective context-guided consensus of a massive array of techniques for the inference of Gene Regulatory Networks. Comput Biol Med 2024; 179:108850. [PMID: 39013340 DOI: 10.1016/j.compbiomed.2024.108850] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2024] [Revised: 07/03/2024] [Accepted: 07/03/2024] [Indexed: 07/18/2024]
Abstract
BACKGROUND AND OBJECTIVE Gene Regulatory Network (GRN) inference is a fundamental task in biology and medicine, as it enables a deeper understanding of the intricate mechanisms of gene expression present in organisms. This bioinformatics problem has been addressed in the literature through multiple computational approaches. Techniques developed for inferring from expression data have employed Bayesian networks, ordinary differential equations (ODEs), machine learning, information theory measures and neural networks, among others. The diversity of implementations and their respective customization have led to the emergence of many tools and multiple specialized domains derived from them, understood as subsets of networks with specific characteristics that are challenging to detect a priori. This specialization has introduced significant uncertainty when choosing the most appropriate technique for a particular dataset. This proposal, named MO-GENECI, builds upon the basic idea of the previous proposal GENECI and optimizes consensus among different inference techniques, through a carefully refined multi-objective evolutionary algorithm guided by various objective functions, linked to the biological context at hand. METHODS MO-GENECI has been tested on an extensive and diverse academic benchmark of 106 gene regulatory networks from multiple sources and sizes. The evaluation of MO-GENECI compared its performance to individual techniques using key metrics (AUROC and AUPR) for gene regulatory network inference. Friedman's statistical ranking provided an ordered classification, followed by non-parametric Holm tests to determine statistical significance. RESULTS MO-GENECI's Pareto front approximation facilitates easy selection of an appropriate solution based on generic input data characteristics. 
The best solution consistently emerged as the winner in all statistical tests, and in many cases, the median precision solution showed no statistically significant difference compared to the winner. CONCLUSIONS MO-GENECI has not only achieved more accurate results than individual techniques, but has also overcome the uncertainty associated with the initial choice of technique, owing to its flexibility and adaptability. It is shown to intelligently select the most suitable techniques for each case. The source code is hosted in a public repository at GitHub under MIT license: https://github.com/AdrianSeguraOrtiz/MO-GENECI. Moreover, to facilitate its installation and use, the software associated with this implementation has been encapsulated in a Python package available at PyPI: https://pypi.org/project/geneci/.
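The two evaluation metrics named in the abstract, AUROC and AUPR, can be computed for a scored edge list as in the following sketch. It is not taken from the MO-GENECI code base: AUROC is computed here as the fraction of (true edge, false edge) pairs ranked correctly, and AUPR is approximated by average precision, which is one common estimator among several. The function names and toy scores are hypothetical, and the input is assumed to contain at least one true and one false edge.

```python
def auroc(scores, labels):
    """Probability that a randomly chosen true edge outscores a false edge (ties count 0.5)."""
    positives = [s for s, label in zip(scores, labels) if label == 1]
    negatives = [s for s, label in zip(scores, labels) if label == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def average_precision(scores, labels):
    """Mean of the precision values measured at the rank of each true edge."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, precision_sum = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits


# Toy data (invented): confidence per candidate edge and whether it is in the gold standard.
edge_scores = [0.9, 0.8, 0.7, 0.3, 0.1]
edge_labels = [1, 0, 1, 0, 0]
roc_value = auroc(edge_scores, edge_labels)
pr_value = average_precision(edge_scores, edge_labels)
```

AUPR is often the more informative of the two for GRN inference, since true edges are a small minority of all candidate gene pairs.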
Affiliation(s)
- Adrián Segura-Ortiz
- Department de Lenguajes y Ciencias de la Computación, ITIS Software, Universidad de Málaga, Málaga, 29071, Spain
- José García-Nieto
- Department de Lenguajes y Ciencias de la Computación, ITIS Software, Universidad de Málaga, Málaga, 29071, Spain; Biomedical Research Institute of Málaga (IBIMA), Universidad de Málaga, Málaga, Spain
- José F Aldana-Montes
- Department de Lenguajes y Ciencias de la Computación, ITIS Software, Universidad de Málaga, Málaga, 29071, Spain; Biomedical Research Institute of Málaga (IBIMA), Universidad de Málaga, Málaga, Spain
- Ismael Navas-Delgado
- Department de Lenguajes y Ciencias de la Computación, ITIS Software, Universidad de Málaga, Málaga, 29071, Spain; Biomedical Research Institute of Málaga (IBIMA), Universidad de Málaga, Málaga, Spain

19
Wei PJ, Bao JJ, Gao Z, Tan JY, Cao RF, Su Y, Zheng CH, Deng L. MEFFGRN: Matrix enhancement and feature fusion-based method for reconstructing the gene regulatory network of epithelioma papulosum cyprini cells by spring viremia of carp virus infection. Comput Biol Med 2024; 179:108835. [PMID: 38996550 DOI: 10.1016/j.compbiomed.2024.108835] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2024] [Revised: 06/05/2024] [Accepted: 06/29/2024] [Indexed: 07/14/2024]
Abstract
Gene regulatory networks (GRNs) are crucial for understanding organismal molecular mechanisms and processes. Construction of GRN in the epithelioma papulosum cyprini (EPC) cells of cyprinid fish by spring viremia of carp virus (SVCV) infection helps understand the immune regulatory mechanisms that enhance the survival capabilities of cyprinid fish. Although many computational methods have been used to infer GRNs, specialized approaches for predicting the GRN of EPC cells following SVCV infection are lacking. In addition, most existing methods focus primarily on gene expression features, neglecting the valuable network structural information in known GRNs. In this study, we propose a novel supervised deep neural network, named MEFFGRN (Matrix Enhancement- and Feature Fusion-based method for Gene Regulatory Network inference), to accurately predict the GRN of EPC cells following SVCV infection. MEFFGRN considers both gene expression data and network structure information of known GRN and introduces a matrix enhancement method to address the sparsity issue of known GRN, extracting richer network structure information. To optimize the benefits of CNN (Convolutional Neural Network) in image processing, gene expression and enhanced GRN data were transformed into histogram images for each gene pair respectively. Subsequently, these histograms were separately fed into CNNs for training to obtain the corresponding gene expression and network structural features. Furthermore, a feature fusion mechanism was introduced to comprehensively integrate the gene expression and network structural features. This integration considers the specificity of each feature and their interactive information, resulting in a more comprehensive and precise feature representation during the fusion process. Experimental results from both real-world and benchmark datasets demonstrate that MEFFGRN achieves competitive performance compared with state-of-the-art computational methods. 
Furthermore, study findings from SVCV-infected EPC cells suggest that MEFFGRN can predict novel gene regulatory relationships.
Affiliation(s)
- Pi-Jing Wei
- Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
- Jin-Jin Bao
- Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
- Zhen Gao
- Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
- Jing-Yun Tan
- Shenzhen Key Laboratory of Microbial Genetic Engineering, College of Life Sciences and Oceanology, Shenzhen University, Shenzhen, 518055, Guangdong, China
- Rui-Fen Cao
- Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
- Yansen Su
- School of Artificial Intelligence, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
- Chun-Hou Zheng
- School of Artificial Intelligence, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
- Li Deng
- Shenzhen Key Laboratory of Microbial Genetic Engineering, College of Life Sciences and Oceanology, Shenzhen University, Shenzhen, 518055, Guangdong, China

20
White BS, de Reyniès A, Newman AM, Waterfall JJ, Lamb A, Petitprez F, Lin Y, Yu R, Guerrero-Gimenez ME, Domanskyi S, Monaco G, Chung V, Banerjee J, Derrick D, Valdeolivas A, Li H, Xiao X, Wang S, Zheng F, Yang W, Catania CA, Lang BJ, Bertus TJ, Piermarocchi C, Caruso FP, Ceccarelli M, Yu T, Guo X, Bletz J, Coller J, Maecker H, Duault C, Shokoohi V, Patel S, Liliental JE, Simon S, Saez-Rodriguez J, Heiser LM, Guinney J, Gentles AJ. Community assessment of methods to deconvolve cellular composition from bulk gene expression. Nat Commun 2024; 15:7362. [PMID: 39191725 PMCID: PMC11350143 DOI: 10.1038/s41467-024-50618-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Accepted: 07/11/2024] [Indexed: 08/29/2024]
Abstract
We evaluate deconvolution methods, which infer levels of immune infiltration from bulk expression of tumor samples, through a community-wide DREAM Challenge. We assess six published and 22 community-contributed methods using in vitro and in silico transcriptional profiles of admixed cancer and healthy immune cells. Several published methods predict most cell types well, though they either were not trained to evaluate all functional CD8+ T cell states or do so with low accuracy. Several community-contributed methods address this gap, including a deep learning-based approach, whose strong performance establishes the applicability of this paradigm to deconvolution. Despite being developed largely using immune cells from healthy tissues, deconvolution methods predict levels of tumor-derived immune cells well. Our admixed and purified transcriptional profiles will be a valuable resource for developing deconvolution methods, including in response to common challenges we observe across methods, such as sensitive identification of functional CD4+ T cell states.
Collapse
Affiliation(s)
- Brian S White
- Sage Bionetworks, Seattle, WA, USA
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - Aurélien de Reyniès
- Centre de Recherche des Cordeliers, INSERM U1138, Université Paris Cité, Paris, France
| | - Aaron M Newman
- Institute for Stem Cell Biology and Regenerative Medicine, Stanford University, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
| | - Joshua J Waterfall
- INSERM U830 and Translational Research Department, Institut Curie, PSL Research University, Paris, France
- Florent Petitprez
- Programme Cartes d'Identité des Tumeurs, Ligue Nationale Contre le Cancer, Paris, France
- MRC Centre for Reproductive Health, the Queen's Medical Research Institute, University of Edinburgh, Edinburgh, UK
- Yating Lin
- Xiamen University, Xiamen, Fujian, China
- Martin E Guerrero-Gimenez
- Institute of Biochemistry and Biotechnology, School of Medicine, National University of Cuyo, Mendoza, Argentina
- Gianni Monaco
- BIOGEM Institute of Molecular Biology and Genetics, Ariano Irpino, AV, Italy
- Daniel Derrick
- Department of Biomedical Engineering, Knight Cancer Institute, Oregon Health & Science University, Portland, OR, USA
- Alberto Valdeolivas
- Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Haojun Li
- Xiamen University, Xiamen, Fujian, China
- Xu Xiao
- Xiamen University, Xiamen, Fujian, China
- Shun Wang
- Department of Pathology, Cancer Hospital, Chinese Academy of Medical Science, Beijing, China
- Carlos A Catania
- Laboratory of Intelligent Systems (LABSIN), Engineering School, National University of Cuyo, Mendoza, Argentina
- Benjamin J Lang
- Department of Radiation Oncology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA
- Francesca P Caruso
- BIOGEM Institute of Molecular Biology and Genetics, Ariano Irpino, AV, Italy
- Michele Ceccarelli
- BIOGEM Institute of Molecular Biology and Genetics, Ariano Irpino, AV, Italy
- Sylvester Comprehensive Cancer Center, Department of Public Health Sciences, University of Miami Miller School of Medicine, Miami, Florida, USA
- John Coller
- Stanford Functional Genomics Facility, Stanford University School of Medicine, Stanford, CA, USA
- Holden Maecker
- Institute for Immunity, Transplantation, and Infection, Stanford University School of Medicine, Stanford, CA, USA
- Caroline Duault
- Institute for Immunity, Transplantation, and Infection, Stanford University School of Medicine, Stanford, CA, USA
- Vida Shokoohi
- Stanford Functional Genomics Facility, Stanford University School of Medicine, Stanford, CA, USA
- Shailja Patel
- Translational Applications Service Center, Stanford University School of Medicine, Stanford, CA, USA
- Joanna E Liliental
- Translational Applications Service Center, Stanford University School of Medicine, Stanford, CA, USA
- Julio Saez-Rodriguez
- Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Laura M Heiser
- Department of Biomedical Engineering, Knight Cancer Institute, Oregon Health & Science University, Portland, OR, USA
- Andrew J Gentles
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA.
- Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA.
- Department of Pathology, Stanford University, Stanford, CA, USA.
|
21
|
Kernfeld E, Keener R, Cahan P, Battle A. Transcriptome data are insufficient to control false discoveries in regulatory network inference. Cell Syst 2024; 15:709-724.e13. [PMID: 39173585 PMCID: PMC11642480 DOI: 10.1016/j.cels.2024.07.006] [Received: 05/24/2023] [Revised: 05/31/2024] [Accepted: 07/22/2024] [Indexed: 08/24/2024]
Abstract
Inference of causal transcriptional regulatory networks (TRNs) from transcriptomic data suffers notoriously from false positives. Approaches to control the false discovery rate (FDR), for example, via permutation, bootstrapping, or multivariate Gaussian distributions, suffer from several complications: difficulty in distinguishing direct from indirect regulation, nonlinear effects, and causal structure inference requiring "causal sufficiency," meaning experiments that are free of any unmeasured, confounding variables. Here, we use a recently developed statistical framework, model-X knockoffs, to control the FDR while accounting for indirect effects, nonlinear dose-response, and user-provided covariates. We adjust the procedure to estimate the FDR correctly even when measured against incomplete gold standards. However, benchmarking against chromatin immunoprecipitation (ChIP) and other gold standards reveals higher observed than reported FDR. This indicates that unmeasured confounding is a major driver of FDR in TRN inference. A record of this paper's transparent peer review process is included in the supplemental information.
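To make the FDR-control step more concrete, the sketch below implements only the generic knockoff+ selection rule: given one statistic W_j per candidate regulator-target edge, it picks the smallest threshold whose estimated false discovery proportion stays at or below the target q. How the statistics are computed, and the paper's adjustments for incomplete gold standards, are not shown; the function name and the example values are illustrative assumptions.

```python
def knockoff_plus_select(w_stats, q):
    """Return (threshold, selected indices) for the knockoff+ filter.

    w_stats: per-variable statistics W_j; large positive values favour the real
             variable over its knockoff, and signs are symmetric under the null.
    q: target false discovery rate.
    """
    candidates = sorted({abs(w) for w in w_stats if w != 0})
    for t in candidates:
        n_neg = sum(1 for w in w_stats if w <= -t)
        n_pos = sum(1 for w in w_stats if w >= t)
        # Estimated false discovery proportion at t (the +1 is the "plus" correction).
        fdp_hat = (1 + n_neg) / max(1, n_pos)
        if fdp_hat <= q:
            return t, [j for j, w in enumerate(w_stats) if w >= t]
    return float("inf"), []  # no threshold meets the target


w_stats = [4.0, 3.5, 3.0, 2.5, 2.0, -0.5, 1.0, -1.5]  # illustrative values only
threshold, selected = knockoff_plus_select(w_stats, q=0.25)
```

With these toy statistics the rule settles on a threshold of 2.0 and keeps the five variables whose statistic is at least that large. The guarantee holds only if the knockoffs are valid, which is where the unmeasured confounding discussed in the abstract can break it.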
Affiliation(s)
- Eric Kernfeld
- Department of Biomedical Engineering, Johns Hopkins University, 3400 N. Charles Street, Wyman Park Building, Suite 400 West, Baltimore, MD 21218, USA
- Rebecca Keener
- Department of Biomedical Engineering, Johns Hopkins University, 3400 N. Charles Street, Wyman Park Building, Suite 400 West, Baltimore, MD 21218, USA
- Patrick Cahan
- Department of Biomedical Engineering, Johns Hopkins University, 3400 N. Charles Street, Wyman Park Building, Suite 400 West, Baltimore, MD 21218, USA; Institute for Cell Engineering, Johns Hopkins Medicine, Baltimore, MD, USA; Department of Molecular Biology and Genetics, Johns Hopkins University, Baltimore, MD, USA.
- Alexis Battle
- Department of Biomedical Engineering, Johns Hopkins University, 3400 N. Charles Street, Wyman Park Building, Suite 400 West, Baltimore, MD 21218, USA; Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA; Department of Genetic Medicine, Johns Hopkins Medicine, Baltimore, MD, USA; Malone Center for Engineering and Healthcare, Johns Hopkins University, Baltimore, MD, USA; Data Science and AI Institute, Johns Hopkins University, Baltimore, MD, USA.
|
22
|
Zitnik M, Li MM, Wells A, Glass K, Morselli Gysi D, Krishnan A, Murali TM, Radivojac P, Roy S, Baudot A, Bozdag S, Chen DZ, Cowen L, Devkota K, Gitter A, Gosline SJC, Gu P, Guzzi PH, Huang H, Jiang M, Kesimoglu ZN, Koyuturk M, Ma J, Pico AR, Pržulj N, Przytycka TM, Raphael BJ, Ritz A, Sharan R, Shen Y, Singh M, Slonim DK, Tong H, Yang XH, Yoon BJ, Yu H, Milenković T. Current and future directions in network biology. Bioinform Adv 2024; 4:vbae099. [PMID: 39143982 PMCID: PMC11321866 DOI: 10.1093/bioadv/vbae099] [Received: 09/27/2023] [Revised: 05/31/2024] [Accepted: 07/08/2024] [Indexed: 08/16/2024]
Abstract
Summary: Network biology is an interdisciplinary field bridging computational and biological sciences that has proved pivotal in advancing the understanding of cellular functions and diseases across biological systems and scales. Although the field has been around for two decades, it remains nascent. It has witnessed rapid evolution, accompanied by emerging challenges. These stem from various factors, notably the growing complexity and volume of data together with the increased diversity of data types describing different tiers of biological organization. We discuss prevailing research directions in network biology, focusing on molecular/cellular networks but also on other biological network types such as biomedical knowledge graphs, patient similarity networks, brain networks, and social/contact networks relevant to disease spread. In more detail, we highlight areas of inference and comparison of biological networks, multimodal data integration and heterogeneous networks, higher-order network analysis, machine learning on networks, and network-based personalized medicine. Following the overview of recent breakthroughs across these five areas, we offer a perspective on future directions of network biology. Additionally, we discuss scientific communities, educational initiatives, and the importance of fostering diversity within the field. This article establishes a roadmap for an immediate and long-term vision for network biology. Availability and implementation: Not applicable.
Affiliation(s)
- Marinka Zitnik
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
- Michelle M Li
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
- Aydin Wells
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Lucy Family Institute for Data and Society, University of Notre Dame, Notre Dame, IN 46556, United States
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, United States
- Kimberly Glass
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, United States
- Deisy Morselli Gysi
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, United States
- Department of Statistics, Federal University of Paraná, Curitiba, Paraná 81530-015, Brazil
- Department of Physics, Northeastern University, Boston, MA 02115, United States
- Arjun Krishnan
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, United States
- T M Murali
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, United States
- Predrag Radivojac
- Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, United States
- Sushmita Roy
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53715, United States
- Wisconsin Institute for Discovery, Madison, WI 53715, United States
- Anaïs Baudot
- Aix Marseille Université, INSERM, MMG, Marseille, France
- Serdar Bozdag
- Department of Computer Science and Engineering, University of North Texas, Denton, TX 76203, United States
- Department of Mathematics, University of North Texas, Denton, TX 76203, United States
- Danny Z Chen
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Lenore Cowen
- Department of Computer Science, Tufts University, Medford, MA 02155, United States
- Kapil Devkota
- Department of Computer Science, Tufts University, Medford, MA 02155, United States
- Anthony Gitter
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53715, United States
- Morgridge Institute for Research, Madison, WI 53715, United States
- Sara J C Gosline
- Biological Sciences Division, Pacific Northwest National Laboratory, Seattle, WA 98109, United States
- Pengfei Gu
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Pietro H Guzzi
- Department of Medical and Surgical Sciences, University Magna Graecia of Catanzaro, Catanzaro, 88100, Italy
- Heng Huang
- Department of Computer Science, University of Maryland College Park, College Park, MD 20742, United States
- Meng Jiang
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Ziynet Nesibe Kesimoglu
- Department of Computer Science and Engineering, University of North Texas, Denton, TX 76203, United States
- National Center of Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20814, United States
- Mehmet Koyuturk
- Department of Computer and Data Sciences, Case Western Reserve University, Cleveland, OH 44106, United States
- Jian Ma
- Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, United States
- Alexander R Pico
- Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA 94158, United States
- Nataša Pržulj
- Department of Computer Science, University College London, London, WC1E 6BT, England
- ICREA, Catalan Institution for Research and Advanced Studies, Barcelona, 08010, Spain
- Barcelona Supercomputing Center (BSC), Barcelona, 08034, Spain
- Teresa M Przytycka
- National Center of Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20814, United States
- Benjamin J Raphael
- Department of Computer Science, Princeton University, Princeton, NJ 08544, United States
- Anna Ritz
- Department of Biology, Reed College, Portland, OR 97202, United States
- Roded Sharan
- School of Computer Science, Tel Aviv University, Tel Aviv, 69978, Israel
- Yang Shen
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, United States
- Mona Singh
- Department of Computer Science, Princeton University, Princeton, NJ 08544, United States
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, United States
- Donna K Slonim
- Department of Computer Science, Tufts University, Medford, MA 02155, United States
- Hanghang Tong
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, United States
- Xinan Holly Yang
- Department of Pediatrics, University of Chicago, Chicago, IL 60637, United States
- Byung-Jun Yoon
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, United States
- Computational Science Initiative, Brookhaven National Laboratory, Upton, NY 11973, United States
- Haiyuan Yu
- Department of Computational Biology, Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, United States
- Tijana Milenković
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Lucy Family Institute for Data and Society, University of Notre Dame, Notre Dame, IN 46556, United States
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, United States
|
23
|
Priego Espinosa D, Espinal-Enríquez J, Aldana A, Aldana M, Martínez-Mekler G, Carneiro J, Darszon A. Reviewing mathematical models of sperm signaling networks. Mol Reprod Dev 2024; 91:e23766. [PMID: 39175359 DOI: 10.1002/mrd.23766] [Received: 05/17/2024] [Accepted: 07/22/2024] [Indexed: 08/24/2024]
Abstract
Dave Garbers' work significantly contributed to our understanding of sperm's regulated motility, capacitation, and the acrosome reaction. These key sperm functions involve complex multistep signaling pathways engaging numerous finely orchestrated elements. Despite significant progress, many parameters and interactions among these elements remain elusive. Mathematical modeling emerges as a potent tool to study sperm physiology, providing a framework to integrate experimental results and capture functional dynamics considering biochemical, biophysical, and cellular elements. Depending on research objectives, different modeling strategies, broadly categorized into continuous and discrete approaches, reveal valuable insights into cell function. These models allow the exploration of hypotheses regarding molecules, conditions, and pathways, whenever they become challenging to evaluate experimentally. This review presents an overview of current theoretical and experimental efforts to understand sperm motility regulation, capacitation, and the acrosome reaction. We discuss the strengths and weaknesses of different modeling strategies and highlight key findings and unresolved questions. Notable discoveries include the importance of specific ion channels, the role of intracellular molecular heterogeneity in capacitation and the acrosome reaction, and the impact of pH changes on acrosomal exocytosis. Ultimately, this review underscores the crucial importance of mathematical frameworks in advancing our understanding of sperm physiology and guiding future experimental investigations.
Affiliation(s)
- Jesús Espinal-Enríquez
- Computational Genomics Division, National Institute of Genomic Medicine (INMEGEN), Mexico City, Mexico
- Andrés Aldana
- Network Science Institute, Northeastern University, Boston, Massachusetts, USA
- Maximino Aldana
- Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México (UNAM), Mexico City, México
- Instituto de Ciencias Físicas, Universidad Nacional Autónoma de México, Cuernavaca, México
- Gustavo Martínez-Mekler
- Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México (UNAM), Mexico City, México
- Instituto de Ciencias Físicas, Universidad Nacional Autónoma de México, Cuernavaca, México
- Jorge Carneiro
- Instituto de Tecnologia Química e Biológica, Universidade Nova de Lisboa, Lisboa, Portugal
- Alberto Darszon
- Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, México
|
24
|
Yi X, Liu S, Wu Y, McCloskey D, Meng Z. BPP: a platform for automatic biochemical pathway prediction. Brief Bioinform 2024; 25:bbae355. [PMID: 39082653 PMCID: PMC11289738 DOI: 10.1093/bib/bbae355] [Received: 02/08/2024] [Revised: 05/16/2024] [Accepted: 07/09/2024] [Indexed: 08/03/2024]
Abstract
A biochemical pathway consists of a series of interconnected biochemical reactions to accomplish specific life activities. The participating reactants and resultant products of a pathway, including gene fragments, proteins, and small molecules, coalesce to form a complex reaction network. Biochemical pathways play a critical role in the biochemical domain as they can reveal the flow of biochemical reactions in living organisms, making them essential for understanding life processes. Existing studies of biochemical pathway networks are mainly based on experimentation and pathway database analysis methods, which are plagued by substantial cost constraints. Inspired by the success of representation learning approaches in biomedicine, we develop the biochemical pathway prediction (BPP) platform, an automatic platform for identifying potential links or attributes within biochemical pathway networks. Our BPP platform incorporates a variety of representation learning models, including the latest hypergraph neural network technology to model biochemical reactions in pathways. In particular, BPP contains the latest biochemical pathway-based datasets and enables the prediction of potential participants or products of biochemical reactions in biochemical pathways. Additionally, BPP is equipped with a SHAP explainer to explain the predicted results and to calculate the contributions of each participating element. We conduct extensive experiments on our collected biochemical pathway dataset to benchmark the effectiveness of all models available on BPP. Furthermore, our detailed case studies based on the chronological pattern of our dataset demonstrate the effectiveness of our platform. Our BPP web portal, source code and datasets are freely accessible at https://github.com/Glasgow-AI4BioMed/BPP.
Affiliation(s)
- Xinhao Yi
- School of Computing Science, University of Glasgow, 18 Lilybank Gardens, Glasgow G12 8RZ, United Kingdom
- Siwei Liu
- Machine Learning Department, Mohamed bin Zayed University of Artificial Intelligence, Building 1B, Masdar City, Abu Dhabi 000000, United Arab Emirates
- Yu Wu
- School of Mathematical Sciences, Fudan University, 220 Handan Rd, Yangpu District, Shanghai 200438, China
- Douglas McCloskey
- Artificial Intelligence, BioMed X Institute, Im Neuenheimer Feld 515, Heidelberg 69120, Germany
- Zaiqiao Meng
- School of Computing Science, University of Glasgow, 18 Lilybank Gardens, Glasgow G12 8RZ, United Kingdom
|
25
|
Loers JU, Vermeirssen V. A single-cell multimodal view on gene regulatory network inference from transcriptomics and chromatin accessibility data. Brief Bioinform 2024; 25:bbae382. [PMID: 39207727 PMCID: PMC11359808 DOI: 10.1093/bib/bbae382] [Received: 04/05/2024] [Revised: 06/27/2024] [Accepted: 07/23/2024] [Indexed: 09/04/2024]
Abstract
Eukaryotic gene regulation is a combinatorial, dynamic, and quantitative process that plays a vital role in development and disease and can be modeled at a systems level in gene regulatory networks (GRNs). The wealth of multi-omics data measured on the same samples and even on the same cells has lifted the field of GRN inference to the next stage. Combinations of (single-cell) transcriptomics and chromatin accessibility allow the prediction of fine-grained regulatory programs that go beyond mere correlation of transcription factor and target gene expression, with enhancer GRNs (eGRNs) modeling molecular interactions between transcription factors, regulatory elements, and target genes. In this review, we highlight the key components for successful (e)GRN inference from (sc)RNA-seq and (sc)ATAC-seq data exemplified by state-of-the-art methods as well as open challenges and future developments. Moreover, we address preprocessing strategies, metacell generation and computational omics pairing, transcription factor binding site detection, and linear and three-dimensional approaches to identify chromatin interactions as well as dynamic and causal eGRN inference. We believe that the integration of transcriptomics together with epigenomics data at a single-cell level is the new standard for mechanistic network inference, and that it can be further advanced with integrating additional omics layers and spatiotemporal data, as well as with shifting the focus towards more quantitative and causal modeling strategies.
Affiliation(s)
- Jens Uwe Loers
- Lab for Computational Biology, Integromics and Gene Regulation (CBIGR), Cancer Research Institute Ghent (CRIG), Corneel Heymanslaan 10, 9000 Ghent, Belgium
- Department of Biomedical Molecular Biology, Ghent University, Zwijnaarde-Technologiepark 71, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, 9000 Ghent, Belgium
- Vanessa Vermeirssen
- Lab for Computational Biology, Integromics and Gene Regulation (CBIGR), Cancer Research Institute Ghent (CRIG), Corneel Heymanslaan 10, 9000 Ghent, Belgium
- Department of Biomedical Molecular Biology, Ghent University, Zwijnaarde-Technologiepark 71, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, 9000 Ghent, Belgium
|
26
|
Tian H, Tang L, Yang Z, Xiang Y, Min Q, Yin M, You H, Xiao Z, Shen J. Current understanding of functional peptides encoded by lncRNA in cancer. Cancer Cell Int 2024; 24:252. [PMID: 39030557 PMCID: PMC11265036 DOI: 10.1186/s12935-024-03446-7] [Received: 10/20/2023] [Accepted: 07/09/2024] [Indexed: 07/21/2024]
Abstract
Dysregulated gene expression and imbalance of transcriptional regulation are typical features of cancer. RNA plays a key role in these processes. Human transcripts include many RNAs that are more than 200 bp in length and lack long open reading frames (ORFs, > 100 aa). They are usually regarded as long non-coding RNAs (lncRNAs), which play important roles in cancer regulation, including chromatin remodeling, transcriptional regulation, and translational regulation, and act as miRNA sponges. With the advancement of ribosome profiling and sequencing technologies, increasing evidence has revealed that some ORFs in lncRNAs can also encode peptides that participate in the regulation of tumors in multiple organs, which opens a new chapter in the field of lncRNA and oncology research. In this review, we discuss the biological function of lncRNAs in tumors, the current methods to evaluate their coding potential and the role of functional small peptides encoded by lncRNAs in cancers. Investigating the small peptides encoded by lncRNAs and understanding the regulatory mechanisms of these functional peptides may contribute to a deeper understanding of cancer and the development of new targeted anticancer therapies.
Affiliation(s)
- Hua Tian
- Laboratory of Molecular Pharmacology, Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou, 646000, China
- Cell Therapy and Cell Drugs of Luzhou Key Laboratory, Luzhou, 646000, China
- South Sichuan Institute of Translational Medicine, Luzhou, 646000, China
- School of Nursing, Chongqing College of Humanities, Science & Technology, Chongqing, China
- Lu Tang
- Laboratory of Molecular Pharmacology, Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou, 646000, China
- Cell Therapy and Cell Drugs of Luzhou Key Laboratory, Luzhou, 646000, China
- South Sichuan Institute of Translational Medicine, Luzhou, 646000, China
- Zihan Yang
- Department of Pathology, The Affiliated Hospital of Southwest Medical University, Luzhou, China, 646000
- Yanxi Xiang
- Laboratory of Molecular Pharmacology, Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou, 646000, China
- Cell Therapy and Cell Drugs of Luzhou Key Laboratory, Luzhou, 646000, China
- South Sichuan Institute of Translational Medicine, Luzhou, 646000, China
- Qi Min
- Laboratory of Molecular Pharmacology, Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou, 646000, China
- Cell Therapy and Cell Drugs of Luzhou Key Laboratory, Luzhou, 646000, China
- South Sichuan Institute of Translational Medicine, Luzhou, 646000, China
- Mengshuang Yin
- Laboratory of Molecular Pharmacology, Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou, 646000, China
- Cell Therapy and Cell Drugs of Luzhou Key Laboratory, Luzhou, 646000, China
- South Sichuan Institute of Translational Medicine, Luzhou, 646000, China
- Huili You
- Laboratory of Molecular Pharmacology, Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou, 646000, China
- Cell Therapy and Cell Drugs of Luzhou Key Laboratory, Luzhou, 646000, China
- South Sichuan Institute of Translational Medicine, Luzhou, 646000, China
- Zhangang Xiao
- Laboratory of Molecular Pharmacology, Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou, 646000, China.
- Cell Therapy and Cell Drugs of Luzhou Key Laboratory, Luzhou, 646000, China.
- South Sichuan Institute of Translational Medicine, Luzhou, 646000, China.
- Gulin Traditional Chinese Medicine Hospital, Luzhou, China.
- Department of Pharmacology, School of Pharmacy, Sichuan College of Traditional Chinese Medicine, Mianyang, China.
- Jing Shen
- Laboratory of Molecular Pharmacology, Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou, 646000, China.
- Cell Therapy and Cell Drugs of Luzhou Key Laboratory, Luzhou, 646000, China.
- South Sichuan Institute of Translational Medicine, Luzhou, 646000, China.
|
27
|
Peng H, Xu J, Liu K, Liu F, Zhang A, Zhang X. EIEPCF: accurate inference of functional gene regulatory networks by eliminating indirect effects from confounding factors. Brief Funct Genomics 2024; 23:373-383. [PMID: 37642217 DOI: 10.1093/bfgp/elad040] [Received: 05/05/2023] [Revised: 07/07/2023] [Accepted: 08/14/2023] [Indexed: 08/31/2023]
Abstract
Reconstructing functional gene regulatory networks (GRNs) is a primary prerequisite for understanding pathogenic mechanisms and curing diseases in animals, and it also provides an important foundation for cultivating vegetable and fruit varieties that are resistant to diseases and corrosion in plants. Many computational methods have been developed to infer GRNs, but most of the regulatory relationships between genes obtained by these methods are biased. Eliminating indirect effects in GRNs remains a significant challenge for researchers. In this work, we propose a novel approach for inferring functional GRNs, named EIEPCF (eliminating indirect effects produced by confounding factors), which eliminates indirect effects caused by confounding factors. This method eliminates the influence of confounding factors on regulatory factors and target genes by measuring the similarity between their residuals. The validation results of the EIEPCF method on simulation studies, the gold-standard networks provided by the DREAM3 Challenge and the real gene networks of Escherichia coli demonstrate that it achieves significantly higher accuracy compared to other popular computational methods for inferring GRNs. As a case study, we utilized the EIEPCF method to reconstruct the cold-resistance-specific GRN from cold-resistance gene expression data in Arabidopsis thaliana. The source code and data are available at https://github.com/zhanglab-wbgcas/EIEPCF.
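The idea of comparing residuals can be illustrated with ordinary partial correlation, used here as a stand-in and not as the EIEPCF algorithm itself: regress a confounder out of both the regulator and the target, then correlate what remains. The helper names and toy data below are assumptions for illustration, and a single confounder with a linear effect is assumed.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)


def residuals(y, z):
    """Residuals of y after least-squares regression on a single confounder z."""
    n = len(y)
    my, mz = sum(y) / n, sum(z) / n
    slope = (sum((zi - mz) * (yi - my) for zi, yi in zip(z, y))
             / sum((zi - mz) ** 2 for zi in z))
    return [(yi - my) - slope * (zi - mz) for yi, zi in zip(y, z)]


# Toy data: a confounder z drives both the regulator and the target.
z = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
noise_a = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
noise_b = [0.5, 0.5, -0.5, -0.5, 0.5, 0.5, -0.5, -0.5]
regulator = [zi + e for zi, e in zip(z, noise_a)]
target = [zi + e for zi, e in zip(z, noise_b)]

raw_similarity = pearson(regulator, target)
adjusted_similarity = pearson(residuals(regulator, z), residuals(target, z))
```

In this toy case the raw correlation is high only because both genes follow `z`; after removing `z`, the residual similarity drops close to zero, which is the behaviour an indirect edge should show.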
Affiliation(s)
- Huixiang Peng
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- University of Chinese Academy of Sciences, Beijing 100049 China
- Jing Xu
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- University of Chinese Academy of Sciences, Beijing 100049 China
- Kangchen Liu
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- University of Chinese Academy of Sciences, Beijing 100049 China
- Fang Liu
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- Aidi Zhang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- Xiujun Zhang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan 430074, China
|
28
|
Yu J, Leng J, Yuan F, Sun D, Wu LY. Reverse network diffusion to remove indirect noise for better inference of gene regulatory networks. Bioinformatics 2024; 40:btae435. [PMID: 38963312 PMCID: PMC11236096 DOI: 10.1093/bioinformatics/btae435] [Received: 03/16/2024] [Revised: 06/24/2024] [Accepted: 07/03/2024] [Indexed: 07/05/2024]
Abstract
MOTIVATION Gene regulatory networks (GRNs) are vital tools for delineating regulatory relationships between transcription factors and their target genes. The boom in computational biology and various biotechnologies has made inferring GRNs from multi-omics data a hot topic. However, when networks are constructed from gene expression data, they often suffer from false-positive problem due to the transitive effects of correlation. The presence of spurious noise edges obscures the real gene interactions, which makes downstream analyses, such as detecting gene function modules and predicting disease-related genes, difficult and inefficient. Therefore, there is an urgent and compelling need to develop network denoising methods to improve the accuracy of GRN inference. RESULTS In this study, we proposed a novel network denoising method named REverse Network Diffusion On Random walks (RENDOR). RENDOR is designed to enhance the accuracy of GRNs afflicted by indirect effects. RENDOR takes noisy networks as input, models higher-order indirect interactions between genes by transitive closure, eliminates false-positive effects using the inverse network diffusion method, and produces refined networks as output. We conducted a comparative assessment of GRN inference accuracy before and after denoising on simulated networks and real GRNs. Our results emphasized that the network derived from RENDOR more accurately and effectively captures gene interactions. This study demonstrates the significance of removing network indirect noise and highlights the effectiveness of the proposed method in enhancing the signal-to-noise ratio of noisy networks. AVAILABILITY AND IMPLEMENTATION The R package RENDOR is provided at https://github.com/Wu-Lab/RENDOR and other source code and data are available at https://github.com/Wu-Lab/RENDOR-reproduce.
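The abstract does not state RENDOR's equations, so the following is only a generic illustration of why inverting a diffusion can strip indirect edges. Assume (this is an assumption, not the paper's random-walk formulation) that the observed network is the transitive closure of a direct network D, written as a geometric series that converges because the spectral radius of D is below 1. The series can then be inverted in closed form:

```latex
G_{\mathrm{obs}} = D + D^{2} + D^{3} + \cdots = D\,(I - D)^{-1}
\quad\Longrightarrow\quad
D = G_{\mathrm{obs}}\,(I + G_{\mathrm{obs}})^{-1}.
```

The second identity follows because I + G_obs = (I - D)^{-1}, so multiplying G_obs by its inverse, (I - D), returns D. Under this toy model, every path of length two or more contributes to G_obs but is absent from D.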
Affiliation(s)
- Jiating Yu
  - School of Mathematics and Statistics, Nanjing University of Information Science & Technology, Nanjing 210044, China
  - IAM, MADIS, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Jiacheng Leng
  - IAM, MADIS, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
  - Zhejiang Lab, Hangzhou 311121, China
- Fan Yuan
  - IAM, MADIS, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Duanchen Sun
  - School of Mathematics, Shandong University, Jinan 250100, China
- Ling-Yun Wu
  - IAM, MADIS, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
|
29
|
Cassan O, Lecellier CH, Martin A, Bréhélin L, Lèbre S. Optimizing data integration improves gene regulatory network inference in Arabidopsis thaliana. Bioinformatics 2024; 40:btae415. [PMID: 38913855 PMCID: PMC11227367 DOI: 10.1093/bioinformatics/btae415] [Received: 02/20/2024] [Revised: 06/12/2024] [Accepted: 06/21/2024] [Indexed: 06/26/2024] Open
Abstract
MOTIVATION: Gene regulatory networks (GRNs) are traditionally inferred from gene expression profiles monitoring a specific condition or treatment. In the last decade, integrative strategies have successfully emerged to guide GRN inference from gene expression with complementary prior data. However, datasets used as prior information and validation gold standards are often related and limited to a subset of genes. This lack of complete and independent evaluation calls for new criteria to robustly estimate the optimal intensity of prior data integration in the inference process. RESULTS: We address this issue for two regression-based GRN inference models, a weighted random forest (weightedRF) and a generalized linear model estimated under a weighted LASSO penalty with stability selection (weightedLASSO). These approaches are applied to data from the root response to nitrate induction in Arabidopsis thaliana. For each gene, we measure how the integration of transcription factor binding motifs influences model prediction. We propose a new approach, DIOgene, that uses model prediction error and a simulated null hypothesis in order to optimize data integration strength in a hypothesis-driven, gene-specific manner. This integration scheme reveals a strong diversity of optimal integration intensities between genes, and offers good performance in minimizing prediction error as well as retrieving experimental interactions. Experimental results show that DIOgene compares favorably against state-of-the-art approaches and allows the recovery of master regulators of nitrate induction. AVAILABILITY AND IMPLEMENTATION: The R code and notebooks demonstrating the use of the proposed approaches are available in the repository https://github.com/OceaneCsn/integrative_GRN_N_induction.
Affiliation(s)
- Océane Cassan
  - LIRMM, Univ Montpellier, CNRS, Montpellier, 34095, France
- Charles-Henri Lecellier
  - LIRMM, Univ Montpellier, CNRS, Montpellier, 34095, France
  - IGMM, Univ Montpellier, CNRS, Montpellier, 34090, France
- Antoine Martin
  - IPSIM, CNRS, INRAE, Institut Agro, Univ Montpellier, 34060, Montpellier, France
- Sophie Lèbre
  - LIRMM, Univ Montpellier, CNRS, Montpellier, 34095, France
  - IMAG, Univ Montpellier, CNRS, Montpellier, 34090, France
  - Université Paul-Valéry-Montpellier 3, Montpellier, 34090, France
|
30
|
Moeckel C, Mouratidis I, Chantzi N, Uzun Y, Georgakopoulos-Soares I. Advances in computational and experimental approaches for deciphering transcriptional regulatory networks: Understanding the roles of cis-regulatory elements is essential, and recent research utilizing MPRAs, STARR-seq, CRISPR-Cas9, and machine learning has yielded valuable insights. Bioessays 2024; 46:e2300210. [PMID: 38715516 PMCID: PMC11444527 DOI: 10.1002/bies.202300210] [Received: 10/31/2023] [Revised: 04/22/2024] [Accepted: 04/23/2024] [Indexed: 05/16/2024]
Abstract
Understanding the influence of cis-regulatory elements on gene regulation poses numerous challenges given complexities stemming from variations in transcription factor (TF) binding, chromatin accessibility, structural constraints, and cell-type differences. This review discusses the role of gene regulatory networks in enhancing understanding of transcriptional regulation and covers construction methods ranging from expression-based approaches to supervised machine learning. Additionally, key experimental methods, including MPRAs and CRISPR-Cas9-based screening, which have significantly contributed to understanding TF binding preferences and cis-regulatory element functions, are explored. Lastly, the potential of machine learning and artificial intelligence to unravel cis-regulatory logic is analyzed. These computational advances have far-reaching implications for precision medicine, therapeutic target discovery, and the study of genetic variations in health and disease.
Affiliation(s)
- Camille Moeckel
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Ioannis Mouratidis
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA
- Nikol Chantzi
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Yasin Uzun
  - Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA
  - Department of Pediatrics, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Ilias Georgakopoulos-Soares
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA
|
31
|
Ahsen ME, Vogel R, Stolovitzky G. Optimal linear ensemble of binary classifiers. BIOINFORMATICS ADVANCES 2024; 4:vbae093. [PMID: 39011276 PMCID: PMC11249386 DOI: 10.1093/bioadv/vbae093] [Received: 12/19/2023] [Revised: 05/03/2024] [Accepted: 06/13/2024] [Indexed: 07/17/2024]
Abstract
Motivation: The integration of vast, complex biological data with computational models offers profound insights and predictive accuracy. Yet, such models face challenges: poor generalization and limited labeled data. Results: To overcome these difficulties in binary classification tasks, we developed the Method for Optimal Classification by Aggregation (MOCA) algorithm, which addresses the problem of generalization by virtue of being an ensemble learning method and can be used in problems with limited or no labeled data. We developed both an unsupervised (uMOCA) and a supervised (sMOCA) variant of MOCA. For uMOCA, we show how to infer the MOCA weights in an unsupervised way, which are optimal under the assumption of class-conditioned independent classifier predictions. When it is possible to use labels, sMOCA uses empirically computed MOCA weights. We demonstrate the performance of uMOCA and sMOCA using simulated data as well as actual data previously used in Dialogue on Reverse Engineering and Methods (DREAM) challenges. We also propose an application of sMOCA for transfer learning where we use pre-trained computational models from a domain where labeled data are abundant and apply them to a different domain with less abundant labeled data. Availability and implementation: GitHub repository, https://github.com/robert-vogel/moca.
Affiliation(s)
- Mehmet Eren Ahsen
  - Department of Business Administration, University of Illinois at Urbana-Champaign, Champaign, IL, 61820, United States
  - Department of Biomedical and Translational Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, United States
- Robert Vogel
  - Thomas J. Watson Research Center, IBM, New York, NY 10598, United States
  - Department of Integrated Structural and Computational Biology, Scripps Research, La Jolla, CA 92037, United States
|
32
|
Nouri N, Gaglia G, Mattoo H, de Rinaldis E, Savova V. GENIX enables comparative network analysis of single-cell RNA sequencing to reveal signatures of therapeutic interventions. CELL REPORTS METHODS 2024; 4:100794. [PMID: 38861988 PMCID: PMC11228368 DOI: 10.1016/j.crmeth.2024.100794] [Received: 09/27/2023] [Revised: 02/28/2024] [Accepted: 05/20/2024] [Indexed: 06/13/2024]
Abstract
Single-cell RNA sequencing (scRNA-seq) has transformed our understanding of cellular responses to perturbations such as therapeutic interventions and vaccines. Gene relevance to such perturbations is often assessed through differential expression analysis (DEA), which offers a one-dimensional view of the transcriptomic landscape. This method potentially overlooks genes with modest expression changes but profound downstream effects and is susceptible to false positives. We present GENIX (gene expression network importance examination), a computational framework that transcends DEA by constructing gene association networks and employing a network-based comparative model to identify topological signature genes. We benchmark GENIX using both synthetic and experimental datasets, including analysis of influenza vaccine-induced immune responses in peripheral blood mononuclear cells (PBMCs) from recovered COVID-19 patients. GENIX successfully emulates key characteristics of biological networks and reveals signature genes that are missed by classical DEA, thereby broadening the scope of target gene discovery in precision medicine.
Affiliation(s)
- Nima Nouri
  - Precision Medicine and Computational Biology, Sanofi, 350 Water Street, Cambridge, MA 02141, USA.
- Giorgio Gaglia
  - Precision Medicine and Computational Biology, Sanofi, 350 Water Street, Cambridge, MA 02141, USA
- Hamid Mattoo
  - Precision Medicine and Computational Biology, Sanofi, 350 Water Street, Cambridge, MA 02141, USA
- Emanuele de Rinaldis
  - Precision Medicine and Computational Biology, Sanofi, 350 Water Street, Cambridge, MA 02141, USA
- Virginia Savova
  - Precision Medicine and Computational Biology, Sanofi, 350 Water Street, Cambridge, MA 02141, USA.
|
33
|
Liu J, Xiang T, Song XC, Zhang S, Wu Q, Gao J, Lv M, Shi C, Yang X, Liu Y, Fu J, Shi W, Fang M, Qu G, Yu H, Jiang G. High-Efficiency Effect-Directed Analysis Leveraging Five High Level Advancements: A Critical Review. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2024; 58:9925-9944. [PMID: 38820315 DOI: 10.1021/acs.est.3c10996] [Indexed: 06/02/2024]
Abstract
Organic contaminants are ubiquitous in the environment, with mounting evidence unequivocally connecting them to aquatic toxicity, illness, and increased mortality, underscoring their substantial impacts on ecological security and environmental health. The intricate composition of sample mixtures and the uncertain physicochemical features of potential toxic substances make it challenging to identify key toxicants in environmental samples. Effect-directed analysis (EDA), which establishes a connection between key toxicants found in environmental samples and associated hazards, enables the identification of toxicants that can streamline research efforts and inform management action. Nevertheless, the advancement of EDA is constrained by the following factors: inadequate extraction and fractionation of environmental samples, limited bioassay endpoints and unknown linkage to higher order impacts, limited coverage of chemical analysis (i.e., high-resolution mass spectrometry, HRMS), and a lack of effective linkage between bioassays and chemical analysis. This review proposes five key advancements to enhance the efficiency of EDA in addressing these challenges: (1) multiple adsorbents for comprehensive coverage of chemical extraction, (2) high-resolution microfractionation and multidimensional fractionation for refined fractionation, (3) robust in vivo/vitro bioassays and omics, (4) high-performance configurations for HRMS analysis, and (5) chemical-, data-, and knowledge-driven approaches for streamlined toxicant identification and validation. We envision that future EDA will integrate big data and artificial intelligence based on the development of quantitative omics, cutting-edge multidimensional microfractionation, and ultraperformance MS to identify environmental hazard factors, in service of broader environmental governance.
Affiliation(s)
- Jifu Liu
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
  - School of Environment, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Tongtong Xiang
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
  - College of Sciences, Northeastern University, Shenyang 110004, China
- Xue-Chao Song
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
  - School of Environment, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Shaoqing Zhang
  - State Key Laboratory of Pollution Control and Resource Reuse, School of Environment, Nanjing University, Nanjing 210023, China
- Qi Wu
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
  - School of Environment, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Jie Gao
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
  - School of Environment, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Meilin Lv
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
  - College of Sciences, Northeastern University, Shenyang 110004, China
- Chunzhen Shi
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
- Xiaoxi Yang
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
- Yanna Liu
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
- Jianjie Fu
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
  - School of Environment, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Wei Shi
  - State Key Laboratory of Pollution Control and Resource Reuse, School of Environment, Nanjing University, Nanjing 210023, China
- Mingliang Fang
  - Department of Environmental Science and Engineering, Fudan University, Shanghai 200433, China
- Guangbo Qu
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
  - School of Environment, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China
  - Institute of Environment and Health, Jianghan University, Wuhan, Hubei 430056, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Hongxia Yu
  - State Key Laboratory of Pollution Control and Resource Reuse, School of Environment, Nanjing University, Nanjing 210023, China
- Guibin Jiang
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
  - School of Environment, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China
  - College of Sciences, Northeastern University, Shenyang 110004, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
|
34
|
Huo Q, Song R, Ma Z. Recent advances in exploring transcriptional regulatory landscape of crops. FRONTIERS IN PLANT SCIENCE 2024; 15:1421503. [PMID: 38903438 PMCID: PMC11188431 DOI: 10.3389/fpls.2024.1421503] [Received: 04/22/2024] [Accepted: 05/23/2024] [Indexed: 06/22/2024]
Abstract
Crop breeding entails developing and selecting plant varieties with improved agronomic traits. Modern molecular techniques, such as genome editing, enable more efficient manipulation of plant phenotype by altering the expression of particular regulatory or functional genes. Hence, it is essential to thoroughly comprehend the transcriptional regulatory mechanisms that underpin these traits. In the multi-omics era, a large amount of omics data has been generated for diverse crop species, including genomics, epigenomics, transcriptomics, proteomics, and single-cell omics. The abundant data resources and the emergence of advanced computational tools offer unprecedented opportunities for obtaining a holistic view and profound understanding of the regulatory processes linked to desirable traits. This review focuses on integrated network approaches that utilize multi-omics data to investigate gene expression regulation. Various types of regulatory networks and their inference methods are discussed, focusing on recent advancements in crop plants. The integration of multi-omics data has proven crucial for the construction of high-confidence regulatory networks. As these methodologies are refined, they will significantly enhance crop breeding efforts and contribute to global food security.
Affiliation(s)
- Zeyang Ma
  - State Key Laboratory of Maize Bio-breeding, Frontiers Science Center for Molecular Design Breeding, Joint International Research Laboratory of Crop Molecular Breeding, National Maize Improvement Center, College of Agronomy and Biotechnology, China Agricultural University, Beijing, China
|
35
|
Häusler S. Correlations reveal the hierarchical organization of biological networks with latent variables. Commun Biol 2024; 7:678. [PMID: 38831002 PMCID: PMC11148204 DOI: 10.1038/s42003-024-06342-y] [Received: 09/19/2023] [Accepted: 05/16/2024] [Indexed: 06/05/2024] Open
Abstract
Deciphering the functional organization of large biological networks is a major challenge for current mathematical methods. A common approach is to decompose networks into largely independent functional modules, but inferring these modules and their organization from network activity is difficult, given the uncertainties and incompleteness of measurements. Typically, some parts of the overall functional organization, such as intermediate processing steps, are latent. We show that the hidden structure can be determined from the statistical moments of observable network components alone, as long as the functional relevance of the network components lies in their mean values and the mean of each latent variable maps onto a scaled expectation of a binary variable. Whether the function of biological networks permits a hierarchical modularization can be falsified by a correlation-based statistical test that we derive. We apply the test to gene regulatory networks, dendrites of pyramidal neurons, and networks of spiking neurons.
Affiliation(s)
- Stefan Häusler
  - Faculty of Biology and Bernstein Center for Computational Neuroscience, Ludwig-Maximilians-Universität München, Munich, Germany.
|
36
|
Peng D, Cahan P. OneSC: A computational platform for recapitulating cell state transitions. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.05.31.596831. [PMID: 38895453 PMCID: PMC11185539 DOI: 10.1101/2024.05.31.596831] [Indexed: 06/21/2024]
Abstract
Computational modelling of cell state transitions has been of great interest in developmental biology, cancer biology and cell fate engineering because it enables perturbation experiments to be performed in silico more rapidly and cheaply than could be achieved in a wet lab. Recent advancements in single-cell RNA sequencing (scRNA-seq) allow the capture of high-resolution snapshots of cell states as they transition along temporal trajectories. Using these high-throughput datasets, we can train computational models to generate in silico 'synthetic' cells that faithfully mimic the temporal trajectories. Here we present OneSC, a platform that can simulate synthetic cells across developmental trajectories using systems of stochastic differential equations governed by a core transcription factor (TF) regulatory network. Unlike current network inference methods, OneSC prioritizes generating a Boolean network that produces faithful cell state transitions and steady cell states that mimic real biological systems. Applying OneSC to real data, we inferred a core TF network using a mouse myeloid progenitor scRNA-seq dataset and showed that the dynamical simulations of that network generate synthetic single-cell expression profiles that faithfully recapitulate the four myeloid differentiation trajectories going into differentiated cell states (erythrocytes, megakaryocytes, granulocytes and monocytes). Finally, through in silico perturbations of the mouse myeloid progenitor core network, we showed that OneSC can accurately predict cell fate decision biases of TF perturbations that closely match previous experimental observations.
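The notion of a Boolean TF network with steady cell states can be made concrete with a toy example. The sketch below is not OneSC and does not use its API or its stochastic differential equations; it assumes a hypothetical two-TF toggle switch with deterministic, synchronous updates, and simply enumerates the states that map onto themselves.

```python
from itertools import product

# Hypothetical regulatory rules: each TF is ON only when the other is OFF.
RULES = {
    "TF_A": lambda state: not state["TF_B"],
    "TF_B": lambda state: not state["TF_A"],
}

def step(state):
    """Synchronous update: every TF reads the previous state."""
    return {gene: rule(state) for gene, rule in RULES.items()}

def steady_states():
    """Enumerate all ON/OFF combinations and keep the fixed points."""
    genes = sorted(RULES)
    found = []
    for values in product([False, True], repeat=len(genes)):
        state = dict(zip(genes, values))
        if step(state) == state:
            found.append(state)
    return found

STEADY = steady_states()
```

The two fixed points of this toy network are the mutually exclusive states (TF_A on, TF_B off) and (TF_A off, TF_B on), a minimal picture of two alternative cell fates. Forcing one TF to stay off, as an in silico knockout, would leave only one of them reachable.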
Affiliation(s)
- Da Peng
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, 21205, USA
- Patrick Cahan
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, 21205, USA
  - Institute for Cell Engineering, Johns Hopkins University, Baltimore, Maryland, 21205, USA
  - Department of Molecular Biology and Genetics, Johns Hopkins University, Baltimore, Maryland, 21205, USA
|
37
|
Wu S, Jin K, Tang M, Xia Y, Gao W. Inference of Gene Regulatory Networks Based on Multi-view Hierarchical Hypergraphs. Interdiscip Sci 2024; 16:318-332. [PMID: 38342857 DOI: 10.1007/s12539-024-00604-3] [Received: 08/16/2023] [Revised: 11/26/2023] [Accepted: 01/03/2024] [Indexed: 02/13/2024]
Abstract
Since gene regulation is a complex process in which multiple genes act simultaneously, accurately inferring gene regulatory networks (GRNs) is a long-standing challenge in systems biology. Although graph neural networks can formally describe intricate gene expression mechanisms, current GRN inference methods based on graph learning regard transcription factor (TF)-target gene interactions only as pairwise relationships, and cannot model the many-to-many high-order regulatory patterns that prevail among genes. Moreover, these methods often rely on limited prior regulatory knowledge, ignoring the structural information of GRNs in gene expression profiles. Therefore, we propose a multi-view hierarchical hypergraphs GRN (MHHGRN) inference model. Specifically, multiple heterogeneous sources of biological information are integrated to construct multi-view hierarchical hypergraphs of TFs and target genes, using hypergraph convolution networks to model higher-order complex regulatory relationships. Meanwhile, the coupled information diffusion mechanism and the cross-domain messaging mechanism facilitate information sharing between genes to optimise gene embedding representations. Finally, a unique channel attention mechanism is used to adaptively learn feature representations from multiple views for GRN inference. Experimental results show that MHHGRN achieves better results than the baseline methods on the E. coli and S. cerevisiae benchmark datasets of the DREAM5 challenge, and it has excellent cross-species generalization, achieving comparable or better performance on scRNA-seq datasets from five mouse and two human cell lines.
Affiliation(s)
- Songyang Wu
  - School of Information Science and Technology, Yunnan Normal University, Kunming, 650500, China
- Kui Jin
  - School of Information Science and Technology, Yunnan Normal University, Kunming, 650500, China
- Mingjing Tang
  - School of Life Science, Yunnan Normal University, Kunming, 650500, China.
  - Engineering Research Center of Sustainable Development and Utilization of Biomass Energy, Ministry of Education, Yunnan Normal University, Kunming, 650500, China.
- Yuelong Xia
  - School of Information Science and Technology, Yunnan Normal University, Kunming, 650500, China
- Wei Gao
  - School of Information Science and Technology, Yunnan Normal University, Kunming, 650500, China
|
38
|
Magni S, Sawlekar R, Capelle CM, Tslaf V, Baron A, Zeng N, Mombaerts L, Yue Z, Yuan Y, Hefeng FQ, Gonçalves J. Inferring upstream regulatory genes of FOXP3 in human regulatory T cells from time-series transcriptomic data. NPJ Syst Biol Appl 2024; 10:59. [PMID: 38811598 PMCID: PMC11137136 DOI: 10.1038/s41540-024-00387-9] [Received: 02/08/2024] [Accepted: 05/10/2024] [Indexed: 05/31/2024] Open
Abstract
The discovery of upstream regulatory genes of a gene of interest remains challenging. Here we applied a scalable computational method to predict, in an unbiased manner, candidate regulatory genes of critical transcription factors by searching the whole genome. We illustrated our approach with a case study on the master regulator FOXP3 of human primary regulatory T cells (Tregs). While target genes of FOXP3 have been identified, its upstream regulatory machinery remains elusive. Our methodology selected five top-ranked candidates that were tested via proof-of-concept experiments. Following knockdown, three out of five candidates showed significant effects on the mRNA expression of FOXP3 across multiple donors. This provides insights into the regulatory mechanisms modulating FOXP3 transcriptional expression in Tregs. Overall, at the genome level this represents a high level of accuracy in predicting upstream regulatory genes of key genes of interest.
Affiliation(s)
- Stefano Magni
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
- Rucha Sawlekar
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
  - Robotics and Artificial Intelligence, Department of Computer Science, Electrical and Space Engineering, Luleå University of Technology, Luleå, Sweden
- Christophe M Capelle
  - Department of Infection and Immunity, Luxembourg Institute of Health, Esch-Sur-Alzette, Luxembourg
  - Faculty of Science, Technology and Medicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Vera Tslaf
  - Department of Infection and Immunity, Luxembourg Institute of Health, Esch-Sur-Alzette, Luxembourg
  - Faculty of Science, Technology and Medicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
  - Transversal Translational Medicine, Luxembourg Institute of Health, Strassen, Luxembourg
- Alexandre Baron
  - Department of Infection and Immunity, Luxembourg Institute of Health, Esch-Sur-Alzette, Luxembourg
- Ni Zeng
  - Department of Infection and Immunity, Luxembourg Institute of Health, Esch-Sur-Alzette, Luxembourg
- Laurent Mombaerts
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
- Zuogong Yue
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
- Ye Yuan
  - School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China
- Feng Q Hefeng
  - Department of Infection and Immunity, Luxembourg Institute of Health, Esch-Sur-Alzette, Luxembourg.
- Jorge Gonçalves
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg.
  - Department of Plant Sciences, University of Cambridge, Cambridge, United Kingdom.
|
39
|
Li Q, Button-Simons KA, Sievert MAC, Chahoud E, Foster GF, Meis K, Ferdig MT, Milenković T. Enhancing Gene Co-Expression Network Inference for the Malaria Parasite Plasmodium falciparum. Genes (Basel) 2024; 15:685. [PMID: 38927622 PMCID: PMC11202799 DOI: 10.3390/genes15060685] [Received: 04/29/2024] [Revised: 05/22/2024] [Accepted: 05/22/2024] [Indexed: 06/28/2024] Open
Abstract
BACKGROUND: Malaria results in more than 550,000 deaths each year due to drug resistance in the most lethal Plasmodium (P.) species P. falciparum. A full P. falciparum genome was published in 2002, yet 44.6% of its genes have unknown functions. Improving the functional annotation of genes is important for identifying drug targets and understanding the evolution of drug resistance.
RESULTS: Genes function by interacting with one another. So, analyzing gene co-expression networks can enhance functional annotations and prioritize genes for wet lab validation. Earlier efforts to build gene co-expression networks in P. falciparum have been limited to a single network inference method or gaining biological understanding for only a single gene and its interacting partners. Here, we explore multiple inference methods and aim to systematically predict functional annotations for all P. falciparum genes. We evaluate each inferred network based on how well it predicts existing gene-Gene Ontology (GO) term annotations using network clustering and leave-one-out cross-validation. We assess overlaps of the different networks' edges (gene co-expression relationships), as well as predicted functional knowledge. The networks' edges are overall complementary: 47-85% of all edges are unique to each network. In terms of the accuracy of predicting gene functional annotations, all networks yielded relatively high precision (as high as 87% for the network inferred using mutual information), but the highest recall reached was below 15%. All networks having low recall means that none of them capture a large amount of all existing gene-GO term annotations. In fact, their annotation predictions are highly complementary, with the largest pairwise overlap of only 27%. We provide ranked lists of inferred gene-gene interactions and predicted gene-GO term annotations for future use and wet lab validation by the malaria community.
CONCLUSIONS: The different networks seem to capture different aspects of the P. falciparum biology in terms of both inferred interactions and predicted gene functional annotations. Thus, relying on a single network inference method should be avoided when possible.
SUPPLEMENTARY DATA: Attached.
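The precision and recall figures quoted in this abstract can be illustrated with a minimal sketch. It assumes that predictions and existing annotations are both represented as sets of (gene, GO term) pairs; the function name, gene identifiers, and GO label below are hypothetical and are not taken from the study.

```python
def precision_recall(predicted, known):
    """Return (precision, recall) for two sets of (gene, GO term) pairs."""
    true_positives = len(predicted & known)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(known) if known else 0.0
    return precision, recall


# Hypothetical toy data: ten known annotations, four predictions (three correct).
known = {(f"gene{i}", "GO:demo") for i in range(10)}
predicted = {("gene0", "GO:demo"), ("gene1", "GO:demo"),
             ("gene2", "GO:demo"), ("gene99", "GO:demo")}

precision, recall = precision_recall(predicted, known)
print(precision, recall)  # 0.75 0.3
```

In this toy case most predictions are correct (high precision) while most known annotations are missed (low recall), which is the same qualitative pattern the abstract reports.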
Affiliation(s)
- Qi Li
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, USA
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, USA
- Lucy Family Institute for Data & Society, University of Notre Dame, Notre Dame, IN 46556, USA (M.T.F.)
- Katrina A. Button-Simons
- Lucy Family Institute for Data & Society, University of Notre Dame, Notre Dame, IN 46556, USA (M.T.F.)
- Department of Biological Sciences, University of Notre Dame, Notre Dame, IN 46556, USA
- Mackenzie A. C. Sievert
- Lucy Family Institute for Data & Society, University of Notre Dame, Notre Dame, IN 46556, USA (M.T.F.)
- Department of Biological Sciences, University of Notre Dame, Notre Dame, IN 46556, USA
- Elias Chahoud
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, USA
- Department of Preprofessional Studies, University of Notre Dame, Notre Dame, IN 46556, USA
- Gabriel F. Foster
- Department of Biological Sciences, University of Notre Dame, Notre Dame, IN 46556, USA
- Kaitlynn Meis
- Department of Biological Sciences, University of Notre Dame, Notre Dame, IN 46556, USA
- Michael T. Ferdig
- Lucy Family Institute for Data & Society, University of Notre Dame, Notre Dame, IN 46556, USA (M.T.F.)
- Department of Biological Sciences, University of Notre Dame, Notre Dame, IN 46556, USA
- Tijana Milenković
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, USA
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, USA
- Lucy Family Institute for Data & Society, University of Notre Dame, Notre Dame, IN 46556, USA (M.T.F.)

40
Zhang D, Gao S, Liu ZP, Gao R. LogicGep: Boolean networks inference using symbolic regression from time-series transcriptomic profiling data. Brief Bioinform 2024; 25:bbae286. [PMID: 38886006 PMCID: PMC11182660 DOI: 10.1093/bib/bbae286] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2024] [Revised: 05/09/2024] [Accepted: 06/06/2024] [Indexed: 06/20/2024]
Abstract
Reconstructing the topology of gene regulatory networks from gene expression data has been extensively studied. With the abundance of functional transcriptomic data available, it is now feasible to systematically decipher regulatory interaction dynamics in a logic form such as a Boolean network (BN) framework, which qualitatively indicates how multiple regulators aggregate to affect a common target gene. However, inferring both the network topology and gene interaction dynamics simultaneously is still a challenging problem, since gene expression data are typically noisy and data discretization is prone to information loss. We propose a new method for BN inference from time-series transcriptional profiles, called LogicGep. LogicGep formulates the identification of Boolean functions as a symbolic regression problem that learns the Boolean function expression, and solves it efficiently through multi-objective optimization using an improved gene expression programming algorithm. To avoid overly emphasizing dynamic characteristics at the expense of topological structure, as traditional methods often do, a set of promising Boolean formulas for each target gene is first evolved, and a feed-forward neural network trained with continuous expression data is subsequently employed to pick out the final solution. We validated the efficacy of LogicGep using multiple datasets including both synthetic and real-world experimental data. The results show that LogicGep infers accurate BN models, outperforming other representative BN inference algorithms in both network topology reconstruction and the identification of Boolean functions. Moreover, the execution of LogicGep is hundreds of times faster than other methods, especially in the case of large network inference.
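To make the Boolean network representation mentioned in this abstract concrete, the following is a minimal sketch of a synchronous Boolean network simulation. It is not LogicGep and performs no inference; the three genes, their update rules, and the function names are hypothetical and chosen only to show how a Boolean function combines several regulators into one target's next state.

```python
from typing import Callable, Dict, List

State = Dict[str, bool]

# Hypothetical three-gene network: each rule maps the current state
# to the next value of one target gene.
RULES: Dict[str, Callable[[State], bool]] = {
    "A": lambda s: not s["C"],         # C represses A
    "B": lambda s: s["A"],             # A activates B
    "C": lambda s: s["A"] and s["B"],  # A and B jointly activate C
}


def step(state: State, rules: Dict[str, Callable[[State], bool]] = RULES) -> State:
    """Synchronous update: every gene reads the same current state."""
    return {gene: rule(state) for gene, rule in rules.items()}


def simulate(state: State, n_steps: int) -> List[State]:
    """Return the trajectory of states, including the initial one."""
    trajectory = [state]
    for _ in range(n_steps):
        state = step(state)
        trajectory.append(state)
    return trajectory


start = {"A": True, "B": False, "C": False}
for t, s in enumerate(simulate(start, 5)):
    print(t, s)
```

With these toy rules the trajectory returns to the starting state after five updates, so the network sits on a cyclic attractor.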
Affiliation(s)
- Dezhen Zhang
- Center of Intelligent Medicine, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
- Shuhua Gao
- Center of Intelligent Medicine, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
- Zhi-Ping Liu
- Center of Intelligent Medicine, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
- Rui Gao
- Center of Intelligent Medicine, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China

41
Wei PJ, Guo Z, Gao Z, Ding Z, Cao RF, Su Y, Zheng CH. Inference of gene regulatory networks based on directed graph convolutional networks. Brief Bioinform 2024; 25:bbae309. [PMID: 38935070 PMCID: PMC11209731 DOI: 10.1093/bib/bbae309] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2023] [Revised: 05/17/2024] [Indexed: 06/28/2024]
Abstract
Inferring gene regulatory networks (GRNs) is one of the important challenges in systems biology, and many outstanding computational methods have been proposed; however, some challenges remain, especially on real datasets. In this study, we propose a directed graph convolutional neural network-based method for GRN inference (DGCGRN). To better understand and process the directed graph structure of GRNs, a directed graph convolutional neural network is constructed that retains the structural information of the directed graph while also making full use of neighbor node features. A local augmentation strategy is adopted in the graph neural network to address the poor prediction accuracy caused by the large number of low-degree nodes in GRNs. In addition, for real data such as E. coli, sequence features are obtained by extracting hidden features using Bi-GRU and calculating the statistical physicochemical characteristics of gene sequences. At the training stage, a dynamic update strategy is used to convert the obtained edge prediction scores into edge weights to guide the subsequent training process of the model. The results on synthetic benchmark datasets and real datasets show that the prediction performance of DGCGRN is significantly better than that of existing models. Furthermore, case studies on bladder uroepithelial carcinoma and lung cancer cells also illustrate the performance of the proposed model.
Affiliation(s)
- Pi-Jing Wei
- Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, 111 Jiulong Road, 230601, Anhui, China
- Ziqiang Guo
- Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, 230601, Anhui, China
- Zhen Gao
- Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, 230601, Anhui, China
- Zheng Ding
- Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, 111 Jiulong Road, 230601, Anhui, China
- Rui-Fen Cao
- Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, 230601, Anhui, China
- Yansen Su
- Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, 111 Jiulong Road, 230601, Anhui, China
- Chun-Hou Zheng
- Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, 111 Jiulong Road, 230601, Anhui, China

42
Zinati Y, Takiddeen A, Emad A. GRouNdGAN: GRN-guided simulation of single-cell RNA-seq data using causal generative adversarial networks. Nat Commun 2024; 15:4055. [PMID: 38744843 PMCID: PMC11525796 DOI: 10.1038/s41467-024-48516-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Accepted: 05/01/2024] [Indexed: 05/16/2024]
Abstract
We introduce GRouNdGAN, a gene regulatory network (GRN)-guided reference-based causal implicit generative model for simulating single-cell RNA-seq data, in silico perturbation experiments, and benchmarking GRN inference methods. Through the imposition of a user-defined GRN in its architecture, GRouNdGAN simulates steady-state and transient-state single-cell datasets where genes are causally expressed under the control of their regulating transcription factors (TFs). Training on six experimental reference datasets, we show that our model captures non-linear TF-gene dependencies and preserves gene identities, cell trajectories, pseudo-time ordering, and technical and biological noise, with no user manipulation and only implicit parameterization. GRouNdGAN can synthesize cells under new conditions to perform in silico TF knockout experiments. Benchmarking various GRN inference algorithms reveals that GRouNdGAN effectively bridges the existing gap between simulated and biological data benchmarks of GRN inference algorithms, providing gold standard ground truth GRNs and realistic cells corresponding to the biological system of interest.
Affiliation(s)
- Yazdan Zinati
- Department of Electrical and Computer Engineering, McGill University, Montreal, QC, Canada
- Abdulrahman Takiddeen
- Department of Electrical and Computer Engineering, McGill University, Montreal, QC, Canada
- Amin Emad
- Department of Electrical and Computer Engineering, McGill University, Montreal, QC, Canada
- Mila, Quebec AI Institute, Montreal, QC, Canada
- The Rosalind and Morris Goodman Cancer Institute, Montreal, QC, Canada

43
Chen W, Miao C, Zhang Z, Fung CSH, Wang R, Chen Y, Qian Y, Cheng L, Yip KY, Tsui SKW, Cao Q. Commonly used software tools produce conflicting and overly-optimistic AUPRC values. Genome Biol 2024; 25:118. [PMID: 38741205 PMCID: PMC11089773 DOI: 10.1186/s13059-024-03266-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2024] [Accepted: 04/30/2024] [Indexed: 05/16/2024]
Abstract
The precision-recall curve (PRC) and the area under the precision-recall curve (AUPRC) are useful for quantifying classification performance. They are commonly used in situations with imbalanced classes, such as cancer diagnosis and cell type annotation. We evaluate 10 popular tools for plotting PRC and computing AUPRC, which were collectively used in more than 3000 published studies. We find the AUPRC values computed by the tools rank classifiers differently and some tools produce overly-optimistic results.
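As a rough illustration of why AUPRC values can disagree, the sketch below computes the area under the same precision-recall points with two common conventions: a step-wise sum (often called average precision) and trapezoidal linear interpolation anchored at the point (recall 0, precision 1). The function names, the anchoring choice, and the toy labels and scores are assumptions made for illustration; the sketch does not reproduce any of the ten tools evaluated in the paper, and it assumes there are no tied scores.

```python
def pr_points(labels, scores):
    """(recall, precision) after each prediction, by decreasing score.

    Assumes binary 0/1 labels, at least one positive, and no tied scores.
    """
    ranked = sorted(zip(scores, labels), reverse=True)
    n_pos = sum(labels)
    tp = 0
    points = []
    for k, (_, label) in enumerate(ranked, start=1):
        tp += label
        points.append((tp / n_pos, tp / k))
    return points


def average_precision(points):
    """Step-wise area: precision weighted by each recall increment."""
    area, prev_recall = 0.0, 0.0
    for recall, precision in points:
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area


def trapezoid_auprc(points):
    """Trapezoidal area with linear interpolation, starting from (0, 1)."""
    area, prev_recall, prev_precision = 0.0, 0.0, 1.0
    for recall, precision in points:
        area += (recall - prev_recall) * (precision + prev_precision) / 2
        prev_recall, prev_precision = recall, precision
    return area


# Hypothetical toy predictions.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
points = pr_points(labels, scores)
print(average_precision(points), trapezoid_auprc(points))
```

The two functions return different numbers for identical predictions, so comparing AUPRC values across studies is only meaningful when the same convention was used.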
Affiliation(s)
- Wenyu Chen
- School of Biomedical Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
- Chen Miao
- School of Biomedical Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
- Zhenghao Zhang
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
- Cathy Sin-Hang Fung
- School of Biomedical Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
- Ran Wang
- School of Biomedical Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
- Yizhen Chen
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
- Yan Qian
- The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
- Lixin Cheng
- Shenzhen People's Hospital, First Affiliated Hospital of Southern University of Science and Technology, Second Clinical Medical College of Jinan University, Shenzhen, China
- Kevin Y Yip
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
- Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA, USA
- Stephen Kwok-Wing Tsui
- School of Biomedical Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
- Hong Kong Bioinformatics Centre, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
- Qin Cao
- School of Biomedical Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
- Hong Kong Bioinformatics Centre, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
- Shenzhen Research Institute, The Chinese University of Hong Kong, Shenzhen, China

44
Shen B, Coruzzi GM, Shasha D. Bipartite networks represent causality better than simple networks: evidence, algorithms, and applications. Front Genet 2024; 15:1371607. [PMID: 38798697 PMCID: PMC11120958 DOI: 10.3389/fgene.2024.1371607] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2024] [Accepted: 04/17/2024] [Indexed: 05/29/2024]
Abstract
A network, whose nodes are genes and whose directed edges represent positive or negative influences of a regulatory gene and its targets, is often used as a representation of causality. To infer a network, researchers often develop a machine learning model and then evaluate the model based on its match with experimentally verified "gold standard" edges. The desired result of such a model is a network that may extend the gold standard edges. Since networks are a form of visual representation, one can compare their utility with architectural or machine blueprints. Blueprints are clearly useful because they provide precise guidance to builders in construction. If the primary role of gene regulatory networks is to characterize causality, then such networks should be good tools of prediction because prediction is the actionable benefit of knowing causality. But are they? In this paper, we compare prediction quality based on "gold standard" regulatory edges from previous experimental work with non-linear models inferred from time series data across four different species. We show that the same non-linear machine learning models have better predictive performance, with improvements from 5.3% to 25.3% in terms of the reduction in the root mean square error (RMSE) compared with the same models based on the gold standard edges. Having established that networks fail to characterize causality properly, we suggest that causality research should focus on four goals: (i) predictive accuracy; (ii) a parsimonious enumeration of predictive regulatory genes for each target gene g; (iii) the identification of disjoint sets of predictive regulatory genes for each target g of roughly equal accuracy; and (iv) the construction of a bipartite network (whose node types are genes and models) representation of causality. We provide algorithms for all goals.
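The abstract reports improvements as a percentage reduction in RMSE. A minimal sketch of that arithmetic follows, assuming the reduction is measured relative to the RMSE of the gold-standard-based model (the abstract does not spell out the formula). The function names and the toy values are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def rmse_reduction_percent(rmse_baseline, rmse_model):
    """Relative RMSE reduction of a model versus a baseline, in percent."""
    return 100.0 * (rmse_baseline - rmse_model) / rmse_baseline


# Hypothetical expression values and predictions.
observed = [1.0, 2.0, 3.0, 4.0]
baseline_pred = [2.0, 3.0, 4.0, 5.0]   # e.g. a model restricted to gold-standard edges
inferred_pred = [1.5, 2.5, 3.5, 4.5]   # e.g. a model with inferred regulators

baseline = rmse(observed, baseline_pred)
inferred = rmse(observed, inferred_pred)
print(rmse_reduction_percent(baseline, inferred))  # 50.0
```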
Affiliation(s)
- Bingran Shen
- Courant Institute of Mathematical Sciences, Department of Computer Science, New York University, New York, United States
- Gloria M. Coruzzi
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, United States
- Dennis Shasha
- Courant Institute of Mathematical Sciences, Department of Computer Science, New York University, New York, United States

45
Kion-Crosby W, Barquist L. Network depth affects inference of gene sets from bacterial transcriptomes using denoising autoencoders. Bioinform Adv 2024; 4:vbae066. [PMID: 39027639 PMCID: PMC11256956 DOI: 10.1093/bioadv/vbae066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2024] [Revised: 04/05/2024] [Accepted: 05/02/2024] [Indexed: 07/20/2024]
Abstract
Summary: The increasing number of publicly available bacterial gene expression data sets provides an unprecedented resource for the study of gene regulation in diverse conditions, but emphasizes the need for self-supervised methods for the automated generation of new hypotheses. One approach for inferring coordinated regulation from bacterial expression data is through neural networks known as denoising autoencoders (DAEs), which encode large datasets in a reduced bottleneck layer. We have generalized this application of DAEs to include deep networks and explore the effects of network architecture on gene set inference using deep learning. We developed a DAE-based pipeline to extract gene sets from transcriptomic data in Escherichia coli, validated our method by comparing inferred gene sets with known pathways, and used this pipeline to explore how the choice of network architecture impacts gene set recovery. We find that increasing network depth leads the DAEs to explain gene expression in terms of fewer, more concisely defined gene sets, and that adjusting the width results in a tradeoff between generalizability and biological inference. Finally, leveraging our understanding of the impact of DAE architecture, we apply our pipeline to an independent uropathogenic E. coli dataset to identify genes uniquely induced during human colonization.
Availability and implementation: https://github.com/BarquistLab/DAE_architecture_exploration
Affiliation(s)
- Willow Kion-Crosby
- Helmholtz Institute for RNA-based Infection Research (HIRI)/Helmholtz Centre for Infection Research (HZI), 97080 Würzburg, Germany
- Faculty of Medicine, University of Würzburg, 97080 Würzburg, Germany
- Lars Barquist
- Helmholtz Institute for RNA-based Infection Research (HIRI)/Helmholtz Centre for Infection Research (HZI), 97080 Würzburg, Germany
- Faculty of Medicine, University of Würzburg, 97080 Würzburg, Germany
- Department of Biology, University of Toronto, Mississauga, ON L5L 1C6, Canada

46
Guo C, Huang Z, Chen J, Yu G, Wang Y, Wang X. Identification of Novel Regulators of Leaf Senescence Using a Deep Learning Model. Plants (Basel) 2024; 13:1276. [PMID: 38732491 PMCID: PMC11085074 DOI: 10.3390/plants13091276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2024] [Revised: 04/26/2024] [Accepted: 04/29/2024] [Indexed: 05/13/2024]
Abstract
Deep learning has emerged as a powerful tool for investigating intricate biological processes in plants by harnessing the potential of large-scale data. Gene regulation is a complex process in which transcription factors (TFs), together with their target genes, participate in various biological processes. Despite its significance, the study of gene regulation has primarily focused on a limited number of notable instances, leaving numerous aspects and interactions yet to be explored comprehensively. Here, we developed DEGRN (Deep learning on Expression for Gene Regulatory Network), an innovative deep learning model designed to decipher gene interactions by leveraging high-dimensional expression data obtained from bulk RNA-Seq and scRNA-Seq data in the model plant Arabidopsis. DEGRN exhibited a comparable level of predictive power when applied to various datasets. Using DEGRN, we identified an extensive set of 3,053,363 high-quality interactions, encompassing 1430 TFs and 13,739 non-TF genes. Notably, DEGRN's predictive capabilities allowed us to uncover novel regulators involved in a range of complex biological processes, including development, metabolism, and stress responses. Using leaf senescence as an example, we revealed a complex network underpinning this process composed of diverse TF families, including bHLH, ERF, and MYB. We also identified a novel TF, named MAF5, whose expression showed a strong linear relationship with the progression of senescence. The maf5 mutant showed early leaf decay compared to the wild type, indicating a potential role in the regulation of leaf senescence. This hypothesis was further supported by the expression patterns observed across four stages of leaf development, as well as transcriptomics analysis. Overall, the comprehensive coverage provided by DEGRN expands our understanding of gene regulatory networks and paves the way for further investigations into their functional implications.
Affiliation(s)
- Xu Wang
- Shanghai Collaborative Innovation Center of Agri-Seeds, Joint Center for Single Cell Biology, School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai 200240, China; (C.G.); (Z.H.); (J.C.); (G.Y.); (Y.W.)

47
Ranjan R, Srijan S, Balekuttira S, Agarwal T, Ramey M, Dobbins M, Kuhn R, Wang X, Hudson K, Li Y, Varala K. Organ-delimited gene regulatory networks provide high accuracy in candidate transcription factor selection across diverse processes. Proc Natl Acad Sci U S A 2024; 121:e2322751121. [PMID: 38652750 PMCID: PMC11066984 DOI: 10.1073/pnas.2322751121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2024] [Accepted: 03/14/2024] [Indexed: 04/25/2024]
Abstract
Organ-specific gene expression datasets that include hundreds to thousands of experiments allow the reconstruction of organ-level gene regulatory networks (GRNs). However, creating such datasets is greatly hampered by the requirement for extensive and tedious manual curation. Here, we trained a supervised classification model that can accurately classify the organ-of-origin for a plant transcriptome. This K-Nearest Neighbor-based multiclass classifier was used to create organ-specific gene expression datasets for the leaf, root, shoot, flower, and seed in Arabidopsis thaliana. A GRN inference approach was used to determine (i) the influential transcription factors (TFs) in each organ and (ii) the most influential TFs for specific biological processes in that organ. These genome-wide, organ-delimited GRNs (OD-GRNs) recalled many known regulators of organ development and processes operating in those organs. Importantly, many previously unknown TF regulators were uncovered as potential regulators of these processes. As a proof-of-concept, we focused on experimentally validating the predicted TF regulators of lipid biosynthesis in seeds, an important food and biofuel trait. Of the top 20 predicted TFs, eight are known regulators of seed oil content, e.g., WRI1, LEC1, FUS3. Importantly, we validated our prediction of MybS2, TGA4, SPL12, AGL18, and DiV2 as regulators of seed lipid biosynthesis. We elucidated the molecular mechanism of MybS2 and show that it induces purple acid phosphatase family genes and lipid synthesis genes to enhance seed lipid content. This general approach has the potential to be extended to any species with sufficiently large gene expression datasets to find unique regulators of any trait-of-interest.
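The organ-of-origin classifier described in this abstract is K-Nearest Neighbor-based. The following is a minimal sketch of that general technique, not the authors' model: the Euclidean distance, the value of k, the two-feature expression vectors, and all names below are assumptions chosen for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a label by majority vote among the k nearest training samples.

    train: list of (expression_vector, organ_label) pairs.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-gene expression profiles with organ labels.
train = [
    ((9.0, 1.0), "leaf"), ((8.0, 2.0), "leaf"), ((9.0, 2.0), "leaf"),
    ((1.0, 9.0), "root"), ((2.0, 8.0), "root"), ((1.0, 8.0), "root"),
]

print(knn_predict(train, (8.0, 1.0)))  # leaf
print(knn_predict(train, (2.0, 9.0)))  # root
```

A real transcriptome classifier would operate on thousands of genes per sample and would need normalization and a held-out evaluation, which this toy example omits.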
Affiliation(s)
- Rajeev Ranjan
- Department of Horticulture and Landscape Architecture, Purdue University, West Lafayette, IN 47907
- Center for Plant Biology, Purdue University, West Lafayette, IN 47907
- Sonali Srijan
- Department of Horticulture and Landscape Architecture, Purdue University, West Lafayette, IN 47907
- Somaiah Balekuttira
- Department of Horticulture and Landscape Architecture, Purdue University, West Lafayette, IN 47907
- Tina Agarwal
- Department of Horticulture and Landscape Architecture, Purdue University, West Lafayette, IN 47907
- Center for Plant Biology, Purdue University, West Lafayette, IN 47907
- Melissa Ramey
- Department of Horticulture and Landscape Architecture, Purdue University, West Lafayette, IN 47907
- Madison Dobbins
- Department of Horticulture and Landscape Architecture, Purdue University, West Lafayette, IN 47907
- Rachel Kuhn
- Department of Horticulture and Landscape Architecture, Purdue University, West Lafayette, IN 47907
- Xiaojin Wang
- Department of Horticulture and Landscape Architecture, Purdue University, West Lafayette, IN 47907
- Center for Plant Biology, Purdue University, West Lafayette, IN 47907
- Karen Hudson
- United States Department of Agriculture-Agricultural Research Service Crop Production and Pest Control Research Unit, West Lafayette, IN 47907
- Ying Li
- Department of Horticulture and Landscape Architecture, Purdue University, West Lafayette, IN 47907
- Center for Plant Biology, Purdue University, West Lafayette, IN 47907
- Kranthi Varala
- Department of Horticulture and Landscape Architecture, Purdue University, West Lafayette, IN 47907
- Center for Plant Biology, Purdue University, West Lafayette, IN 47907

48
Wang Y, Chen X, Zheng Z, Huang L, Xie W, Wang F, Zhang Z, Wong KC. scGREAT: Transformer-based deep-language model for gene regulatory network inference from single-cell transcriptomics. iScience 2024; 27:109352. [PMID: 38510148 PMCID: PMC10951644 DOI: 10.1016/j.isci.2024.109352] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Revised: 12/29/2023] [Accepted: 02/23/2024] [Indexed: 03/22/2024]
Abstract
Gene regulatory networks (GRNs) involve complex and multi-layer regulatory interactions between regulators and their target genes. Precise knowledge of GRNs is important in understanding cellular processes and molecular functions. Recent breakthroughs in single-cell sequencing technology have made it possible to infer GRNs at the single-cell level. Existing methods, however, are limited by expensive computations and sometimes simplistic assumptions. To overcome these obstacles, we propose scGREAT, a framework to infer GRNs from single-cell transcriptomics using gene embeddings and a transformer. scGREAT starts by constructing gene expression and gene biotext dictionaries from scRNA-seq data and gene text information. The representation of TF-gene pairs is learned by optimizing the embedding space with a transformer-based engine. Results show that scGREAT outperformed other contemporary methods on benchmarks. In addition, gene representations from scGREAT provide valuable gene regulation insights, and external validation on spatial transcriptomics illuminated the mechanism behind scGREAT annotation. Moreover, scGREAT identified several TF-target regulations corroborated by previous studies.
Affiliation(s)
- Yuchen Wang
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Xingjian Chen
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Cutaneous Biology Research Center, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Zetian Zheng
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Lei Huang
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Weidun Xie
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Fuzhou Wang
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Zhaolei Zhang
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Ka-Chun Wong
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Shenzhen Research Institute, City University of Hong Kong, Shenzhen, China

49
Forabosco P, Pala M, Crobu F, Diana MA, Marongiu M, Cusano R, Angius A, Steri M, Orrù V, Schlessinger D, Fiorillo E, Devoto M, Cucca F. Transcriptome organization of white blood cells through gene co-expression network analysis in a large RNA-seq dataset. Front Immunol 2024; 15:1350111. [PMID: 38629067 PMCID: PMC11018966 DOI: 10.3389/fimmu.2024.1350111] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2023] [Accepted: 03/13/2024] [Indexed: 04/19/2024]
Abstract
Gene co-expression network analysis enables identification of biologically meaningful clusters of co-regulated genes (modules) in an unsupervised manner. We present here the largest study conducted thus far of co-expression networks in white blood cells (WBC) based on RNA-seq data from 624 individuals. We identify 41 modules, 13 of them related to specific immune-related functions and cell types (e.g. neutrophils, B and T cells, NK cells, and plasmacytoid dendritic cells); we highlight biologically relevant lncRNAs for each annotated module of co-expressed genes. We further characterize with unprecedented resolution the modules in T cell sub-types, through the availability of 95 immune phenotypes obtained by flow cytometry in the same individuals. This study provides novel insights into the transcriptional architecture of human leukocytes, showing how network analysis can advance our understanding of coding and non-coding gene interactions in immune system cells.
Collapse
Affiliation(s)
- Paola Forabosco
- Istituto di Ricerca Genetica e Biomedica (IRGB), Consiglio Nazionale delle Ricerche (CNR), Cagliari, Italy
- Mauro Pala
- Istituto di Ricerca Genetica e Biomedica (IRGB), Consiglio Nazionale delle Ricerche (CNR), Cagliari, Italy
- Francesca Crobu
- Istituto di Ricerca Genetica e Biomedica (IRGB), Consiglio Nazionale delle Ricerche (CNR), Cagliari, Italy
- Maria Antonietta Diana
- Istituto di Ricerca Genetica e Biomedica (IRGB), Consiglio Nazionale delle Ricerche (CNR), Cagliari, Italy
- Mara Marongiu
- Istituto di Ricerca Genetica e Biomedica (IRGB), Consiglio Nazionale delle Ricerche (CNR), Cagliari, Italy
- Roberto Cusano
- CRS4-Next Generation Sequencing (NGS) Core, Parco POLARIS, Cagliari, Italy
- Andrea Angius
- Istituto di Ricerca Genetica e Biomedica (IRGB), Consiglio Nazionale delle Ricerche (CNR), Cagliari, Italy
- Maristella Steri
- Istituto di Ricerca Genetica e Biomedica (IRGB), Consiglio Nazionale delle Ricerche (CNR), Cagliari, Italy
- Valeria Orrù
- Istituto di Ricerca Genetica e Biomedica (IRGB), Consiglio Nazionale delle Ricerche (CNR), Cagliari, Italy
- David Schlessinger
- Laboratory of Genetics and Genomics, National Institute on Aging, National Institutes of Health (NIH), Baltimore, MD, United States
- Edoardo Fiorillo
- Istituto di Ricerca Genetica e Biomedica (IRGB), Consiglio Nazionale delle Ricerche (CNR), Cagliari, Italy
- Marcella Devoto
- Istituto di Ricerca Genetica e Biomedica (IRGB), Consiglio Nazionale delle Ricerche (CNR), Cagliari, Italy
- Dipartimento di Medicina Traslazionale e di Precisione, Università Sapienza, Roma, Italy
- Francesco Cucca
- Istituto di Ricerca Genetica e Biomedica (IRGB), Consiglio Nazionale delle Ricerche (CNR), Cagliari, Italy
- Dipartimento di Scienze Biomediche, Università degli Studi di Sassari, Sassari, Italy
|
50
|
Gao Z, Su Y, Xia J, Cao RF, Ding Y, Zheng CH, Wei PJ. DeepFGRN: inference of gene regulatory network with regulation type based on directed graph embedding. Brief Bioinform 2024; 25:bbae143. [PMID: 38581416 PMCID: PMC10998536 DOI: 10.1093/bib/bbae143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2024] [Revised: 03/02/2024] [Accepted: 03/15/2024] [Indexed: 04/08/2024]
Abstract
The inference of gene regulatory networks (GRNs) from gene expression profiles has been a key issue in systems biology, prompting many researchers to develop diverse computational methods. However, most of these methods do not reconstruct directed GRNs with regulatory types because of the lack of benchmark datasets or defects in the computational methods. Here, we collect benchmark datasets and propose a deep learning-based model, DeepFGRN, for reconstructing fine gene regulatory networks (FGRNs) with both regulation types and directions. In addition, the GRNs of real species are always large graphs with direction and high sparsity, which impede the advancement of GRN inference. Therefore, DeepFGRN builds a node bidirectional representation module to capture the directed graph embedding representation of the GRN. Specifically, the source and target generators are designed to learn the low-dimensional dense embedding of the source and target neighbors of a gene, respectively. An adversarial learning strategy is applied to iteratively learn the real neighbors of each gene. In addition, because the expression profiles of genes with regulatory associations are correlative, a correlation analysis module is designed. Specifically, this module not only fully extracts gene expression features, but also captures the correlation between regulators and target genes. Experimental results show that DeepFGRN has a competitive capability for both GRN and FGRN inference. Potential biomarkers and therapeutic drugs for breast cancer, liver cancer, lung cancer and coronavirus disease 2019 are identified based on the candidate FGRNs, providing a possible opportunity to advance our knowledge of disease treatments.
Affiliation(s)
- Zhen Gao
- The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
- Yansen Su
- The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
- Junfeng Xia
- Information Materials and Intelligent Sensing Laboratory of Anhui Province, Institute of Physical Science and Information Technology, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
- Rui-Fen Cao
- The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
- Yun Ding
- The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
- Chun-Hou Zheng
- The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
- Pi-Jing Wei
- Information Materials and Intelligent Sensing Laboratory of Anhui Province, Institute of Physical Science and Information Technology, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
|