1
Chee FT, Harun S, Mohd Daud K, Sulaiman S, Nor Muhammad NA. Exploring gene regulation and biological processes in insects: Insights from omics data using gene regulatory network models. PROGRESS IN BIOPHYSICS AND MOLECULAR BIOLOGY 2024; 189:1-12. [PMID: 38604435 DOI: 10.1016/j.pbiomolbio.2024.04.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Revised: 12/18/2023] [Accepted: 04/03/2024] [Indexed: 04/13/2024]
Abstract
Gene regulatory networks (GRNs) comprise complex, intertwined relationships between genes and their regulators. Understanding GRN dynamics can help unravel the complexity behind observed gene expression. Insect gene regulation is often complicated by insects' complex life cycles and diverse ecological adaptations. This review provides an update on current mathematical modelling methods for GRNs as applied to insect science. Several popular GRN architecture models are discussed, together with examples of their applications in insect science. In the last part of this review, the models are compared in terms of network scalability, computational complexity, robustness to noise and biological relevance.
Affiliation(s)
- Fong Ting Chee
- Institute of Systems Biology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor, Malaysia
- Sarahani Harun
- Institute of Systems Biology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor, Malaysia
- Kauthar Mohd Daud
- Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, 43600, UKM Bangi, Selangor, Malaysia
- Suhaila Sulaiman
- FGV R&D Sdn Bhd, FGV Innovation Center, PT23417 Lengkuk Teknologi, Bandar Baru Enstek, 71760 Nilai, Negeri Sembilan, Malaysia
- Nor Azlan Nor Muhammad
- Institute of Systems Biology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor, Malaysia.
2
Peng D, Cahan P. OneSC: A computational platform for recapitulating cell state transitions. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.05.31.596831. [PMID: 38895453 PMCID: PMC11185539 DOI: 10.1101/2024.05.31.596831] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/21/2024]
Abstract
Computational modelling of cell state transitions has attracted great interest in developmental biology, cancer biology and cell fate engineering because it enables perturbation experiments to be performed in silico more rapidly and cheaply than in a wet lab. Recent advances in single-cell RNA sequencing (scRNA-seq) allow the capture of high-resolution snapshots of cell states as they transition along temporal trajectories. Using these high-throughput datasets, we can train computational models to generate in silico 'synthetic' cells that faithfully mimic the temporal trajectories. Here we present OneSC, a platform that can simulate synthetic cells across developmental trajectories using systems of stochastic differential equations governed by a core transcription factor (TF) regulatory network. Unlike current network inference methods, OneSC prioritizes generating a Boolean network that produces faithful cell state transitions and steady cell states that mimic real biological systems. Applying OneSC to real data, we inferred a core TF network from a mouse myeloid progenitor scRNA-seq dataset and showed that dynamical simulations of that network generate synthetic single-cell expression profiles that faithfully recapitulate the four myeloid differentiation trajectories leading to differentiated cell states (erythrocytes, megakaryocytes, granulocytes and monocytes). Finally, through in silico perturbations of the mouse myeloid progenitor core network, we showed that OneSC can accurately predict cell fate decision biases of TF perturbations that closely match previous experimental observations.
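The notion of a Boolean network with steady cell states can be illustrated with a minimal sketch. It assumes a hypothetical two-TF mutual-repression (toggle-switch) circuit with synchronous updates; the gene names and rules are illustrative, this is not OneSC's API, and it omits the stochastic differential equations OneSC uses for simulation. A steady state is simply a state that the update rules map onto itself.

```python
from itertools import product

# Hypothetical two-TF toggle switch: each gene is ON only if its repressor is OFF.
# These rules are illustrative assumptions, not rules inferred by OneSC.
RULES = {
    "A": lambda s: not s["B"],
    "B": lambda s: not s["A"],
}
GENES = sorted(RULES)


def step(state):
    """Apply every Boolean rule synchronously to a dict of gene -> bool."""
    return {g: RULES[g](state) for g in GENES}


def steady_states():
    """Enumerate all 2**n states and keep the fixed points, as 0/1 tuples."""
    fixed = []
    for values in product([False, True], repeat=len(GENES)):
        state = dict(zip(GENES, values))
        if step(state) == state:
            fixed.append(tuple(int(state[g]) for g in GENES))
    return fixed


def trajectory(start, n_steps):
    """Return the sequence of states visited from `start` (as 0/1 tuples)."""
    state = dict(zip(GENES, start))
    path = [tuple(int(state[g]) for g in GENES)]
    for _ in range(n_steps):
        state = step(state)
        path.append(tuple(int(state[g]) for g in GENES))
    return path


if __name__ == "__main__":
    print(steady_states())                    # [(0, 1), (1, 0)]
    print(trajectory((True, True), 3))        # [(1, 1), (0, 0), (1, 1), (0, 0)]
```

Under synchronous updating, the two fixed points (A ON/B OFF and A OFF/B ON) play the role of two alternative steady "cell states", while a start with both genes ON oscillates instead of settling; asynchronous update schemes can behave differently on the same rules.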
3
Wang Y, Chen X, Zheng Z, Huang L, Xie W, Wang F, Zhang Z, Wong KC. scGREAT: Transformer-based deep-language model for gene regulatory network inference from single-cell transcriptomics. iScience 2024; 27:109352. [PMID: 38510148 PMCID: PMC10951644 DOI: 10.1016/j.isci.2024.109352] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Revised: 12/29/2023] [Accepted: 02/23/2024] [Indexed: 03/22/2024]
Abstract
Gene regulatory networks (GRNs) involve complex, multi-layer regulatory interactions between regulators and their target genes. Precise knowledge of GRNs is important for understanding cellular processes and molecular functions. Recent breakthroughs in single-cell sequencing technology have made it possible to infer GRNs at the single-cell level. Existing methods, however, are limited by expensive computation and, sometimes, simplistic assumptions. To overcome these obstacles, we propose scGREAT, a framework that infers GRNs from single-cell transcriptomics using gene embeddings and a transformer. scGREAT starts by constructing gene expression and gene biotext dictionaries from scRNA-seq data and gene text information. The representation of TF-gene pairs is learned by optimizing the embedding space with a transformer-based engine. Results showed that scGREAT outperformed other contemporary methods on benchmarks. In addition, gene representations from scGREAT provide valuable insights into gene regulation, and external validation on spatial transcriptomics illuminated the mechanism behind scGREAT annotation. Moreover, scGREAT identified several TF-target regulations corroborated by previous studies.
Affiliation(s)
- Yuchen Wang
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Xingjian Chen
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Cutaneous Biology Research Center, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Zetian Zheng
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Lei Huang
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Weidun Xie
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Fuzhou Wang
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Zhaolei Zhang
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Ka-Chun Wong
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Shenzhen Research Institute, City University of Hong Kong, Shenzhen, China
4
Mousavi R, Lobo D. Automatic design of gene regulatory mechanisms for spatial pattern formation. NPJ Syst Biol Appl 2024; 10:35. [PMID: 38565850 PMCID: PMC10987498 DOI: 10.1038/s41540-024-00361-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2023] [Accepted: 03/19/2024] [Indexed: 04/04/2024]
Abstract
Gene regulatory mechanisms (GRMs) control the formation of spatial and temporal expression patterns that can serve as regulatory signals for the development of complex shapes. Synthetic developmental biology aims to engineer such genetic circuits for understanding and producing desired multicellular spatial patterns. However, designing synthetic GRMs for complex, multi-dimensional spatial patterns remains challenging because of the nonlinear interactions and feedback loops in genetic circuits. Here we present a methodology to automatically design GRMs that can produce any given two-dimensional spatial pattern. The proposed approach uses two orthogonal morphogen gradients acting as positional information signals in a multicellular tissue area or culture, which constitutes a continuous field of engineered cells implementing the same designed GRM. To efficiently design both the circuit network and the interaction mechanisms, including the number of genes necessary for the formation of the target spatial pattern, we developed an automated algorithm based on high-performance evolutionary computation. The tolerance of the algorithm can be configured to design GRMs that are either simple and produce approximate patterns, or complex and produce precise patterns. We demonstrate the approach by automatically designing GRMs that produce a diverse set of synthetic spatial expression patterns by interpreting just two orthogonal morphogen gradients. The proposed framework offers a versatile way to systematically design and discover complex genetic circuits that produce spatial patterns.
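How a field of identical cells can read two orthogonal gradients as positional information is shown in the toy sketch below. It assumes exponential gradient profiles and two hand-written threshold circuits (`and_gate_pattern`, `stripe_pattern`, both hypothetical); the paper instead designs circuits automatically with evolutionary computation, which is not reproduced here. Every cell runs the same circuit, and only the local morphogen concentrations differ.

```python
import numpy as np


def morphogen_fields(n=10, decay=0.3):
    """Return two orthogonal gradients sampled on an n x n grid of cells.

    U decays exponentially along the x axis (columns) and V along the y axis
    (rows). The exponential profile and `decay` value are assumptions.
    """
    profile = np.exp(-decay * np.arange(n))
    u, v = np.meshgrid(profile, profile)  # u[i, j] = profile[j], v[i, j] = profile[i]
    return u, v


def and_gate_pattern(u, v, theta_u=0.5, theta_v=0.5):
    """Hypothetical circuit: target gene ON only where both morphogens are high."""
    return ((u > theta_u) & (v > theta_v)).astype(int)


def stripe_pattern(u, low=0.4, high=0.8):
    """Hypothetical band-detector circuit: ON where U lies in an intermediate range."""
    return ((u > low) & (u < high)).astype(int)


if __name__ == "__main__":
    u, v = morphogen_fields()
    print(and_gate_pattern(u, v))  # ON block in the high-U, high-V corner
    print(stripe_pattern(u))       # vertical stripe a few columns from the U source
```

Swapping the per-cell circuit changes the pattern while the two gradients stay fixed, which is roughly the design freedom that an automated search over GRMs explores.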
Affiliation(s)
- Reza Mousavi
- Department of Biological Sciences, University of Maryland, Baltimore County, Baltimore, MD, USA
- Daniel Lobo
- Department of Biological Sciences, University of Maryland, Baltimore County, Baltimore, MD, USA.
- Greenebaum Comprehensive Cancer Center and Center for Stem Cell Biology & Regenerative Medicine, University of Maryland, Baltimore, Baltimore, MD, USA.
5
Liu D, Liu Z, Liao H, Chen ZS, Qin B. Ferroptosis as a potential therapeutic target for age-related macular degeneration. Drug Discov Today 2024; 29:103920. [PMID: 38369100 DOI: 10.1016/j.drudis.2024.103920] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2023] [Revised: 02/11/2024] [Accepted: 02/14/2024] [Indexed: 02/20/2024]
Abstract
Cell death plays a crucial role in age-related macular degeneration (AMD), but its mechanisms remain elusive. Accumulating evidence suggests that ferroptosis, a novel form of regulated cell death characterized by iron-dependent accumulation of lipid hydroperoxides, has a key role in the pathogenesis of AMD. Numerous studies have suggested that ferroptosis participates in the degeneration of retinal cells and accelerates the progression of AMD. Furthermore, ferroptosis inhibitors exhibit notable protective effects in AMD, underscoring the significance of ferroptosis as a pivotal mechanism of retinal cell death in AMD. This review summarizes the molecular mechanisms of ferroptosis in AMD, lists potential inhibitors and discusses the challenges and future opportunities of targeting ferroptosis as a therapeutic strategy, providing references and insights for the prevention and treatment of AMD.
Affiliation(s)
- Dongcheng Liu
- Shenzhen Aier Eye Hospital, Aier Eye Hospital, Jinan University, Shenzhen, China; Shenzhen Aier Ophthalmic Technology Institute, Shenzhen, China
- Ziling Liu
- Shenzhen Aier Eye Hospital, Aier Eye Hospital, Jinan University, Shenzhen, China; Shenzhen Aier Ophthalmic Technology Institute, Shenzhen, China
- Hongxia Liao
- Shenzhen Aier Eye Hospital, Aier Eye Hospital, Jinan University, Shenzhen, China; Shenzhen Aier Ophthalmic Technology Institute, Shenzhen, China
- Zhe-Sheng Chen
- College of Pharmacy and Health Sciences, St. John's University, Queens, New York, USA.
- Bo Qin
- Shenzhen Aier Eye Hospital, Aier Eye Hospital, Jinan University, Shenzhen, China; Shenzhen Aier Ophthalmic Technology Institute, Shenzhen, China; Aier Eye Hospital, Tianjin University, Tianjin, China.
6
Lu Z, Xiao X, Zheng Q, Wang X, Xu L. Assessing NGS-based computational methods for predicting transcriptional regulators with query gene sets. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.02.01.578316. [PMID: 38562775 PMCID: PMC10983863 DOI: 10.1101/2024.02.01.578316] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
This article provides an in-depth review of computational methods for predicting transcriptional regulators (TRs) with query gene sets. Identification of transcriptional regulators is of utmost importance in many biological applications, including but not limited to elucidating developmental mechanisms, identifying key disease genes, and predicting therapeutic targets. Various computational methods based on next-generation sequencing (NGS) data have been developed in the past decade, yet no systematic evaluation of NGS-based methods has been offered. We classified these methods into two categories based on shared characteristics, namely library-based and region-based methods. We further conducted benchmark studies to evaluate the accuracy, sensitivity, coverage, and usability of NGS-based methods with molecular experimental datasets. Results show that BART, ChIP-Atlas, and Lisa have relatively better performance. In addition, we point out the limitations of NGS-based methods and explore potential directions for further improvement.
Key points
- An introduction to available computational methods for predicting functional TRs from a query gene set.
- A detailed walk-through along with practical concerns and limitations.
- A systematic benchmark of NGS-based methods in terms of accuracy, sensitivity, coverage, and usability, using 570 TR perturbation-derived gene sets.
- NGS-based methods outperform motif-based methods. Among NGS methods, those utilizing larger databases and adopting region-centric approaches demonstrate favorable performance.
- BART, ChIP-Atlas, and Lisa are recommended, as these methods have overall better performance in the evaluated scenarios.
7
Klauschen F, Dippel J, Keyl P, Jurmeister P, Bockmayr M, Mock A, Buchstab O, Alber M, Ruff L, Montavon G, Müller KR. [Explainable artificial intelligence in pathology]. PATHOLOGIE (HEIDELBERG, GERMANY) 2024; 45:133-139. [PMID: 38315198 DOI: 10.1007/s00292-024-01308-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/19/2024] [Indexed: 02/07/2024]
Abstract
With the advancements in precision medicine, the demands on pathological diagnostics have increased, requiring standardized, quantitative, and integrated assessments of histomorphological and molecular pathological data. Great hopes are placed in artificial intelligence (AI) methods, which have demonstrated the ability to analyze complex clinical, histological, and molecular data for disease classification, biomarker quantification, and prognosis estimation. This paper provides an overview of the latest developments in pathology AI, discusses the limitations, particularly concerning the black box character of AI, and describes solutions to make decision processes more transparent using methods of so-called explainable AI (XAI).
Affiliation(s)
- Frederick Klauschen
- Institute of Pathology, Ludwig-Maximilians-Universität München, Thalkirchner Str. 36, 80337 Munich, Germany
- Institute of Pathology, Charité - Universitätsmedizin Berlin, Berlin, Germany
- BIFOLD - Berlin Institute for the Foundations of Learning and Data, Berlin, Germany
- German Cancer Research Center (DKTK/DKFZ), Munich Partner Site, Munich, Germany
- Jonas Dippel
- BIFOLD - Berlin Institute for the Foundations of Learning and Data, Berlin, Germany
- Machine Learning Group, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany
- Philipp Keyl
- Institute of Pathology, Ludwig-Maximilians-Universität München, Thalkirchner Str. 36, 80337 Munich, Germany
- Philipp Jurmeister
- Institute of Pathology, Ludwig-Maximilians-Universität München, Thalkirchner Str. 36, 80337 Munich, Germany
- German Cancer Research Center (DKTK/DKFZ), Munich Partner Site, Munich, Germany
- Michael Bockmayr
- Institute of Pathology, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Pediatric Hematology and Oncology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Research Institute Children's Cancer Center Hamburg, Hamburg, Germany
- Andreas Mock
- Institute of Pathology, Ludwig-Maximilians-Universität München, Thalkirchner Str. 36, 80337 Munich, Germany
- German Cancer Research Center (DKTK/DKFZ), Munich Partner Site, Munich, Germany
- Oliver Buchstab
- Institute of Pathology, Ludwig-Maximilians-Universität München, Thalkirchner Str. 36, 80337 Munich, Germany
- Maximilian Alber
- Institute of Pathology, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Aignostics GmbH, Berlin, Germany
- Grégoire Montavon
- BIFOLD - Berlin Institute for the Foundations of Learning and Data, Berlin, Germany
- Machine Learning Group, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
- Klaus-Robert Müller
- BIFOLD - Berlin Institute for the Foundations of Learning and Data, Berlin, Germany
- Machine Learning Group, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany
- Department of Artificial Intelligence, Korea University, Seoul, South Korea
- Max Planck Institute for Informatics, Saarbrücken, Germany
- Machine Learning/Intelligent Data Analysis (IDA), Technische Universität Berlin, Marchstr. 23, 10587 Berlin, Germany
8
Deschildre J, Vandemoortele B, Loers JU, De Preter K, Vermeirssen V. Evaluation of single-sample network inference methods for precision oncology. NPJ Syst Biol Appl 2024; 10:18. [PMID: 38360881 PMCID: PMC10869342 DOI: 10.1038/s41540-024-00340-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Accepted: 01/17/2024] [Indexed: 02/17/2024]
Abstract
A major challenge in precision oncology is to detect targetable cancer vulnerabilities in individual patients. Modeling high-throughput omics data in biological networks allows the identification of key molecules and processes of tumorigenesis. Traditionally, network inference methods rely on many samples to provide sufficient information for learning, resulting in aggregate networks. However, to implement patient-tailored approaches in precision oncology, omics data must be interpreted at the level of individual patients. Several single-sample network inference methods have been developed that infer biological networks for an individual sample from bulk RNA-seq data. However, these methods have been compared only to a limited extent, and many rely on 'normal tissue' samples as a reference, which are not always available. Here, we evaluated the single-sample network inference methods SSN, LIONESS, SWEET, iENA, CSN and SSPGI using transcriptomic profiles of lung and brain cancer cell lines from the CCLE database. The methods constructed functional gene networks with distinct network characteristics. Hub gene analyses revealed different degrees of subtype-specificity across methods. Single-sample networks were able to distinguish between tumor subtypes, as exemplified by node strength clustering, enrichment of known subtype-specific driver genes among hubs and differential node strength. We also showed that single-sample networks correlated better with other omics data from the same cell line than aggregate networks did. We conclude that single-sample network inference methods can reflect sample-specific biology when 'normal tissue' samples are absent, and we point out the peculiarities of each method.
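LIONESS, one of the methods compared above, estimates the network of a single sample q by linear interpolation: e(q) = N(e(α) - e(α-q)) + e(α-q), where e(α) is an edge weight in the aggregate network built from all N samples and e(α-q) is the same edge with sample q left out. The sketch below assumes Pearson correlation as the aggregate method and uses random numbers in place of CCLE profiles; it is not the evaluation pipeline used in the study.

```python
import numpy as np


def aggregate_network(expr):
    """Aggregate co-expression network from a genes x samples matrix.

    Pearson correlation between genes is an assumption of this sketch;
    LIONESS can wrap other aggregate network methods as well.
    """
    return np.corrcoef(expr)  # rows are treated as variables (genes)


def lioness(expr, q):
    """Single-sample network for sample (column) q by LIONESS interpolation."""
    n_samples = expr.shape[1]
    net_all = aggregate_network(expr)                              # e(alpha)
    net_without_q = aggregate_network(np.delete(expr, q, axis=1))  # e(alpha - q)
    return n_samples * (net_all - net_without_q) + net_without_q


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    expr = rng.normal(size=(5, 20))  # random stand-in: 5 genes, 20 samples
    net_q = lioness(expr, q=0)
    print(net_q.shape)  # (5, 5)
```

Because each single-sample network is an extrapolation from two aggregate networks, its edge weights are not confined to [-1, 1] even when the aggregate method is a correlation.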
Affiliation(s)
- Joke Deschildre
- Lab for Computational Biology, Integromics and Gene Regulation (CBIGR), Cancer Research Institute Ghent (CRIG), Ghent, Belgium
- Department of Biomedical Molecular Biology, Ghent University, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Boris Vandemoortele
- Lab for Computational Biology, Integromics and Gene Regulation (CBIGR), Cancer Research Institute Ghent (CRIG), Ghent, Belgium
- Department of Biomedical Molecular Biology, Ghent University, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Jens Uwe Loers
- Lab for Computational Biology, Integromics and Gene Regulation (CBIGR), Cancer Research Institute Ghent (CRIG), Ghent, Belgium
- Department of Biomedical Molecular Biology, Ghent University, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Katleen De Preter
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Lab of Translational Onco-genomics and Bio-informatics, Center for Medical Biotechnology (VIB-UGent), Cancer Research Institute Ghent (CRIG), Ghent, Belgium
- Vanessa Vermeirssen
- Lab for Computational Biology, Integromics and Gene Regulation (CBIGR), Cancer Research Institute Ghent (CRIG), Ghent, Belgium.
- Department of Biomedical Molecular Biology, Ghent University, Ghent, Belgium.
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium.
9
Klauschen F, Dippel J, Keyl P, Jurmeister P, Bockmayr M, Mock A, Buchstab O, Alber M, Ruff L, Montavon G, Müller KR. Toward Explainable Artificial Intelligence for Precision Pathology. ANNUAL REVIEW OF PATHOLOGY 2024; 19:541-570. [PMID: 37871132 DOI: 10.1146/annurev-pathmechdis-051222-113147] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Indexed: 10/25/2023]
Abstract
The rapid development of precision medicine in recent years has started to challenge diagnostic pathology with respect to its ability to analyze histological images and increasingly large molecular profiling data in a quantitative, integrative, and standardized way. Artificial intelligence (AI) and, more precisely, deep learning technologies have recently demonstrated the potential to facilitate complex data analysis tasks, including clinical, histological, and molecular data for disease classification; tissue biomarker quantification; and clinical outcome prediction. This review provides a general introduction to AI and describes recent developments with a focus on applications in diagnostic pathology and beyond. We explain limitations including the black-box character of conventional AI and describe solutions to make machine learning decisions more transparent with so-called explainable AI. The purpose of the review is to foster a mutual understanding of both the biomedical and the AI side. To that end, in addition to providing an overview of the relevant foundations in pathology and machine learning, we present worked-through examples for a better practical understanding of what AI can achieve and how it should be done.
Affiliation(s)
- Frederick Klauschen
- Institute of Pathology, Ludwig-Maximilians-Universität München, Munich, Germany
- Institute of Pathology, Charité Universitätsmedizin Berlin, Berlin, Germany
- Berlin Institute for the Foundations of Learning and Data (BIFOLD), Berlin, Germany
- German Cancer Consortium, German Cancer Research Center (DKTK/DKFZ), Munich Partner Site, Munich, Germany
- Jonas Dippel
- Berlin Institute for the Foundations of Learning and Data (BIFOLD), Berlin, Germany
- Machine Learning Group, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany
- Philipp Keyl
- Institute of Pathology, Ludwig-Maximilians-Universität München, Munich, Germany
- Philipp Jurmeister
- Institute of Pathology, Ludwig-Maximilians-Universität München, Munich, Germany
- German Cancer Consortium, German Cancer Research Center (DKTK/DKFZ), Munich Partner Site, Munich, Germany
- Michael Bockmayr
- Institute of Pathology, Charité Universitätsmedizin Berlin, Berlin, Germany
- Department of Pediatric Hematology and Oncology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Research Institute Children's Cancer Center Hamburg, Hamburg, Germany
- Andreas Mock
- Institute of Pathology, Ludwig-Maximilians-Universität München, Munich, Germany
- German Cancer Consortium, German Cancer Research Center (DKTK/DKFZ), Munich Partner Site, Munich, Germany
- Oliver Buchstab
- Institute of Pathology, Ludwig-Maximilians-Universität München, Munich, Germany
- Maximilian Alber
- Institute of Pathology, Charité Universitätsmedizin Berlin, Berlin, Germany
- Aignostics, Berlin, Germany
- Grégoire Montavon
- Berlin Institute for the Foundations of Learning and Data (BIFOLD), Berlin, Germany
- Machine Learning Group, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
- Klaus-Robert Müller
- Berlin Institute for the Foundations of Learning and Data (BIFOLD), Berlin, Germany
- Machine Learning Group, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany
- Department of Artificial Intelligence, Korea University, Seoul, Korea
- Max Planck Institute for Informatics, Saarbrücken, Germany
10
Joshi SK, Piehowski P, Liu T, Gosline SJC, McDermott JE, Druker BJ, Traer E, Tyner JW, Agarwal A, Tognon CE, Rodland KD. Mass Spectrometry-Based Proteogenomics: New Therapeutic Opportunities for Precision Medicine. Annu Rev Pharmacol Toxicol 2024; 64:455-479. [PMID: 37738504 PMCID: PMC10950354 DOI: 10.1146/annurev-pharmtox-022723-113921] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/24/2023]
Abstract
Proteogenomics refers to the integration of comprehensive genomic, transcriptomic, and proteomic measurements from the same samples with the goal of fully understanding the regulatory processes converting genotypes to phenotypes, often with an emphasis on gaining a deeper understanding of disease processes. Although specific genetic mutations have long been known to drive the development of multiple cancers, gene mutations alone do not always predict prognosis or response to targeted therapy. The benefit of proteogenomics research is that information obtained from proteins and their corresponding pathways provides insight into therapeutic targets that can complement genomic information by providing an additional dimension regarding the underlying mechanisms and pathophysiology of tumors. This review describes the novel insights into tumor biology and drug resistance derived from proteogenomic analysis while highlighting the clinical potential of proteogenomic observations and advances in technique and analysis tools.
Affiliation(s)
- Sunil K Joshi
- Knight Cancer Institute, Oregon Health & Science University, Portland, Oregon, USA
- Division of Hematology and Medical Oncology, Department of Medicine, Oregon Health & Science University, Portland, Oregon, USA
- Department of Medicine, Stanford University School of Medicine, Stanford, California, USA
- Paul Piehowski
- Pacific Northwest National Laboratory, Richland, Washington, USA
- Tao Liu
- Pacific Northwest National Laboratory, Richland, Washington, USA
- Sara J C Gosline
- Pacific Northwest National Laboratory, Richland, Washington, USA
- Jason E McDermott
- Pacific Northwest National Laboratory, Richland, Washington, USA
- Department of Molecular Microbiology and Immunology, Oregon Health & Science University, Portland, Oregon, USA
- Brian J Druker
- Knight Cancer Institute, Oregon Health & Science University, Portland, Oregon, USA
- Division of Hematology and Medical Oncology, Department of Medicine, Oregon Health & Science University, Portland, Oregon, USA
- Elie Traer
- Knight Cancer Institute, Oregon Health & Science University, Portland, Oregon, USA
- Division of Hematology and Medical Oncology, Department of Medicine, Oregon Health & Science University, Portland, Oregon, USA
- Jeffrey W Tyner
- Knight Cancer Institute, Oregon Health & Science University, Portland, Oregon, USA
- Division of Hematology and Medical Oncology, Department of Medicine, Oregon Health & Science University, Portland, Oregon, USA
- Department of Molecular Microbiology and Immunology, Oregon Health & Science University, Portland, Oregon, USA
- Anupriya Agarwal
- Knight Cancer Institute, Oregon Health & Science University, Portland, Oregon, USA
- Division of Hematology and Medical Oncology, Department of Medicine, Oregon Health & Science University, Portland, Oregon, USA
- Department of Molecular Microbiology and Immunology, Oregon Health & Science University, Portland, Oregon, USA
- Cristina E Tognon
- Knight Cancer Institute, Oregon Health & Science University, Portland, Oregon, USA
- Division of Hematology and Medical Oncology, Department of Medicine, Oregon Health & Science University, Portland, Oregon, USA
- Karin D Rodland
- Knight Cancer Institute, Oregon Health & Science University, Portland, Oregon, USA
- Pacific Northwest National Laboratory, Richland, Washington, USA
11
Ishikawa M, Sugino S, Masuda Y, Tarumoto Y, Seto Y, Taniyama N, Wagai F, Yamauchi Y, Kojima Y, Kiryu H, Yusa K, Eiraku M, Mochizuki A. RENGE infers gene regulatory networks using time-series single-cell RNA-seq data with CRISPR perturbations. Commun Biol 2023; 6:1290. [PMID: 38155269 PMCID: PMC10754834 DOI: 10.1038/s42003-023-05594-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2022] [Accepted: 11/15/2023] [Indexed: 12/30/2023]
Abstract
Single-cell RNA-seq analysis coupled with CRISPR-based perturbation has enabled the inference of gene regulatory networks with causal relationships. However, a snapshot of single-cell CRISPR data may not lead to accurate inference, since a gene knockout can influence multiple layers of downstream genes over time. Here, we developed RENGE, a computational method that infers gene regulatory networks from a time-series single-cell CRISPR dataset. RENGE models the propagation of the effects elicited by a gene knockout through its regulatory network. It can distinguish between direct and indirect regulation, which allows it to infer regulation by genes that are not knocked out. As a result, RENGE outperforms current methods in the accuracy of gene regulatory network inference. When applied to a dataset we derived from human induced pluripotent stem cells, RENGE yielded a network consistent with multiple databases and the literature. Accurate inference of gene regulatory networks by RENGE would enable the identification of key factors in various biological systems.
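Why time-series data help separate direct from indirect regulation can be seen in a generic linear propagation sketch. This is an illustrative assumption, not RENGE's actual model or code: the adjacency matrix and the hypothetical `propagate_knockout` helper simply pass a knockout's effect one regulatory edge per time step, so direct targets respond first and indirect targets only later.

```python
import numpy as np


def propagate_knockout(adj, ko_gene, n_steps):
    """Accumulated expression change of each gene after `n_steps` propagation rounds.

    adj[i, j] is a hypothetical linear effect of gene j on gene i. The knockout
    is modelled as a unit decrease of `ko_gene`; each round pushes the current
    change one edge further downstream. The initial unit decrease itself is
    not added to the returned totals.
    """
    current = np.zeros(adj.shape[0])
    current[ko_gene] = -1.0
    total = np.zeros_like(current)
    for _ in range(n_steps):
        current = adj @ current  # effects travel one regulatory step
        total += current
    return total


if __name__ == "__main__":
    # Hypothetical chain: gene 0 activates gene 1, gene 1 activates gene 2.
    adj = np.array([[0.0, 0.0, 0.0],
                    [0.5, 0.0, 0.0],
                    [0.0, 0.5, 0.0]])
    print(propagate_knockout(adj, ko_gene=0, n_steps=1))  # only gene 1 has changed
    print(propagate_knockout(adj, ko_gene=0, n_steps=2))  # gene 2 now responds too
```

After one step only gene 1 appears affected; after two steps gene 2 also responds even though gene 0 does not regulate it directly. A single late snapshot would mix both effects, whereas the timing of each response hints at which edges are direct.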
Affiliation(s)
- Masato Ishikawa
- Institute for Life and Medical Sciences, Kyoto University, Kyoto, 606-8507, Japan.
- Seiichi Sugino
- Institute for Life and Medical Sciences, Kyoto University, Kyoto, 606-8507, Japan
- Yoshie Masuda
- Institute for Life and Medical Sciences, Kyoto University, Kyoto, 606-8507, Japan
- Yusuke Tarumoto
- Institute for Life and Medical Sciences, Kyoto University, Kyoto, 606-8507, Japan
- Yusuke Seto
- Institute for Life and Medical Sciences, Kyoto University, Kyoto, 606-8507, Japan
- Nobuko Taniyama
- Institute for Life and Medical Sciences, Kyoto University, Kyoto, 606-8507, Japan
- Fumi Wagai
- Institute for Life and Medical Sciences, Kyoto University, Kyoto, 606-8507, Japan
- Yuhei Yamauchi
- Institute for Life and Medical Sciences, Kyoto University, Kyoto, 606-8507, Japan
- Yasuhiro Kojima
- Laboratory of Computational Life Science, National Cancer Center Research Institute, Tokyo, 104-0045, Japan
- Hisanori Kiryu
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, 277-8561, Japan
- Kosuke Yusa
- Institute for Life and Medical Sciences, Kyoto University, Kyoto, 606-8507, Japan
- Mototsugu Eiraku
- Institute for Life and Medical Sciences, Kyoto University, Kyoto, 606-8507, Japan
- Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Kyoto, 606-8507, Japan
- Atsushi Mochizuki
- Institute for Life and Medical Sciences, Kyoto University, Kyoto, 606-8507, Japan
| |
12
Bernaola N, Michiels M, Larrañaga P, Bielza C. Learning massive interpretable gene regulatory networks of the human brain by merging Bayesian networks. PLoS Comput Biol 2023; 19:e1011443. [PMID: 38039337 PMCID: PMC10745139 DOI: 10.1371/journal.pcbi.1011443] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2022] [Revised: 12/22/2023] [Accepted: 08/19/2023] [Indexed: 12/03/2023]
Abstract
We present Fast Greedy Equivalence Search (FGES)-Merge, a new method for learning the structure of gene regulatory networks by merging locally learned Bayesian networks, based on the fast greedy equivalence search algorithm. The method is competitive with the state of the art in terms of the Matthews correlation coefficient, which takes into account both precision and recall, while improving on it in speed, scaling up to tens of thousands of variables, and being able to use empirical knowledge about the topological structure of gene regulatory networks. To showcase the ability of our method to scale to massive networks, we apply it to learning the gene regulatory network for the full human genome using samples from different brain structures (from the Allen Human Brain Atlas). Furthermore, this Bayesian network model is intended to predict interactions between genes in a way that is clear to experts, following current trends in explainable artificial intelligence. To achieve this, we also present a new open-access visualization tool that facilitates the exploration of massive networks and can aid in finding nodes of interest for experimental tests.
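Because the abstract scores network structure with the Matthews correlation coefficient (MCC), a small sketch of that metric for edge predictions may help. It assumes directed edges with self-loops excluded, so every ordered pair of distinct genes is a candidate edge; how FGES-Merge's own evaluation counts candidate edges is not stated here. The function name `matthews_cc` and the toy networks are hypothetical.

```python
import math
from typing import Set, Tuple

Edge = Tuple[str, str]


def matthews_cc(predicted: Set[Edge], true: Set[Edge], n_genes: int) -> float:
    """MCC of predicted vs reference directed edges.

    Assumption: every ordered pair of distinct genes is a candidate edge, so
    the candidate count is n_genes * (n_genes - 1); self-loops are not scored.
    """
    candidates = n_genes * (n_genes - 1)
    tp = len(predicted & true)
    fp = len(predicted - true)
    fn = len(true - predicted)
    tn = candidates - tp - fp - fn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical reference network and prediction over 4 genes.
reference = {("A", "B"), ("B", "C"), ("C", "D")}
guess = {("A", "B"), ("B", "C"), ("A", "D")}
print(matthews_cc(guess, reference, n_genes=4))  # 15/27, about 0.556
```

Unlike precision or recall alone, MCC also uses the true negatives, which dominate in sparse regulatory networks.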
Affiliation(s)
- Niko Bernaola
- Computational Intelligence Group, Departamento de Inteligencia Artificial, Universidad Politécnica de Madrid, Madrid, Spain
- Mario Michiels
- Centro Integral de Neurociencias Abarca Campal, Hospital Universitario HM Puerta del Sur, Madrid, Spain
- Pedro Larrañaga
- Computational Intelligence Group, Departamento de Inteligencia Artificial, Universidad Politécnica de Madrid, Madrid, Spain
- Concha Bielza
- Computational Intelligence Group, Departamento de Inteligencia Artificial, Universidad Politécnica de Madrid, Madrid, Spain
13
Cingiz MÖ. k-Strong Inference Algorithm: A Hybrid Information Theory Based Gene Network Inference Algorithm. Mol Biotechnol 2023:10.1007/s12033-023-00929-2. [PMID: 37950851 DOI: 10.1007/s12033-023-00929-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2023] [Accepted: 10/05/2023] [Indexed: 11/13/2023]
Abstract
Gene networks allow researchers to understand the underlying mechanisms linking diseases and genes while reducing the need for wet-lab experiments. Numerous gene network inference (GNI) algorithms have been presented in the literature to infer accurate gene networks. We propose a hybrid GNI algorithm, the k-Strong Inference Algorithm (ksia), to infer more reliable and robust gene networks from omics datasets. To increase reliability, ksia integrates Pearson correlation coefficient (PCC) and Spearman rank correlation coefficient (SCC) scores into its mutual information scores between molecules, increasing the diversity of relation predictions. To infer a more robust gene network, ksia applies three different elimination steps to remove redundant and spurious relations between genes. The performance of ksia was evaluated on a microbe microarrays database through overlap analysis with other GNI algorithms, namely ARACNE, C3NET, CLR, and MRNET. Ksia inferred fewer relations because of its strict elimination steps. However, ksia generally performed better on the Escherichia coli (E. coli) and Saccharomyces cerevisiae (yeast) gene expression datasets in terms of F-measure and precision. The integration of association estimator scores and three elimination stages slightly increases the performance of ksia-based gene networks. Users can access the ksia R package and its user manual via https://github.com/ozgurcingiz/ksia.
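One way to picture combining PCC and SCC into mutual information scores and then pruning spurious relations is sketched below. It is an illustration under assumptions, not the ksia package: each correlation is converted to mutual information with the Gaussian formula I = -0.5 * ln(1 - r^2), the two estimates are averaged, and a single ARACNE-style data processing inequality step stands in for ksia's three elimination steps, which the abstract does not detail. All function names and the toy data are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; returns 0.0 if either vector is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def ranks(x: List[float]) -> List[float]:
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def gaussian_mi(r: float) -> float:
    """Mutual information (nats) of a bivariate Gaussian with correlation r."""
    return -0.5 * math.log(1.0 - min(r * r, 0.999999))  # clip to avoid log(0)


def hybrid_network(data: Dict[str, List[float]]) -> Dict[Tuple[str, str], float]:
    """Hypothetical hybrid scorer: average PCC- and SCC-based MI, then prune."""
    genes = sorted(data)
    mi = {}
    for g, h in combinations(genes, 2):
        r_p = pearson(data[g], data[h])
        r_s = pearson(ranks(data[g]), ranks(data[h]))  # Spearman = Pearson on ranks
        mi[(g, h)] = (gaussian_mi(r_p) + gaussian_mi(r_s)) / 2
    # Assumed DPI-style elimination: in each gene triplet the weakest edge is
    # treated as an indirect relation and removed.
    weakest = set()
    for a, b, c in combinations(genes, 3):
        weakest.add(min([(a, b), (a, c), (b, c)], key=mi.get))
    return {edge: v for edge, v in mi.items() if edge not in weakest}


# Hypothetical chain-like data: x relates to z mainly through y.
expr = {"x": [1, 2, 3, 4, 5, 6], "y": [1, 3, 2, 5, 4, 6], "z": [3, 1, 2, 6, 4, 5]}
print(sorted(hybrid_network(expr)))  # [('x', 'y'), ('y', 'z')]
```

Averaging the two estimates lets a rank-based score temper the Pearson score when relations are monotone but nonlinear; the pruning step then removes the x-z edge as the weakest link in its triplet.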
Affiliation(s)
- Mustafa Özgür Cingiz
- Computer Engineering Department, Faculty of Engineering and Natural Sciences, Bursa Technical University, Mimar Sinan Campus, Yildirim, 16310, Bursa, Turkey
14
Wu Y, Qian B, Wang A, Dong H, Zhu E, Ma B. iLSGRN: inference of large-scale gene regulatory networks based on multi-model fusion. Bioinformatics 2023; 39:btad619. [PMID: 37851379 PMCID: PMC10589915 DOI: 10.1093/bioinformatics/btad619] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Revised: 10/04/2023] [Accepted: 10/17/2023] [Indexed: 10/19/2023]
Abstract
MOTIVATION Gene regulatory networks (GRNs) describe the interactions between genes and help reveal the different biological mechanisms in the cell. Reconstructing GRNs from gene expression data has been a central computational problem in systems biology. However, because of the high dimensionality and nonlinearity of large-scale GRNs, inferring them accurately and efficiently remains a challenging task. RESULTS In this article, we propose a new approach, iLSGRN, to reconstruct large-scale GRNs from steady-state and time-series gene expression data based on nonlinear ordinary differential equations. First, the regulatory gene recognition algorithm calculates the Maximal Information Coefficient between genes and excludes redundant regulatory relationships to reduce dimensionality. Then, the feature fusion algorithm builds a model that leverages the feature importance derived from XGBoost (eXtreme Gradient Boosting) and RF (Random Forest) models, which can effectively train the nonlinear ordinary differential equation model of GRNs and improve the accuracy and stability of the inference algorithm. Extensive experiments on datasets of different scales show that our method yields notable improvements over state-of-the-art methods. Furthermore, we perform cross-validation experiments on real gene datasets to validate the robustness and effectiveness of the proposed method. AVAILABILITY AND IMPLEMENTATION The proposed method is written in Python and is available at: https://github.com/lab319/iLSGRN.
Affiliation(s)
- Yiming Wu
- School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
- Bing Qian
- School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
- Anqi Wang
- Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong 999077, China
- Heng Dong
- School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
- Enqiang Zhu
- Institution of Computing Science and Technology, Guangzhou University, Guangzhou 510006, China
- Baoshan Ma
- School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
15
Panda AK, Basu B. Regenerative bioelectronics: A strategic roadmap for precision medicine. Biomaterials 2023; 301:122271. [PMID: 37619262 DOI: 10.1016/j.biomaterials.2023.122271] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 05/10/2023] [Revised: 07/30/2023] [Accepted: 08/06/2023] [Indexed: 08/26/2023]
Abstract
In the past few decades, stem cell-based regenerative engineering has demonstrated significant potential to repair damaged tissues and restore their functionality. Despite such advances in regenerative engineering, clinical translation remains a major challenge. From the standpoint of personalized treatment, recent progress has likewise established bioelectronic medicine as another important research domain of broad significance for human healthcare. Over the last several years, our research group has adopted biomaterials-based regenerative engineering strategies using innovative bioelectronic stimulation protocols, based on either electric or magnetic stimuli, to direct cellular differentiation on engineered biomaterials with a range of elastic stiffness or functional properties (electroactivity/magnetoactivity). In this article, the role of bioelectronics in stem cell-based regenerative engineering is critically analyzed to stimulate future research on the treatment of degenerative diseases and to address some fundamental questions in stem cell biology. Building on concepts from two independent biomedical research domains (regenerative engineering and bioelectronic medicine), we propose a converging research theme, 'Regenerative Bioelectronics'. Further, a series of recommendations is put forward to address the current challenges in bridging the gap between stem cell therapy and bioelectronic medicine. Enacting the strategic blueprint of bioelectronic-based regenerative engineering could help meet unmet clinical needs in treating incurable degenerative diseases.
Affiliation(s)
- Asish Kumar Panda
- Laboratory for Biomaterials, Materials Research Centre, Indian Institute of Science, Bengaluru, 560012, India
- Bikramjit Basu
- Laboratory for Biomaterials, Materials Research Centre, Indian Institute of Science, Bengaluru, 560012, India
- Centre for Biosystems Science and Engineering, Indian Institute of Science, Bengaluru, 560012, India
16
Anderson AP, Renn SCP. The Ancestral Modulation Hypothesis: Predicting Mechanistic Control of Sexually Heteromorphic Traits Using Evolutionary History. Am Nat 2023; 202:241-259. [PMID: 37606950 DOI: 10.1086/725438] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/23/2023]
Abstract
Across the animal kingdom, there are myriad forms within a sex across, and even within, species, rendering concepts of universal sex traits moot. The mechanisms that regulate the development of these trait differences are varied, although in vertebrates common pathways involve gonadal steroid hormones. Gonadal steroids are often associated with heteromorphic trait development, where the steroid found at higher circulating levels is the one involved in trait development for that sex. Occasionally, a gonadal steroid associated with heteromorphic trait development in one sex is involved in heteromorphic or monomorphic trait development in another sex. We propose a verbal hypothesis, the ancestral modulation hypothesis (AMH), that uses the evolutionary history of the trait (particularly which sex ancestrally possessed higher trait values) to predict the regulatory pathway that governs trait expression. The AMH predicts that the genomic architecture appears first to resolve sexual conflict in an initially monomorphic trait. This architecture takes advantage of existing sex-biased signals (the gonadal steroid pathway) to generate trait heteromorphism. In cases where the other sex experiences evolutionary pressure for the new phenotype, that sex will co-opt the existing architecture by altering its signal to match that of the original high-trait-value sex. We describe the integrated levels needed to produce this pattern and the expected outcomes given the evolutionary history of the trait. We present this framework as a testable hypothesis for the scientific community to investigate and to encourage further engagement with, and analysis of, both ultimate and proximate approaches to sexual heteromorphism.
17
Mousavi R, Lobo D. Automatic design of gene regulatory mechanisms for spatial pattern formation. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.07.26.550573. [PMID: 37546866 PMCID: PMC10402059 DOI: 10.1101/2023.07.26.550573] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/08/2023]
Abstract
Synthetic developmental biology aims to engineer gene regulatory mechanisms (GRMs) for understanding and producing desired multicellular patterns and shapes. However, designing GRMs for spatial patterns is a current challenge due to the nonlinear interactions and feedback loops in genetic circuits. Here we present a methodology to automatically design GRMs that can produce any given spatial pattern. The proposed approach uses two orthogonal morphogen gradients acting as positional information signals in a multicellular tissue area or culture, which constitutes a continuous field of engineered cells implementing the same designed GRM. To efficiently design both the circuit network and the interaction mechanisms, including the number of genes necessary to form the target pattern, we developed an automated algorithm based on high-performance evolutionary computation. The tolerance of the algorithm can be configured to design GRMs that are either simple, producing approximate patterns, or complex, producing precise patterns. We demonstrate the approach by automatically designing GRMs that produce a diverse set of synthetic spatial expression patterns by interpreting just two orthogonal morphogen gradients. The proposed framework offers a versatile approach to systematically design and discover pattern-producing genetic circuits.
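The notion of every cell running the same circuit and reading two orthogonal morphogen gradients as positional information can be shown with a hand-written toy circuit. This is not a GRM produced by the authors' evolutionary algorithm: the linear gradients, the Hill-function activation by morphogen A and repression by morphogen B, the constants K = 0.5 and n = 4, the 0.5 expression threshold, and the function names are all illustrative assumptions.

```python
from typing import List


def hill_act(m: float, k: float, n: int) -> float:
    """Activating Hill function: rises from 0 toward 1 as m exceeds k."""
    return m ** n / (k ** n + m ** n)


def hill_rep(m: float, k: float, n: int) -> float:
    """Repressing Hill function: falls from 1 toward 0 as m exceeds k."""
    return k ** n / (k ** n + m ** n)


def pattern(width: int, height: int, threshold: float = 0.5) -> List[str]:
    """Render where a hypothetical target gene T is ON ('#') in a cell grid.

    Every cell runs the same assumed circuit: T is activated by morphogen A
    (high on the left) and repressed by morphogen B (high at the bottom).
    """
    rows = []
    for y in range(height):
        row = ""
        for x in range(width):
            a = 1.0 - x / (width - 1)   # morphogen A gradient along x
            b = y / (height - 1)        # morphogen B gradient along y
            t = hill_act(a, 0.5, 4) * hill_rep(b, 0.5, 4)
            row += "#" if t > threshold else "."
        rows.append(row)
    return rows


print("\n".join(pattern(5, 5)))
# ##...
# ##...
# .....
# .....
# .....
```

Swapping the activation/repression logic or the Hill constants moves or reshapes the expression domain, which is the design space an automated search would explore.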
Affiliation(s)
- Reza Mousavi
- Department of Biological Sciences, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250, USA
- Daniel Lobo
- Department of Biological Sciences, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250, USA
- Greenebaum Comprehensive Cancer Center and Center for Stem Cell Biology & Regenerative Medicine, University of Maryland, School of Medicine, 22 S. Greene Street, Baltimore, MD 21201, USA
18
Zito F, Cutello V, Pavone M. A Machine Learning Approach to Simulate Gene Expression and Infer Gene Regulatory Networks. ENTROPY (BASEL, SWITZERLAND) 2023; 25:1214. [PMID: 37628244 PMCID: PMC10453511 DOI: 10.3390/e25081214] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2023] [Revised: 07/20/2023] [Accepted: 08/10/2023] [Indexed: 08/27/2023]
Abstract
The ability to simulate gene expression and infer gene regulatory networks has vast potential applications in various fields, including medicine, agriculture, and environmental science. In recent years, machine learning approaches to simulate gene expression and infer gene regulatory networks have gained significant attention as a promising area of research. By simulating gene expression, we can gain insights into the complex mechanisms that control gene expression and how they are affected by various environmental factors. This knowledge can be used to develop new treatments for genetic diseases, improve crop yields, and better understand the evolution of species. In this article, we address this issue by focusing on a novel method capable of simulating the gene expression regulation of a group of genes and their mutual interactions. Our framework enables us to simulate the regulation of gene expression in response to alterations or perturbations that can affect the expression of a gene. We use both artificial and real benchmarks to empirically evaluate the effectiveness of our methodology. Furthermore, we compare our method with existing ones to understand its advantages and disadvantages. We also present future ideas for improvement to enhance the effectiveness of our method. Overall, our approach has the potential to greatly improve the field of gene expression simulation and gene regulatory network inference, possibly leading to significant advancements in genetics.
Affiliation(s)
- Mario Pavone
- Department of Mathematics and Computer Science, University of Catania, 95125 Catania, Italy
19
Li R, Rozum JC, Quail MM, Qasim MN, Sindi SS, Nobile CJ, Albert R, Hernday AD. Inferring gene regulatory networks using transcriptional profiles as dynamical attractors. PLoS Comput Biol 2023; 19:e1010991. [PMID: 37607190 PMCID: PMC10473541 DOI: 10.1371/journal.pcbi.1010991] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2023] [Revised: 09/01/2023] [Accepted: 07/19/2023] [Indexed: 08/24/2023]
Abstract
Genetic regulatory networks (GRNs) regulate the flow of genetic information from the genome to expressed messenger RNAs (mRNAs) and are thus critical to controlling the phenotypic characteristics of cells. Numerous methods exist for profiling mRNA transcript levels and identifying protein-DNA binding interactions at the genome-wide scale. These enable researchers to determine the structure and output of transcriptional regulatory networks, but uncovering the complete structure and regulatory logic of GRNs remains a challenge. The field of GRN inference aims to meet this challenge by using computational modeling to derive the structure and logic of GRNs from experimental data and to encode this knowledge in Boolean networks, Bayesian networks, ordinary differential equation (ODE) models, or other modeling frameworks. However, most existing models do not incorporate dynamic transcriptional data, since such data have historically been less widely available than "static" transcriptional data. We report the development of an evolutionary algorithm-based ODE modeling approach (named EA) that integrates kinetic transcription data and the theory of attractor matching to infer GRN architecture and regulatory logic. When applied to a small-scale engineered synthetic GRN in Saccharomyces cerevisiae, our method outperformed six leading GRN inference methods, none of which incorporate kinetic transcriptional data, in predicting regulatory connections among transcription factors (TFs). Moreover, we demonstrate the potential of our method to predict unknown transcriptional profiles that would be produced upon genetic perturbation of the GRN governing a two-state cellular phenotypic switch in Candida albicans. We established an iterative refinement strategy to facilitate candidate selection for experimentation; the experimental results in turn validate or improve the model. In this way, our GRN inference approach can expedite the development of a sophisticated mathematical model that can accurately describe the structure and dynamics of the in vivo GRN.
Affiliation(s)
- Ruihao Li
- Quantitative and Systems Biology Graduate Program, University of California, Merced, Merced, California, United States of America
- Jordan C. Rozum
- Department of Systems Science and Industrial Engineering, Binghamton University (State University of New York), Binghamton, New York, United States of America
- Morgan M. Quail
- Quantitative and Systems Biology Graduate Program, University of California, Merced, Merced, California, United States of America
- Mohammad N. Qasim
- Quantitative and Systems Biology Graduate Program, University of California, Merced, Merced, California, United States of America
- Suzanne S. Sindi
- Department of Applied Mathematics, University of California, Merced, Merced, California, United States of America
- Clarissa J. Nobile
- Department of Molecular Cell Biology, University of California, Merced, Merced, California, United States of America
- Health Sciences Research Institute, University of California, Merced, Merced, California, United States of America
- Réka Albert
- Department of Physics, Pennsylvania State University, University Park, Pennsylvania, United States of America
- Department of Biology, Pennsylvania State University, University Park, Pennsylvania, United States of America
- Aaron D. Hernday
- Department of Molecular Cell Biology, University of California, Merced, Merced, California, United States of America
- Health Sciences Research Institute, University of California, Merced, Merced, California, United States of America
20
Marku M, Pancaldi V. From time-series transcriptomics to gene regulatory networks: A review on inference methods. PLoS Comput Biol 2023; 19:e1011254. [PMID: 37561790 PMCID: PMC10414591 DOI: 10.1371/journal.pcbi.1011254] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 08/12/2023]
Abstract
Inference of gene regulatory networks has been an active area of research for around 20 years, leading to the development of sophisticated inference algorithms based on a variety of assumptions and approaches. With the ever-increasing demand for more accurate and powerful models, the inference problem remains of broad scientific interest. Representing biological systems abstractly as gene regulatory networks is a powerful way to study them, since such networks can encode different amounts and types of information. In this review, we summarize the different types of inference algorithms based specifically on time-series transcriptomics and give an overview of the main applications of gene regulatory networks in computational biology. This review is intended to give biologists and researchers new to the topic an updated reference of regulatory network inference tools, and to guide them in selecting the inference method that best fits their questions, aims, and experimental data.
Affiliation(s)
- Malvina Marku
- CRCT, Université de Toulouse, Inserm, CNRS, Université Toulouse III-Paul Sabatier, Centre de Recherches en Cancérologie de Toulouse, Toulouse, France
- Vera Pancaldi
- CRCT, Université de Toulouse, Inserm, CNRS, Université Toulouse III-Paul Sabatier, Centre de Recherches en Cancérologie de Toulouse, Toulouse, France
- Barcelona Supercomputing Center, Barcelona, Spain
21
Lee S, Jung H, Park J, Ahn J. Accurate Prediction of Cancer Prognosis by Exploiting Patient-Specific Cancer Driver Genes. Int J Mol Sci 2023; 24:ijms24076445. [PMID: 37047418 PMCID: PMC10095073 DOI: 10.3390/ijms24076445] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2023] [Revised: 03/17/2023] [Accepted: 03/28/2023] [Indexed: 04/03/2023]
Abstract
Accurate prediction of cancer patients' prognoses and identification of prognostic biomarkers are both important for improving cancer treatment, in addition to enhanced anticancer drugs. Many previous bioinformatic studies have pursued this goal; however, there remains room for improvement in accuracy. In this study, we demonstrated that patient-specific cancer driver genes could be used to predict cancer prognoses more accurately. To identify patient-specific cancer driver genes, we first generated patient-specific gene networks and then used a modified PageRank to generate feature vectors representing the impact of each gene on the patient-specific gene network. Subsequently, the feature vectors of the good- and poor-prognosis groups were used to train a deep feedforward network. For the 11 cancer types in the TCGA data, the proposed method showed significantly better prediction performance than existing state-of-the-art methods for three cancer types (BRCA, CESC and PAAD), better performance for five cancer types (COAD, ESCA, HNSC, KIRC and STAD), and similar or slightly worse performance for the remaining three (BLCA, LIHC and LUAD). Furthermore, in the case study, the prognostic genes identified for breast cancer and cervical squamous cell carcinoma, together with their subnetworks, included several pathways associated with the progression of these two cancers. These results suggest that heterogeneous cancer driver information may be associated with cancer prognosis.
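The abstract relies on a modified PageRank whose modification is not described here, so the sketch below shows only standard PageRank by power iteration on a directed gene network. The damping factor of 0.85 and the uniform redistribution of rank from dangling nodes (genes with no outgoing edges) are common conventions assumed for illustration; the `pagerank` helper is hypothetical, not the authors' code.

```python
from typing import Dict, List


def pagerank(graph: Dict[str, List[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 1000) -> Dict[str, float]:
    """Standard PageRank by power iteration (not the paper's modified version).

    graph maps each node to the nodes it points to. Rank held by dangling
    nodes is spread uniformly over all nodes (an assumed convention).
    """
    nodes = sorted(set(graph) | {t for targets in graph.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            outs = graph.get(v, [])
            for t in outs:
                new[t] += damping * rank[v] / len(outs)  # split rank over out-edges
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank


# Hypothetical network: three genes all point to a shared target "A".
scores = pagerank({"B": ["A"], "C": ["A"], "D": ["A"]})
print(max(scores, key=scores.get))
```

The resulting scores sum to 1 and rise for nodes that receive edges from many (or highly ranked) nodes, which is why a PageRank-style score can serve as a per-gene "impact" feature.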
Affiliation(s)
- Suyeon Lee
- Department of Computer Science and Engineering, Incheon National University, Incheon 22012, Republic of Korea
- Heewon Jung
- Samsung Electronics Company Ltd., Suwon 16677, Republic of Korea
- Jiwoo Park
- Department of Computer Science and Engineering, Incheon National University, Incheon 22012, Republic of Korea
- Jaegyoon Ahn
- Department of Computer Science and Engineering, Incheon National University, Incheon 22012, Republic of Korea
22
Kardynska M, Kogut D, Pacholczyk M, Smieja J. Mathematical modeling of regulatory networks of intracellular processes - Aims and selected methods. Comput Struct Biotechnol J 2023; 21:1523-1532. [PMID: 36851915 PMCID: PMC9958294 DOI: 10.1016/j.csbj.2023.02.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2022] [Revised: 02/03/2023] [Accepted: 02/03/2023] [Indexed: 02/11/2023]
Abstract
The structure of regulatory networks and the dynamics of signaling pathways are uncovered through time- and resource-consuming experimental work. However, this work is increasingly supported by modeling, analytical and computational techniques, as well as by discrete mathematics and artificial intelligence applied to extract knowledge from existing databases. This review focuses on the mathematical modeling used to analyze the dynamics and robustness of these networks, presenting selected modeling methods that facilitate advances in molecular biology.
Affiliation(s)
- Malgorzata Kardynska
- Dept. of Biosensors and Processing of Biomedical Signals, Silesian University of Technology, Gliwice, Poland
- Daria Kogut
- Dept. of Biosensors and Processing of Biomedical Signals, Silesian University of Technology, Gliwice, Poland
- Dept. of Systems Biology and Engineering, Silesian University of Technology, Gliwice, Poland
- Marcin Pacholczyk
- Dept. of Biosensors and Processing of Biomedical Signals, Silesian University of Technology, Gliwice, Poland
- Dept. of Systems Biology and Engineering, Silesian University of Technology, Gliwice, Poland
- Jaroslaw Smieja
- Dept. of Biosensors and Processing of Biomedical Signals, Silesian University of Technology, Gliwice, Poland
- Dept. of Systems Biology and Engineering, Silesian University of Technology, Gliwice, Poland
23
Nabuco Leva Ferreira de Freitas JA, Bischof O. Dynamic modeling of the cellular senescence gene regulatory network. Heliyon 2023; 9:e14007. [PMID: 36938415 PMCID: PMC10015196 DOI: 10.1016/j.heliyon.2023.e14007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2022] [Revised: 02/13/2023] [Accepted: 02/17/2023] [Indexed: 02/27/2023]
Abstract
Cellular senescence is a cell fate that prominently impacts physiological and pathophysiological processes. Diverse cellular stresses induce it, and dramatic gene expression changes accompany it. However, determining the interactions comprising the gene regulatory network (GRN) governing senescence remains challenging. Recent advances in signal processing techniques provide opportunities to reconstruct GRNs. Here, we describe a GRN for senescence integrating time-series transcriptome and transcription factor depletion datasets. Specifically, we infer a set of differential equations using the "Sparse Identification of Nonlinear Dynamics" (SINDy) algorithm, discriminate genes with potential hidden regulators, validate the inferred GRN for time-points not included in the training data, and comprehensively benchmark our approach. Our work is a proof of concept for a data-driven GRN reconstruction method, consolidating an iterative, powerful mathematical platform for senescence modeling that can be used to test hypotheses in silico and has the potential for future discoveries of clinical impact.
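SINDy fits measured derivatives as a sparse combination of candidate terms. The sketch below implements sequentially thresholded least squares (STLSQ), the sparse-regression scheme introduced with SINDy, in plain Python for a hypothetical two-gene example in which gene y is driven by gene x as dy/dt = 0.8x - 0.3y. The candidate library [1, x, y, xy], the sample values, the threshold of 0.05, and the helper names are assumptions. Derivatives are given exactly rather than estimated from noisy time series, and the senescence study's actual library and settings are not described here.

```python
from typing import List

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def least_squares(theta: Matrix, dx: List[float], cols: List[int]) -> List[float]:
    """Least-squares coefficients for the library columns in `cols` (normal equations)."""
    ata = [[sum(row[i] * row[j] for row in theta) for j in cols] for i in cols]
    atb = [sum(row[i] * d for row, d in zip(theta, dx)) for i in cols]
    return solve(ata, atb)


def stlsq(theta: Matrix, dx: List[float], threshold: float, iters: int = 10) -> List[float]:
    """Sequentially thresholded least squares: zero small terms, refit the rest."""
    n_terms = len(theta[0])
    active = list(range(n_terms))
    xi = [0.0] * n_terms
    for _ in range(iters):
        xi = [0.0] * n_terms
        for i, c in zip(active, least_squares(theta, dx, active)):
            xi[i] = c
        keep = [i for i in active if abs(xi[i]) >= threshold]
        if keep == active:
            break
        active = keep
    return xi


# Hypothetical samples of two genes and the exact derivative of gene y.
xs = [0.1, 0.5, 1.0, 1.5, 2.0, 2.5]
ys = [1.0, 0.2, 0.7, 1.1, 0.4, 0.9]
library = [[1.0, x, y, x * y] for x, y in zip(xs, ys)]  # terms: 1, x, y, x*y
dy = [0.8 * x - 0.3 * y for x, y in zip(xs, ys)]       # assumed "true" kinetics
print(stlsq(library, dy, threshold=0.05))
```

The recovered coefficients keep only the x and y terms, so the fitted equation reads as a regulatory edge from x to y plus degradation of y. With real, noisy data, derivative estimation and the choice of threshold strongly affect which edges survive.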
Affiliation(s)
- José Américo Nabuco Leva Ferreira de Freitas
- IMRB, Mondor Institute for Biomedical Research, INSERM U955 – Université Paris Est Créteil, UPEC, Faculté de Médecine de Créteil 8, rue du Général Sarrail, 94010 Créteil
- Sorbonne Université, UMR 8256, Biological Adaptation and Ageing B2A–IBPS, F-75005, Paris, France
- INSERM U1164, F-75005, Paris, France
- Oliver Bischof
- IMRB, Mondor Institute for Biomedical Research, INSERM U955 – Université Paris Est Créteil, UPEC, Faculté de Médecine de Créteil 8, rue du Général Sarrail, 94010 Créteil
24
Computational approaches to understand transcription regulation in development. Biochem Soc Trans 2023; 51:1-12. [PMID: 36695505 PMCID: PMC9988001 DOI: 10.1042/bst20210145] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2022] [Revised: 01/07/2023] [Accepted: 01/13/2023] [Indexed: 01/26/2023]
Abstract
Gene regulatory networks (GRNs) serve as useful abstractions to understand transcriptional dynamics in developmental systems. Computational prediction of GRNs has been successfully applied to genome-wide gene expression measurements with the advent of microarrays and RNA-sequencing. However, these inferred networks are inaccurate and mostly based on correlative rather than causative interactions. In this review, we highlight three approaches that significantly impact GRN inference: (1) moving from one genome-wide functional modality, gene expression, to multi-omics, (2) single cell sequencing, to measure cell type-specific signals and predict context-specific GRNs, and (3) neural networks as flexible models. Together, these experimental and computational developments have the potential to significantly impact the quality of inferred GRNs. Ultimately, accurately modeling the regulatory interactions between transcription factors and their target genes will be essential to understand the role of transcription factors in driving developmental gene expression programs and to derive testable hypotheses for validation.
25
Cheng X, Amanullah M, Liu W, Liu Y, Pan X, Zhang H, Xu H, Liu P, Lu Y. WMDS.net: a network control framework for identifying key players in transcriptome programs. Bioinformatics 2023; 39:7023921. [PMID: 36727489 PMCID: PMC9925106 DOI: 10.1093/bioinformatics/btad071] [Received: 06/29/2022] [Revised: 01/16/2023] [Accepted: 02/01/2023]
Abstract
MOTIVATION Mammalian cells can be transcriptionally reprogrammed to other cellular phenotypes. Controllability of such complex transitions in the transcriptional networks underlying cellular phenotypes is an inherent biological characteristic. This network controllability can be interpreted as operating a few key regulators to guide the transcriptional program from one state to another. Finding the key regulators in the transcriptional program can provide key insights into the network state transitions underlying cellular phenotypes. RESULTS To address this challenge, we proposed identifying the key regulators in the transcriptional co-expression network as a minimum dominating set (MDS) of driver nodes that can fully control the network state transition. Based on the theory of structural controllability, we developed a weighted MDS network model (WMDS.net) to find the driver nodes of differential gene co-expression networks. The WMDS.net weight integrates node degree in the network and the significance of the gene co-expression difference between two physiological states into the measurement of node controllability of the transcriptional network. To confirm its validity, we applied WMDS.net to the discovery of cancer driver genes in RNA-seq datasets from The Cancer Genome Atlas. WMDS.net performed well across various cancer datasets and outperformed other top-tier tools, achieving a better balance between precision and recall. AVAILABILITY AND IMPLEMENTATION https://github.com/chaofen123/WMDS.net. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
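As a rough illustration of selecting driver nodes as a dominating set, the sketch below uses a generic greedy heuristic for a minimum-weight dominating set: at each step it picks the node with the lowest weight per newly dominated node. It is not the WMDS.net code. The toy network, the function names `greedy_weighted_mds` and `is_dominating`, and the per-node weights are hypothetical stand-ins for the degree- and differential-co-expression-based weights described above.

```python
from typing import Dict, List, Set, Tuple


def greedy_weighted_mds(adj: Dict[str, Set[str]], weight: Dict[str, float]) -> Set[str]:
    """Greedy approximation of a minimum-weight dominating set.

    A chosen node dominates itself and its neighbours. At each step the node with
    the lowest weight per newly dominated node is added (weighted set-cover greedy).
    Assumes `adj` is symmetric (undirected network) and every node is a key.
    """
    undominated = set(adj)
    chosen: Set[str] = set()
    while undominated:
        best, best_ratio = None, float("inf")
        for node in sorted(adj):  # sorted for deterministic tie-breaking
            gain = len(({node} | adj[node]) & undominated)
            if node in chosen or gain == 0:
                continue
            ratio = weight[node] / gain
            if ratio < best_ratio:
                best, best_ratio = node, ratio
        chosen.add(best)
        undominated -= {best} | adj[best]
    return chosen


def is_dominating(adj: Dict[str, Set[str]], nodes: Set[str]) -> bool:
    """True if every node is in `nodes` or adjacent to a node in `nodes`."""
    return all(n in nodes or adj[n] & nodes for n in adj)


# Hypothetical toy co-expression network with equal node weights.
edges: List[Tuple[str, str]] = [("TF1", "g1"), ("TF1", "g2"), ("TF1", "g3"),
                                ("TF2", "g3"), ("TF2", "g4"), ("g4", "g5")]
adj: Dict[str, Set[str]] = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)
weight = {n: 1.0 for n in adj}

drivers = greedy_weighted_mds(adj, weight)
print(sorted(drivers))  # ['TF1', 'g4']
```

With unequal weights, for example lower weights for genes with stronger differential co-expression, the same routine trades coverage against weight. Greedy selection gives an approximation, not a guaranteed minimum.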
Affiliation(s)
- Xiang Cheng
- Department of Gynecologic Oncology, Women's Hospital and Institute of Translational Medicine, Zhejiang University School of Medicine, Hangzhou 310006, China
- Institute of Bioinformatics, Zhejiang University, Hangzhou 310058, China
- Md Amanullah
- Institute of Bioinformatics, Zhejiang University, Hangzhou 310058, China
- Department of Respiratory Medicine, Key Laboratory of Precision Medicine in Diagnosis and Monitoring Research of Zhejiang Province, Sir Run Run Shaw Hospital and Institute of Translational Medicine, Zhejiang University School of Medicine, Hangzhou 310016, China
- Weigang Liu
- Department of Respiratory Medicine, Key Laboratory of Precision Medicine in Diagnosis and Monitoring Research of Zhejiang Province, Sir Run Run Shaw Hospital and Institute of Translational Medicine, Zhejiang University School of Medicine, Hangzhou 310016, China
- Yi Liu
- Institute of Bioinformatics, Zhejiang University, Hangzhou 310058, China
- Department of Respiratory Medicine, Key Laboratory of Precision Medicine in Diagnosis and Monitoring Research of Zhejiang Province, Sir Run Run Shaw Hospital and Institute of Translational Medicine, Zhejiang University School of Medicine, Hangzhou 310016, China
- Xiaoqing Pan
- Department of Mathematics, Shanghai Normal University, Xuhui 200234, China
- Honghe Zhang
- Department of Pathology, Research Unit of Intelligence Classification of Tumor Pathology and Precision Therapy, Chinese Academy of Medical Sciences, Zhejiang University School of Medicine, Hangzhou 310058, China
- Haiming Xu
- Institute of Bioinformatics, Zhejiang University, Hangzhou 310058, China
- Pengyuan Liu
- Department of Gynecologic Oncology, Women's Hospital and Institute of Translational Medicine, Zhejiang University School of Medicine, Hangzhou 310006, China
- Department of Physiology, Center of Systems Molecular Medicine, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Cancer Center, Zhejiang University, Hangzhou 310029, China
- Yan Lu
- Department of Gynecologic Oncology, Women's Hospital and Institute of Translational Medicine, Zhejiang University School of Medicine, Hangzhou 310006, China
- Institute of Bioinformatics, Zhejiang University, Hangzhou 310058, China
- Cancer Center, Zhejiang University, Hangzhou 310029, China
26
Zhang W, Lin Z. iPoLNG-An unsupervised model for the integrative analysis of single-cell multiomics data. Front Genet 2023; 14:998504. [PMID: 36865385 PMCID: PMC9972291 DOI: 10.3389/fgene.2023.998504] [Received: 07/20/2022] [Accepted: 01/24/2023]
Abstract
Single-cell multiomics technologies, which measure transcriptomic and epigenomic profiles simultaneously in the same set of single cells, pose significant challenges for effective integrative analysis. Here, we propose iPoLNG, an unsupervised generative model for the effective and scalable integration of single-cell multiomics data. iPoLNG reconstructs low-dimensional representations of the cells and features using computationally efficient stochastic variational inference, modelling the discrete counts in single-cell multiomics data with latent factors. The low-dimensional representation of cells enables the identification of distinct cell types, and the feature-by-factor loading matrices help characterize cell-type-specific markers and provide rich biological insights through functional pathway enrichment analysis. iPoLNG can also handle partial information, where a modality is missing for some cells. Taking advantage of GPUs and probabilistic programming, iPoLNG scales to large datasets and takes less than 15 min to run on datasets with 20,000 cells.
Affiliation(s)
- Wenyu Zhang
- Department of Statistics, The Chinese University of Hong Kong, Hong Kong, China
27
Song X. Statistical and Computational Methods for Proteogenomic Data Analysis. Methods Mol Biol 2023; 2629:271-303. [PMID: 36929082 DOI: 10.1007/978-1-0716-2986-4_13]
Abstract
Proteins are the functional molecules for almost all cellular and biological processes. They are also the targets of most drugs. Proteins are subject to complex, multilevel regulation, so their abundance levels are not well correlated with their mRNA expression levels. The structure, activity, and functional roles of proteins are affected by posttranslational modifications (PTMs), which are even less correlated with mRNA expression levels than protein abundances are. Comprehensive characterization of proteomics data is critical for understanding the molecular and cellular mechanisms of biological systems and for developing new therapeutics. Current large-scale proteomic profiling technologies, such as mass spectrometry, provide relative identification of peptides and proteins, with data vulnerable to outliers, batch effects, and nonrandom missingness. To perform high-quality proteomic data analysis, we first introduce a data preprocessing and quality control pipeline that includes normalization, outlier detection and removal, batch effect identification and handling, and missing data imputation. Then, we describe several statistical methods that leverage well-processed proteomic data to generate scientific discoveries, especially in integration with genomics and transcriptomics. These methods cover topics such as association analysis, network construction, clustering, and cell-type deconvolution. To demonstrate these methods, we use the proteogenomic data from the lung squamous cell carcinoma study of the Clinical Proteomic Tumor Analysis Consortium and provide sample code for data access and analyses.
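Two of the preprocessing steps listed above, normalization and missing-data imputation, can be sketched with the standard library. This is a minimal illustration under stated assumptions, not the chapter's pipeline: intensities are assumed to be on a log2 scale, each sample is centered on its median, and missing values are treated as left-censored and filled with each protein's observed minimum minus a hypothetical shift of 1. The function names are illustrative.

```python
import statistics
from typing import List, Optional

# Rows are proteins, columns are samples; values are log2 intensities, None = missing.
Matrix = List[List[Optional[float]]]


def median_normalize(m: Matrix) -> Matrix:
    """Subtract each sample's median (over observed values) so samples share a median of 0."""
    medians = [
        statistics.median([row[j] for row in m if row[j] is not None])
        for j in range(len(m[0]))
    ]
    return [[None if v is None else v - medians[j] for j, v in enumerate(row)] for row in m]


def impute_left_censored(m: Matrix, shift: float = 1.0) -> List[List[float]]:
    """Fill each protein's missing values with its observed minimum minus `shift`.

    Assumes missingness is left-censored (low-abundance signals fall below the
    detection limit) and that every protein has at least one observed value.
    """
    out = []
    for row in m:
        fill = min(v for v in row if v is not None) - shift
        out.append([fill if v is None else v for v in row])
    return out


# Hypothetical toy data: 3 proteins x 3 samples.
data: Matrix = [
    [20.0, 21.0, None],
    [22.0, 23.0, 24.0],
    [18.0, None, 20.0],
]
clean = impute_left_censored(median_normalize(data))
print(clean)  # [[0.0, -1.0, -2.0], [2.0, 1.0, 2.0], [-2.0, -3.0, -2.0]]
```

Median centering assumes most proteins do not change between samples. Outlier removal and batch-effect handling are omitted from this sketch.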
Affiliation(s)
- Xiaoyu Song
- Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Institute for Healthcare Delivery Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
28
Biswas S, Clawson W, Levin M. Learning in Transcriptional Network Models: Computational Discovery of Pathway-Level Memory and Effective Interventions. Int J Mol Sci 2022; 24:ijms24010285. [PMID: 36613729 PMCID: PMC9820177 DOI: 10.3390/ijms24010285] [Received: 10/01/2022] [Revised: 11/23/2022] [Accepted: 12/20/2022]
Abstract
Trainability, in any substrate, refers to the ability to change future behavior based on past experiences. An understanding of such capacity within biological cells and tissues would enable a particularly powerful set of methods for predicting and controlling their behavior through specific patterns of stimuli. This top-down mode of control (as an alternative to bottom-up modification of hardware) has been extensively exploited by computer science and the behavioral sciences; in biology, however, it is usually reserved for organism-level behavior in animals with brains, such as training animals towards a desired response. Exciting work in the field of basal cognition has begun to reveal degrees and forms of unconventional memory in non-neural tissues and even in subcellular biochemical dynamics. Here, we characterize biological gene regulatory circuit models and protein pathways and find them capable of several different kinds of memory. We extend prior results on learning in binary transcriptional networks to continuous models and identify specific interventions (regimes of stimulation, as opposed to network rewiring) that abolish undesirable network behavior such as drug pharmacoresistance and drug sensitization. We also explore the stability of the created memories by assessing their long-term behavior and find that most memories do not decay over long time periods. Additionally, we find that the memory properties are quite robust to noise; surprisingly, in many cases noise actually increases memory potential. We examine various network properties associated with these behaviors and find that no single network property is indicative of memory. Random networks do not show the same memory behavior as models of biological processes, indicating that generic network dynamics are not solely responsible for trainability.
Rational control of dynamic pathway function using stimuli derived from computational models opens the door to empirical studies of proto-cognitive capacities in unconventional embodiments and suggests numerous possible applications in biomedicine, where behavior shaping of pathway responses stands as a potential alternative to gene therapy.
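The idea that a transient stimulus can leave a lasting change in network state can be illustrated with a binary (Boolean) circuit, the setting the paper extends to continuous models. The sketch is a textbook two-gene mutual-repression toggle switch with synchronous updates, chosen as a hypothetical example. It is not one of the biological models analyzed in the paper, and the names `step` and `run` are illustrative.

```python
from typing import List, Tuple

State = Tuple[bool, bool]  # (gene A is ON, gene B is ON)


def step(state: State, stimulus: bool) -> State:
    """Synchronous update of a two-gene mutual-repression circuit.

    A is repressed by B but can be forced ON by an external stimulus;
    B is repressed by A.
    """
    a, b = state
    return ((not b) or stimulus, not a)


def run(state: State, stimuli: List[bool]) -> List[State]:
    """Apply one update per stimulus value and return the full trajectory."""
    trajectory = [state]
    for s in stimuli:
        state = step(state, s)
        trajectory.append(state)
    return trajectory


# Start in the stable state A=OFF, B=ON; stimulate for two steps, then withdraw the stimulus.
traj = run((False, True), [True, True] + [False] * 5)
print(traj[-1])  # (True, False): the new state persists after the stimulus is gone
```

In this toy circuit a one-step pulse is not enough: under synchronous updating it pushes the system into an oscillation between (True, True) and (False, False) rather than into the other stable state, so both the presence and the duration of the stimulus matter.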
Affiliation(s)
- Surama Biswas
- Allen Discovery Center, Tufts University, Medford, MA 02155, USA
- Department of Computer Science & Engineering and Information Technology, Meghnad Saha Institute of Technology, Kolkata 700150, India
- Wesley Clawson
- Allen Discovery Center, Tufts University, Medford, MA 02155, USA
- Michael Levin
- Allen Discovery Center, Tufts University, Medford, MA 02155, USA
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA
- Correspondence: ; Tel.: +1-617-627-6161
29
Inference of gene regulatory networks based on the Light Gradient Boosting Machine. Comput Biol Chem 2022; 101:107769. [DOI: 10.1016/j.compbiolchem.2022.107769] [Received: 04/22/2022] [Revised: 08/12/2022] [Accepted: 09/06/2022]
30
Xu Y, Chen J, Lyu A, Cheung WK, Zhang L. dynDeepDRIM: a dynamic deep learning model to infer direct regulatory interactions using time-course single-cell gene expression data. Brief Bioinform 2022; 23:6720420. [PMID: 36168811 DOI: 10.1093/bib/bbac424] [Received: 03/23/2022] [Revised: 08/02/2022] [Accepted: 09/01/2022]
Abstract
Time-course single-cell RNA sequencing (scRNA-seq) data have been widely used to explore dynamic changes in the gene expression of transcription factors (TFs) and their target genes. This information is useful for reconstructing cell-type-specific gene regulatory networks (GRNs). However, existing tools are commonly designed to analyze either time-course bulk gene expression data or static scRNA-seq data via pseudo-time cell ordering. Only a few methods successfully utilize information from multiple time points while also considering the characteristics of scRNA-seq data. We propose dynDeepDRIM, a novel deep learning model to reconstruct GRNs from time-course scRNA-seq data. It represents the joint expression of a gene pair as an image and uses the image of the target TF-gene pair, together with the images of its potential neighbors, to reconstruct GRNs from time-course scRNA-seq data. dynDeepDRIM can effectively remove transitive TF-gene interactions by considering neighborhood context, and it models gene expression dynamics using high-dimensional tensors. We compared dynDeepDRIM with six GRN reconstruction methods on both simulated data and four real time-course scRNA-seq datasets. dynDeepDRIM performed substantially better than the other methods in inferring TF-gene interactions and effectively eliminated false positives. We also applied dynDeepDRIM to annotate gene functions and found that it performed markedly better than the other tools because it considers neighbor genes.
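The core representational idea, turning the joint expression of a gene pair across cells into an image, can be sketched as a normalized two-dimensional histogram. The log1p transform, per-gene min-max scaling, bin count, and the function name `joint_expression_image` are assumptions for illustration, not dynDeepDRIM's exact preprocessing.

```python
import math
from typing import List, Sequence


def joint_expression_image(x: Sequence[float], y: Sequence[float], bins: int = 8) -> List[List[float]]:
    """Bin paired per-cell expression of two genes into a bins x bins histogram.

    Values are log1p-transformed and min-max scaled to [0, 1] per gene before
    binning; counts are divided by the number of cells so the image sums to 1.
    """
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")

    def scale(v: Sequence[float]) -> List[float]:
        logged = [math.log1p(t) for t in v]
        lo, hi = min(logged), max(logged)
        span = (hi - lo) or 1.0  # avoid division by zero for constant genes
        return [(t - lo) / span for t in logged]

    xs, ys = scale(x), scale(y)
    img = [[0.0] * bins for _ in range(bins)]
    for a, b in zip(xs, ys):
        i = min(int(a * bins), bins - 1)  # a scaled value of 1.0 goes into the last bin
        j = min(int(b * bins), bins - 1)
        img[i][j] += 1.0
    n = len(xs)
    return [[c / n for c in row] for row in img]


# Hypothetical expression of a TF and a target gene in four cells.
img = joint_expression_image([0.0, 0.0, 5.0, 5.0], [0.0, 0.0, 5.0, 5.0], bins=2)
print(img)  # [[0.5, 0.0], [0.0, 0.5]]
```

For time-course data, one such image per time point could be stacked into a tensor, the kind of input a convolutional model can consume. Images for neighbor gene pairs would be built the same way.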
Affiliation(s)
- Yu Xu
- Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong
- Jiaxing Chen
- Computer Science and Technology, Division of Science and Technology, BNU-HKBU United International College, Jintong Road, 519087, Zhuhai, China
- Aiping Lyu
- School of Chinese Medicine, Hong Kong Baptist University, Kowloon Tong, Hong Kong
- William K Cheung
- Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong
- Lu Zhang
- Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong
31
Learning complex dependency structure of gene regulatory networks from high dimensional microarray data with Gaussian Bayesian networks. Sci Rep 2022; 12:18704. [PMID: 36333425 PMCID: PMC9636198 DOI: 10.1038/s41598-022-21957-z] [Received: 02/15/2022] [Accepted: 10/06/2022]
Abstract
Reconstruction of Gene Regulatory Networks (GRNs) from gene expression data with Probabilistic Network Models (PNMs) is an open problem. Gene expression datasets consist of thousands of genes with relatively small sample sizes (i.e., they are large-p-small-n). Moreover, dependencies of various orders coexist in these datasets. On the one hand, transcription factor-encoding genes act like hubs and regulate target genes; on the other hand, target genes show local dependencies. In the field of Undirected Network Models (UNMs), a subclass of PNMs, the Glasso algorithm has been proposed to handle high-dimensional microarray datasets by enforcing sparsity. To overcome the problem of the complex structure of interactions, modifications of the default Glasso algorithm have been developed that integrate the expected dependency structure into the UNMs beforehand. In this work, we advocate the use of a simple score-based Hill Climbing (HC) algorithm that learns Gaussian Bayesian networks based on directed acyclic graphs. We compare HC with Glasso and its variants in the UNM framework by their capability to reconstruct GRNs from microarray data, using the synthetic benchmarking dataset from the DREAM5 challenge and real-world data from the Escherichia coli genome. We conclude that dependencies in complex data are learned best by the HC algorithm, which represents them most accurately and efficiently, simultaneously modelling the strong local connections and the weaker but significant global connections that coexist in the gene expression dataset. The HC algorithm adapts intrinsically to the complex dependency structure of the dataset, without forcing a specific structure in advance.
32
Heuts BMH, Arza-Apalategi S, Frölich S, Bergevoet SM, van den Oever SN, van Heeringen SJ, van der Reijden BA, Martens JHA. Identification of transcription factors dictating blood cell development using a bidirectional transcription network-based computational framework. Sci Rep 2022; 12:18656. [PMID: 36333382 PMCID: PMC9636203 DOI: 10.1038/s41598-022-21148-w] [Received: 06/17/2022] [Accepted: 09/23/2022]
Abstract
Advanced computational methods exploit gene expression and epigenetic datasets to predict gene regulatory networks controlled by transcription factors (TFs). These methods have identified cell fate-determining TFs but require large amounts of reference data and experimental expertise. Here, we present an easy-to-use, network-based computational framework that exploits enhancers defined by bidirectional transcription, using CAGE sequencing data as its sole input, to correctly predict TFs key to various human cell types. Next, we applied this Analysis Algorithm for Networks Specified by Enhancers based on CAGE (ANANSE-CAGE) to predict TFs driving red and white blood cell development and THP-1 leukemia cell immortalization. Further, we predicted TFs that are differentially important to either cell line-associated or primary-associated MLL-AF9-driven gene programs, and in primary MLL-AF9 acute leukemia. Our approach identified experimentally validated as well as previously unexplored TFs in these processes. ANANSE-CAGE will be useful for identifying transcription factors that are key to any cell fate change using only CAGE-seq data as input.
Affiliation(s)
- B. M. H. Heuts
- Department of Molecular Biology, Faculty of Science, RIMLS, Radboud University, 6525 GA Nijmegen, The Netherlands
- S. Arza-Apalategi
- Department of Laboratory Medicine, Laboratory of Hematology, Radboud Institute for Molecular Life Sciences (RIMLS), Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
- S. Frölich
- Department of Molecular Developmental Biology, Faculty of Science, RIMLS, Radboud University, 6525 GA Nijmegen, The Netherlands
- S. M. Bergevoet
- Department of Laboratory Medicine, Laboratory of Hematology, Radboud Institute for Molecular Life Sciences (RIMLS), Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
- S. N. van den Oever
- Department of Molecular Biology, Faculty of Science, RIMLS, Radboud University, 6525 GA Nijmegen, The Netherlands
- S. J. van Heeringen
- Department of Molecular Developmental Biology, Faculty of Science, RIMLS, Radboud University, 6525 GA Nijmegen, The Netherlands
- B. A. van der Reijden
- Department of Laboratory Medicine, Laboratory of Hematology, Radboud Institute for Molecular Life Sciences (RIMLS), Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
- J. H. A. Martens
- Department of Molecular Biology, Faculty of Science, RIMLS, Radboud University, 6525 GA Nijmegen, The Netherlands
33
Majumder S, Thakran Y, Pal V, Singh K. Fuzzy and Rough Set Theory Based Computational Framework for Mining Genetic Interaction Triplets From Gene Expression Profiles for Lung Adenocarcinoma. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:3469-3481. [PMID: 34665736 DOI: 10.1109/tcbb.2021.3120844]
Abstract
Genetic interactions are very helpful for understanding diseases and discovering drugs to treat them. Compared with gene pairs, which represent genetic interactions between two genes, gene triplets are more informative and useful. However, existing work on genetic interactions among gene triplets has primarily focused on detecting gene triplets from time series gene expression profiles. Generating time series gene expression profiles for humans is largely impractical, but labeled gene expression profiles are available for many human diseases. In this paper, a computational framework is proposed to detect gene triplets from labeled gene expression profiles. First, it employs Rough Set Theory to extract the key genes and then designs a fuzzy inference system to generate possible gene triplets. Further, the Root Mean Squared Error measure is used to prune irrelevant gene triplets. In the present work, the proposed framework has been applied to a labeled lung adenocarcinoma dataset, and it can be applied to any other labeled gene expression dataset. The extracted gene triplets and their functionalities have been verified against existing biological literature and benchmark databases, and the verification results indicate that the proposed framework is promising for finding useful genetic triplets. Further, the proposed framework was found to be more efficient than an existing mutual information-based technique at detecting known genetic interactions.
34
Ajmal HB, Madden MG. Dynamic Bayesian Network Learning to Infer Sparse Models From Time Series Gene Expression Data. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:2794-2805. [PMID: 34181549 DOI: 10.1109/tcbb.2021.3092879]
Abstract
One of the key challenges in systems biology is to derive gene regulatory networks (GRNs) from complex high-dimensional sparse data. Bayesian networks (BNs) and dynamic Bayesian networks (DBNs) have been widely applied to infer GRNs from gene expression data. GRNs are typically sparse, but traditional approaches to BN structure learning often produce many spurious (false positive) edges when used to elucidate GRNs. We present two new BN scoring functions, which extend the Bayesian Information Criterion (BIC) score with additional penalty terms, and use them in conjunction with DBN structure search methods to find a graph structure that maximises the proposed scores. Our BN scoring functions infer networks with fewer spurious edges than the BIC score does. The proposed methods are evaluated extensively on autoregressive and DREAM4 benchmarks. We found that they significantly improve the precision of the learned graphs relative to the BIC score. The proposed methods are also evaluated on three real time series gene expression datasets. The results demonstrate that our algorithms are able to learn sparse graphs from high-dimensional time series data. The implementation of these algorithms is open source and is available in the form of an R package on GitHub at https://github.com/HamdaBinteAjmal/DBN4GRN, along with documentation and tutorials.
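The general recipe, scoring candidate regulator sets with a penalized BIC and keeping the best-scoring set, can be sketched for a linear-Gaussian dynamic Bayesian network in which genes at time t predict a target gene at time t+1. The sketch uses the standard BIC plus a hypothetical per-edge `extra_penalty`. It does not reproduce the paper's two proposed scoring functions, and it uses exhaustive search over small parent sets instead of the authors' structure search. All names and the toy series are illustrative.

```python
import itertools
import math
from typing import Dict, List, Sequence, Tuple


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ w = b by Gauss-Jordan elimination with partial pivoting (assumes non-singular a)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def rss(y: Sequence[float], cols: List[Sequence[float]]) -> float:
    """Residual sum of squares of an OLS fit of y on an intercept plus `cols`."""
    design = [[1.0] + [c[t] for c in cols] for t in range(len(y))]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yt for r, yt in zip(design, y)) for i in range(p)]
    w = _solve(xtx, xty)
    return sum((yt - sum(wi * xi for wi, xi in zip(w, r))) ** 2 for r, yt in zip(design, y))


def best_parents(series: Dict[str, List[float]], target: str,
                 max_parents: int = 2, extra_penalty: float = 0.0) -> Tuple[str, ...]:
    """Return the lagged parent set of `target` that maximises a penalised BIC.

    Parents are measured at time t and the target at t+1. The score is the Gaussian
    log-likelihood term minus the usual BIC penalty, minus `extra_penalty` per edge.
    """
    y = series[target][1:]
    n = len(y)
    best: Tuple[str, ...] = ()
    best_score = -math.inf
    for k in range(max_parents + 1):
        for parents in itertools.combinations(sorted(series), k):
            cols = [series[g][:-1] for g in parents]
            r = max(rss(y, cols), 1e-12)  # guard against log(0) for a perfect fit
            score = -0.5 * n * math.log(r / n) - 0.5 * (k + 1) * math.log(n) - extra_penalty * k
            if score > best_score:
                best, best_score = parents, score
    return best


# Hypothetical toy time series: B(t+1) is roughly 2 * A(t); C is unrelated.
series = {
    "A": [1.0, 4.0, 2.0, 5.0, 1.0, 3.0, 5.0, 2.0, 4.0, 1.0],
    "B": [3.0, 2.1, 7.9, 4.05, 10.0, 1.95, 6.1, 9.9, 4.0, 8.05],
    "C": [2.0, 2.0, 3.0, 1.0, 4.0, 3.0, 2.0, 5.0, 1.0, 3.0],
}
print(best_parents(series, "B"))
```

Raising `extra_penalty` pushes the search toward sparser parent sets, which is the intuition behind adding penalty terms to BIC; exhaustive search is only practical for small `max_parents` and gene counts.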
35
Combining kinetic orders for efficient S-System modelling of gene regulatory network. Biosystems 2022; 220:104736. [PMID: 35863700 DOI: 10.1016/j.biosystems.2022.104736] [Received: 03/08/2022] [Revised: 07/10/2022] [Accepted: 07/10/2022]
Abstract
S-System models, which are non-linear differential equation models, are widely used for reconstructing gene regulatory networks from temporal gene expression data. An S-System model involves two processes, generation and degradation, and uses the kinetic parameters gij and hij to represent the direction, nature, and intensity of the genetic interactions. The need to learn a large number of model parameters increases computational expense. Previously, we improved the performance of the algorithm using dynamic allocation of the maximum in-degree for each gene. While the method was effective for smaller networks, a large amount of computation was still needed for larger networks. This problem arose mainly from the increased occurrence of invalid networks during optimization, primarily because the two kinetic parameters (gij and hij) of the S-System model converge independently during optimization. Being independent, these two parameters can converge to values that indicate contradictory gene interactions, that is, both inhibition and activation of the same target by the same regulator. In this study, to address this major challenge in S-System modelling, we developed a novel method with two features: a penalty term that penalizes networks with invalid kinetic orders, and a parameter, wij, derived by combining the kinetic parameters gij and hij. The novel penalty term was used for candidate selection during optimization in the DRNI (Dynamically Regulated Network Initialization) algorithm. Rather than remaining constant, it is dynamic, with its magnitude dependent on the number of invalid interactions in the given network. This approach encourages the generation of valid candidate solutions and eliminates invalid networks in a systematic manner.
The previous DRNI method, a two-stage approach that uses dynamic allocation of the maximum in-degree for each gene, was further improved by adding a third stage that applies the proposed wij to handle invalid regulations that may still exist in the candidate solutions. The method was tested on different gene expression datasets and was able to reduce the number of iterations and improve network accuracy. For a 20-gene network, the number of generations required for convergence was reduced by 300, and the F-score improved by 0.05 compared with our previously reported DRNI approach. For the well-known 10-gene networks of the DREAM challenge, our method improved the average area under the ROC curve of the DREAM4 10-gene networks.
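For reference, the S-System model for gene i has the standard form dX_i/dt = α_i ∏_j X_j^(g_ij) − β_i ∏_j X_j^(h_ij), where the first term describes generation and the second degradation. The sketch below integrates such a model with a simple forward-Euler scheme and flags regulator-target pairs whose two kinetic orders point in opposite directions: g_ij > 0 suggests activation through production, while h_ij > 0 suggests inhibition through faster degradation. Treating g_ij · h_ij > 0 as contradictory, skipping self-terms, and the two-gene parameter values are assumptions for this sketch. The paper's combined parameter wij and its dynamic penalty are not reproduced here.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def s_system_rhs(x: Sequence[float], alpha: Sequence[float], beta: Sequence[float],
                 g: Matrix, h: Matrix) -> List[float]:
    """dX_i/dt = alpha_i * prod_j X_j**g_ij - beta_i * prod_j X_j**h_ij (requires X_j > 0)."""
    out = []
    for i in range(len(x)):
        production, degradation = alpha[i], beta[i]
        for j, xj in enumerate(x):
            production *= xj ** g[i][j]
            degradation *= xj ** h[i][j]
        out.append(production - degradation)
    return out


def simulate(x0: Sequence[float], alpha: Sequence[float], beta: Sequence[float],
             g: Matrix, h: Matrix, dt: float = 0.01, steps: int = 500) -> List[List[float]]:
    """Forward-Euler integration; returns the trajectory including the initial state."""
    traj = [list(x0)]
    for _ in range(steps):
        x = traj[-1]
        dx = s_system_rhs(x, alpha, beta, g, h)
        traj.append([xi + dt * di for xi, di in zip(x, dx)])
    return traj


def contradictory_pairs(g: Matrix, h: Matrix) -> List[Tuple[int, int]]:
    """(i, j) pairs with i != j where g_ij and h_ij are non-zero and share a sign.

    g_ij > 0: j raises i's production (activation); h_ij > 0: j speeds up i's
    degradation (inhibition). Equal signs therefore imply opposite effects.
    """
    n = len(g)
    return [(i, j) for i in range(n) for j in range(n) if i != j and g[i][j] * h[i][j] > 0]


# Hypothetical two-gene system: gene 1 activates gene 0's... no, gene 0 is repressed by gene 1
# (g_01 < 0) and gene 1 is activated by gene 0 (g_10 > 0); each gene degrades itself (h_ii = 1).
alpha, beta = [2.0, 2.0], [1.0, 1.0]
g = [[0.0, -0.5], [0.5, 0.0]]
h = [[1.0, 0.0], [0.0, 1.0]]
traj = simulate([1.0, 1.0], alpha, beta, g, h)
print(traj[-1])
print(contradictory_pairs(g, h))                          # []
print(contradictory_pairs(g, [[1.0, -0.3], [0.0, 1.0]]))  # [(0, 1)]
```

In the last call, g_01 = -0.5 (repression of production) and h_01 = -0.3 (slower degradation, an activating effect) disagree, so the pair is flagged as the kind of invalid interaction the abstract describes.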
36
Gonçalves LO, Pulido AFV, Mathias FAS, Enes AES, Carvalho MGR, de Melo Resende D, Polak ME, Ruiz JC. Expression Profile of Genes Related to the Th17 Pathway in Macrophages Infected by Leishmania major and Leishmania amazonensis: The Use of Gene Regulatory Networks in Modeling This Pathway. Front Cell Infect Microbiol 2022; 12:826523. [PMID: 35774406 PMCID: PMC9239034 DOI: 10.3389/fcimb.2022.826523] [Received: 11/30/2021] [Accepted: 03/09/2022]
Abstract
Leishmania amazonensis and Leishmania major are the causative agents of cutaneous and mucocutaneous diseases. The outcome of these infections depends on host–parasite interactions and the Th1/Th2 response, and in the cutaneous form, regulation of Th17 cytokines has been reported to maintain inflammation in lesions. Despite that, the Th17 regulatory scenario remains unclear. To gain a better understanding of the transcription factors (TFs) and genes involved in Th17 induction, this study addressed the role of inducing factors of the Th17 pathway in Leishmania–macrophage infection through computational modeling of gene regulatory networks (GRNs). The Th17 GRN modeling integrated experimentally validated data available in the literature with gene expression data from a time-series RNA-seq experiment (4, 24, 48, and 72 h post-infection). The generated model comprises a total of 10 TFs, 22 coding genes, and 16 cytokines related to Th17 immune modulation. When Th17 induction was compared between infected and uninfected macrophages, a 2- to 3-fold increase was observed in infected macrophages at 4–24 h. However, there was a decrease in basal levels at 48–72 h in both groups. In order to evaluate the possible outcomes triggered by GRN component modulation in the Th17 pathway. The generated GRN models promoted an integrative and dynamic view of Leishmania–macrophage interaction over time that extends beyond the analysis of single-gene expression.
Affiliation(s)
- Leilane Oliveira Gonçalves
- Programa de Pós-graduação em Biologia Computacional e Sistemas, Instituto Oswaldo Cruz, Fiocruz, Rio de Janeiro, Brazil
- Grupo Informática de Biossistemas, Instituto René Rachou, Fiocruz Minas, Belo Horizonte, Brazil
- Andrés F. Vallejo Pulido
- Systems Immunology Group, Clinical and Experimental Sciences, Faculty of Medicine, University of Southampton, Southampton, United Kingdom
- Alexandre Estevão Silvério Enes
- Programa de Pós-graduação em Biologia Computacional e Sistemas, Instituto Oswaldo Cruz, Fiocruz, Rio de Janeiro, Brazil
- Grupo Informática de Biossistemas, Instituto René Rachou, Fiocruz Minas, Belo Horizonte, Brazil
- Daniela de Melo Resende
- Grupo Genômica Funcional de Parasitos, Instituto René Rachou, Fiocruz Minas, Belo Horizonte, Brazil
- Marta E. Polak
- Systems Immunology Group, Clinical and Experimental Sciences, Faculty of Medicine, University of Southampton, Southampton, United Kingdom
- *Correspondence: Jeronimo C. Ruiz, ; Marta E. Polak,
- Jeronimo C. Ruiz
- Grupo Informática de Biossistemas, Instituto René Rachou, Fiocruz Minas, Belo Horizonte, Brazil
37
Gan Y, Hu X, Zou G, Yan C, Xu G. Inferring Gene Regulatory Networks From Single-Cell Transcriptomic Data Using Bidirectional RNN. Front Oncol 2022; 12:899825. [PMID: 35692809 PMCID: PMC9178250 DOI: 10.3389/fonc.2022.899825] [Received: 03/19/2022] [Accepted: 04/22/2022]
Abstract
Accurate inference of gene regulatory rules is critical to understanding cellular processes. Existing computational methods usually decompose the inference of gene regulatory networks (GRNs) into multiple subproblems rather than detecting potential causal relationships simultaneously, which limits their application to data with a small number of genes. Here, we propose BiRGRN, a novel computational algorithm for inferring GRNs from time-series single-cell RNA-seq (scRNA-seq) data. BiRGRN uses a bidirectional recurrent neural network to infer GRNs. The recurrent neural network is a complex deep neural network that can capture complex, non-linear, and dynamic relationships among variables. It maps neurons to genes and maps the connections between neural network layers to the regulatory relationships between genes, providing an intuitive way to model GRNs with biological closeness and mathematical flexibility. Based on the deep network, we transform the inference of GRNs into a regression problem, using the gene expression data at previous time points to predict the gene expression data at the later time point. Furthermore, we adopt two strategies to improve the accuracy and stability of the algorithm. Specifically, we use a bidirectional structure to integrate the forward and reverse inference results, and we exploit an incomplete set of prior knowledge to filter out low-confidence candidate inferences. BiRGRN is applied to four simulated datasets and three real scRNA-seq datasets to verify the proposed method. We perform comprehensive comparisons between our proposed method and other state-of-the-art techniques. These experimental results indicate that BiRGRN is capable of inferring GRNs simultaneously from time-series scRNA-seq data. Our method BiRGRN is implemented in Python using the TensorFlow machine-learning library, and it is freely available at https://gitee.com/DHUDBLab/bi-rgrn.
Affiliation(s)
- Yanglan Gan
- School of Computer Science and Technology, Donghua University, Shanghai, China
- Xin Hu
- School of Computer Science and Technology, Donghua University, Shanghai, China
- Guobing Zou
- School of Computer Engineering and Science, Shanghai University, Shanghai, China
- Cairong Yan
- School of Computer Science and Technology, Donghua University, Shanghai, China
- Guangwei Xu
- School of Computer Science and Technology, Donghua University, Shanghai, China
- *Correspondence: Guangwei Xu,
38
Hang Y, Burns J, Shealy BT, Pauly R, Ficklin SP, Feltus FA. Identification of condition-specific regulatory mechanisms in normal and cancerous human lung tissue. BMC Genomics 2022; 23:350. [PMID: 35524179 PMCID: PMC9077899 DOI: 10.1186/s12864-022-08591-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2021] [Accepted: 04/25/2022] [Indexed: 12/24/2022]
Abstract
Background: Lung cancer is the leading cause of cancer death in both men and women. The most common lung cancer subtype is non-small cell lung carcinoma (NSCLC), comprising about 85% of all cases. NSCLC can be further divided into three subtypes: adenocarcinoma (LUAD), squamous cell carcinoma (LUSC), and large cell lung carcinoma. Specific genetic mutations and epigenetic aberrations play an important role in the developmental transition to a specific tumor subtype. Elucidating normal lung versus lung tumor gene expression patterns and regulatory targets yields biomarker systems that discriminate lung phenotypes and provides a foundation for discovering normal and aberrant gene regulatory mechanisms. Results: We built condition-specific gene co-expression networks (csGCNs) for normal lung, LUAD, and LUSC conditions. Then, we integrated normal lung tissue-specific gene regulatory networks (tsGRNs) to elucidate control-target biomarker systems for normal and cancerous lung tissue. We characterized co-expressed gene edges, possibly under common regulatory control, for relevance in lung cancer. Conclusions: Our approach demonstrates the ability to elucidate csGCN:tsGRN merged biomarker systems based on gene expression correlation and regulation. The biomarker systems we describe can be used to classify and further describe lung specimens. Our approach is generalizable and can be used to discover and interpret complex gene expression patterns for any condition or species. Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-022-08591-9.
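The co-expression networks above are built from gene expression correlation within each condition. A minimal sketch, assuming Pearson correlation and a fixed absolute threshold (the abstract does not state the similarity measure or cutoff actually used), builds one undirected edge set per condition so that edges present in one condition but not another can be compared; the function names and toy data are hypothetical.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Set, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def coexpression_edges(expr: Dict[str, List[float]],
                       threshold: float = 0.9) -> Set[Tuple[str, str]]:
    """Undirected edges between genes whose |Pearson r| across samples meets the threshold."""
    return {
        (g1, g2)
        for g1, g2 in combinations(sorted(expr), 2)
        if abs(pearson(expr[g1], expr[g2])) >= threshold
    }


# Hypothetical expression profiles (gene -> values across samples) for two conditions.
normal = {"A": [1.0, 2.0, 3.0, 4.0], "B": [2.0, 4.0, 6.0, 8.0], "C": [4.0, 1.0, 3.0, 2.0]}
tumor = {"A": [1.0, 2.0, 3.0, 4.0], "B": [5.0, 1.0, 4.0, 2.0], "C": [4.0, 1.0, 3.0, 2.0]}
normal_only = coexpression_edges(normal) - coexpression_edges(tumor)
```

Here the A-B edge exists only in the "normal" profiles and B-C only in the "tumor" profiles, which is the kind of condition-specific difference a csGCN comparison looks for.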
Affiliation(s)
- Yuqing Hang
- Department of Genetics & Biochemistry, Clemson University, Clemson, 29634, USA
- Josh Burns
- Department of Horticulture, Washington State University, Pullman, 99164, USA
- Benjamin T Shealy
- Department of Electrical and Computer Engineering, Clemson University, Clemson, 29634, USA
- Rini Pauly
- Biomedical Data Science and Informatics Program, Clemson University, Clemson, 29634, USA
- Stephen P Ficklin
- Department of Horticulture, Washington State University, Pullman, 99164, USA
- Frank A Feltus
- Department of Genetics & Biochemistry, Clemson University, Clemson, 29634, USA; Biomedical Data Science and Informatics Program, Clemson University, Clemson, 29634, USA; Center for Human Genetics, Clemson University, Clemson, 29634, USA; Biosystems Research Complex, 302C, 105 Collings St, Clemson, SC, 29634, USA.
39
GEMmaker: process massive RNA-seq datasets on heterogeneous computational infrastructure. BMC Bioinformatics 2022; 23:156. [PMID: 35501696 PMCID: PMC9063052 DOI: 10.1186/s12859-022-04629-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2021] [Accepted: 03/07/2022] [Indexed: 11/17/2022]
Abstract
Background: Quantification of gene expression from RNA-seq data is a prerequisite for transcriptome analyses such as differential gene expression analysis and gene co-expression network construction. Individual RNA-seq experiments are growing larger, and combining multiple experiments from sequence repositories can result in datasets with thousands of samples. Processing hundreds to thousands of RNA-seq samples can raise challenges related to data management, access to sufficient computational resources, navigation of high-performance computing (HPC) systems, installation of required software dependencies, and reproducibility. Processing of larger and deeper RNA-seq experiments will become more common as sequencing technology matures. Results: GEMmaker is an nf-core-compliant Nextflow workflow that quantifies gene expression from small to massive RNA-seq datasets. GEMmaker ensures that results are highly reproducible through the use of versioned, containerized software that can be executed on a single workstation, an institutional compute cluster, a Kubernetes platform, or the cloud. GEMmaker supports popular alignment and quantification tools, providing results in raw and normalized formats. GEMmaker is unique in that it can scale to process thousands of locally or remotely stored samples without exceeding available data storage. Conclusions: Workflows that quantify gene expression are not new, and many already address issues of portability, reusability, and scale in terms of access to CPUs. GEMmaker provides these benefits and adds the ability to scale despite limited data storage infrastructure. This allows users to process hundreds to thousands of RNA-seq samples even when data storage resources are limited. GEMmaker is freely available and fully documented with step-by-step setup and execution instructions. Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-022-04629-7.
40
Manipur I, Manzo M, Granata I, Giordano M, Maddalena L, Guarracino MR. Netpro2vec: A Graph Embedding Framework for Biomedical Applications. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:729-740. [PMID: 33961560 DOI: 10.1109/tcbb.2021.3078089] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Indexed: 06/12/2023]
Abstract
The ever-increasing importance of structured data in different applications, especially in the biomedical field, has driven the need to reduce its complexity through projections into a more manageable space. The latest methods for learning features on graphs focus mainly on the neighborhood of nodes and edges. Graph kernels are methods capable of providing a representation that looks beyond the single-node neighborhood. However, they produce handcrafted features that are not adapted to a generalized model. To reduce this gap, in this work we propose a neural embedding framework, based on probability distribution representations of graphs, named Netpro2vec. The goal is to look at basic node descriptions other than the degree, such as those induced by the Transition Matrix and the Node Distance Distribution. Netpro2vec provides embeddings completely independent of the task and nature of the data. The framework is evaluated on synthetic and various real biomedical network datasets through a comprehensive experimental classification phase and is compared to well-known competitors.
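One of the node descriptions Netpro2vec builds on is the Node Distance Distribution. A minimal sketch, assuming an unweighted, undirected graph given as an adjacency list and defining a node's distribution as the fraction of all other nodes found at each shortest-path distance (the paper's exact definition, including how unreachable nodes are treated, may differ; the function name is hypothetical):

```python
from collections import deque
from typing import Dict, List


def distance_distribution(adj: Dict[str, List[str]], source: str) -> Dict[int, float]:
    """Fraction of all other nodes found at each shortest-path distance from `source`.

    Distances come from breadth-first search; unreachable nodes fall in no bin,
    so the fractions sum to less than 1 on a disconnected graph.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    others = len(adj) - 1
    counts: Dict[int, int] = {}
    for node, d in dist.items():
        if node != source:
            counts[d] = counts.get(d, 0) + 1
    return {d: c / others for d, c in sorted(counts.items())}


# Hypothetical path graph a - b - c - d.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
```

On this path graph an end node sees one node at each of distances 1, 2 and 3, while an inner node sees two neighbours at distance 1; such per-node histograms are the kind of description that is then turned into embeddings.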
41
Jansen C, Paraiso KD, Zhou JJ, Blitz IL, Fish MB, Charney RM, Cho JS, Yasuoka Y, Sudou N, Bright AR, Wlizla M, Veenstra GJC, Taira M, Zorn AM, Mortazavi A, Cho KWY. Uncovering the mesendoderm gene regulatory network through multi-omic data integration. Cell Rep 2022; 38:110364. [PMID: 35172134 PMCID: PMC8917868 DOI: 10.1016/j.celrep.2022.110364] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 12/18/2020] [Revised: 10/30/2021] [Accepted: 01/19/2022] [Indexed: 01/01/2023]
Abstract
Mesendodermal specification is one of the earliest events in embryogenesis, when cells first acquire distinct identities. Cell differentiation is a highly regulated process that involves numerous transcription factors (TFs) and signaling molecules, which can be described with gene regulatory networks (GRNs). Cell differentiation GRNs are difficult to build because existing mechanistic methods are low throughput, and high-throughput methods tend to be non-mechanistic. Additionally, integrating high-dimensional data composed of more than two data types is challenging. Here, we use linked self-organizing maps to combine chromatin immunoprecipitation sequencing (ChIP-seq)/ATAC-seq with temporal, spatial, and perturbation RNA sequencing (RNA-seq) data from Xenopus tropicalis mesendoderm development to build a high-resolution, genome-scale mechanistic GRN. We recover both known and previously unsuspected TF-DNA/TF-TF interactions, validated through reporter assays. Our analysis provides insights into transcriptional regulation of early cell fate decisions and offers a general approach to building GRNs using high-dimensional multi-omic datasets.
Affiliation(s)
- Camden Jansen
- Department of Developmental and Cell Biology, University of California, Irvine, CA, USA; Center for Complex Biological Systems, University of California, Irvine, CA, USA
- Kitt D Paraiso
- Department of Developmental and Cell Biology, University of California, Irvine, CA, USA; Center for Complex Biological Systems, University of California, Irvine, CA, USA
- Jeff J Zhou
- Department of Developmental and Cell Biology, University of California, Irvine, CA, USA
- Ira L Blitz
- Department of Developmental and Cell Biology, University of California, Irvine, CA, USA
- Margaret B Fish
- Department of Developmental and Cell Biology, University of California, Irvine, CA, USA
- Rebekah M Charney
- Department of Developmental and Cell Biology, University of California, Irvine, CA, USA
- Jin Sun Cho
- Department of Developmental and Cell Biology, University of California, Irvine, CA, USA
- Yuuri Yasuoka
- Laboratory for Comprehensive Genomic Analysis, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Norihiro Sudou
- Department of Anatomy, School of Medicine, Toho University, Tokyo, Japan
- Ann Rose Bright
- Department of Molecular Developmental Biology, Radboud University, Nijmegen, the Netherlands
- Marcin Wlizla
- Division of Developmental Biology, Department of Pediatrics, Cincinnati Children's Hospital Medical Center, University of Cincinnati College of Medicine, Cincinnati, OH, USA
- Gert Jan C Veenstra
- Department of Molecular Developmental Biology, Radboud University, Nijmegen, the Netherlands
- Masanori Taira
- Department of Biological Sciences, Chuo University, Tokyo, Japan
- Aaron M Zorn
- Division of Developmental Biology, Department of Pediatrics, Cincinnati Children's Hospital Medical Center, University of Cincinnati College of Medicine, Cincinnati, OH, USA
- Ali Mortazavi
- Department of Developmental and Cell Biology, University of California, Irvine, CA, USA; Center for Complex Biological Systems, University of California, Irvine, CA, USA.
- Ken W Y Cho
- Department of Developmental and Cell Biology, University of California, Irvine, CA, USA; Center for Complex Biological Systems, University of California, Irvine, CA, USA.
42
Zhao M, He W, Tang J, Zou Q, Guo F. A hybrid deep learning framework for gene regulatory network inference from single-cell transcriptomic data. Brief Bioinform 2022; 23:6513730. [DOI: 10.1093/bib/bbab568] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/28/2021] [Revised: 12/09/2021] [Accepted: 12/11/2021] [Indexed: 12/21/2022]
Abstract
Inferring gene regulatory networks (GRNs) from gene expression profiles can provide insight into a number of cellular phenotypes at the genomic level and reveal the essential laws underlying various life phenomena. Unlike bulk expression data, single-cell transcriptomic data embody cell-to-cell variance and diverse biological information, such as tissue characteristics and transformations of cell types. Inferring GRNs from such data offers unprecedented advantages for in-depth study of cell phenotypes, revealing gene functions and exploring potential interactions. However, the high sparsity, noise and dropout events of single-cell transcriptomic data pose new challenges for regulation identification. We develop DGRNS, a hybrid deep learning framework for GRN inference from single-cell transcriptomic data, which encodes the raw data and fuses a recurrent neural network and a convolutional neural network (CNN) to train a model capable of distinguishing related gene pairs from unrelated gene pairs. To overcome the limitations of such datasets, it applies sliding windows to extract valuable features while preserving the direction of regulation. DGRNS is constructed as a deep learning model containing a gated recurrent unit network for exploring time-dependent information and a CNN for learning spatially related information. Our comprehensive and detailed comparative analysis on the mouse hematopoietic stem cell dataset illustrates that DGRNS outperforms state-of-the-art methods. The networks inferred by DGRNS achieve an area under the receiver operating characteristic curve about 16% higher than that of other unsupervised methods, and an area under the precision-recall curve about 10% higher than that of other supervised methods. Experiments on human datasets show the strong robustness and excellent generalization of DGRNS.
By comparing the predictions with the standard network, we discover a series of novel interactions that have been shown to be true in some specific cell types. Importantly, DGRNS identifies a series of regulatory relationships with high confidence and functional consistency, which have not yet been experimentally confirmed and merit further research.
43
Prediction of Time Series Gene Expression and Structural Analysis of Gene Regulatory Networks Using Recurrent Neural Networks. ENTROPY 2022; 24:e24020141. [PMID: 35205437 PMCID: PMC8871363 DOI: 10.3390/e24020141] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2021] [Revised: 01/14/2022] [Accepted: 01/15/2022] [Indexed: 11/17/2022]
Abstract
Methods for time series prediction and classification of gene regulatory networks (GRNs) from gene expression data have so far been treated separately. The recent emergence of attention-based recurrent neural network (RNN) models has boosted the interpretability of RNN parameters, making them appealing for understanding gene interactions. In this work, we generated synthetic time series gene expression data from a range of archetypal GRNs and relied on a dual-attention RNN to predict the gene temporal dynamics. We show that the prediction is extremely accurate for GRNs with different architectures. Next, we focused on the attention mechanism of the RNN and, using tools from graph theory, found that its graph properties allow one to hierarchically distinguish different GRN architectures. We show that GRNs respond differently to the addition of noise in the RNN's prediction, and we relate this noise response to the analysis of the attention mechanism. In conclusion, this work provides a way to understand and exploit the attention mechanism of RNNs, and it paves the way for RNN-based methods for time series prediction and inference of GRNs from gene expression data.
44
Karanam A, Rappel WJ. Boolean modelling in plant biology. QUANTITATIVE PLANT BIOLOGY 2022; 3:e29. [PMID: 37077966 PMCID: PMC10095905 DOI: 10.1017/qpb.2022.26] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2022] [Revised: 10/24/2022] [Accepted: 11/16/2022] [Indexed: 05/03/2023]
Abstract
Signalling and genetic networks underlie most biological processes and are often complex, containing many highly connected components. Modelling these networks can provide insight into mechanisms but is challenging given that rate parameters are often not well defined. Boolean modelling, in which components can only take on a binary value with connections encoded by logic equations, is able to circumvent some of these challenges, and has emerged as a viable tool to probe these complex networks. In this review, we will give an overview of Boolean modelling, with a specific emphasis on its use in plant biology. We review how Boolean modelling can be used to describe biological networks and then discuss examples of its applications in plant genetics and plant signalling.
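In the Boolean formalism described in this review, each component is either 0 or 1 and its next value is given by a logic rule over its regulators. A minimal sketch of synchronous updating and attractor detection follows; the three-node network and its rules are hypothetical, chosen only to illustrate the mechanics, and are not taken from any plant model.

```python
from typing import Callable, Dict, List, Tuple

State = Tuple[int, ...]
NODES = ("A", "B", "C")

# Hypothetical logic rules: each maps the current state (as a dict) to the node's next value.
RULES: Dict[str, Callable[[Dict[str, int]], int]] = {
    "A": lambda s: int(not s["C"]),          # C represses A
    "B": lambda s: s["A"],                   # A activates B
    "C": lambda s: int(s["A"] and s["B"]),   # A AND B activate C
}


def step(state: State) -> State:
    """Synchronous update: every node is recomputed from the same current state."""
    current = dict(zip(NODES, state))
    return tuple(RULES[n](current) for n in NODES)


def find_attractor(start: State) -> List[State]:
    """Iterate until a state repeats and return the cycle (a fixed point has length 1)."""
    seen: Dict[State, int] = {}
    trajectory: List[State] = []
    state = start
    while state not in seen:
        seen[state] = len(trajectory)
        trajectory.append(state)
        state = step(state)
    return trajectory[seen[state]:]
```

Because these rules form a negative feedback loop (A activates B, A and B activate C, C represses A), the trajectory from (0, 0, 0) settles into a five-state cycle rather than a fixed point. Since the state space is finite, every trajectory must eventually revisit a state, which is why the loop above always terminates.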
Affiliation(s)
- Aravind Karanam
- Department of Physics, University of California, San Diego, La Jolla, California 92093, USA
- Wouter-Jan Rappel
- Department of Physics, University of California, San Diego, La Jolla, California 92093, USA
- Author for correspondence: W.-J. Rappel, E-mail:
45
Zhang J, Ibrahim F, Najmulski E, Katholos G, Altarawy D, Heath LS, Tulin SL. Developmental gene regulatory network connections predicted by machine learning from gene expression data alone. PLoS One 2021; 16:e0261926. [PMID: 34962963 PMCID: PMC8714117 DOI: 10.1371/journal.pone.0261926] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/27/2021] [Accepted: 12/14/2021] [Indexed: 12/13/2022]
Abstract
Gene regulatory network (GRN) inference can now take advantage of powerful machine learning algorithms to complement traditional experimental methods in building gene networks. However, the dynamical nature of embryonic development, representing the time-dependent interactions between thousands of transcription factors, signaling molecules, and effector genes, is one of the most challenging arenas for GRN prediction. In this work, we show that successful GRN predictions for a developmental network from gene expression data alone can be obtained with the Priors Enriched Absent Knowledge (PEAK) network inference algorithm. PEAK is a noise-robust method that models gene expression dynamics via ordinary differential equations and selects the best network based on information-theoretic criteria coupled with the machine learning algorithm Elastic Net. We test our GRN prediction methodology using two gene expression datasets for the purple sea urchin, Strongylocentrotus purpuratus, and cross-check our results against existing GRN models that have been constructed and validated by over 30 years of experimental results. Our results show a remarkably high degree of sensitivity in identifying known gene interactions in the network (maximum 81.58%). We also generate novel predictions for interactions that have not yet been described, which provide a resource for researchers to use to further complete the sea urchin GRN. Published ChIP-seq data and spatial co-expression analysis further support a subset of the top novel predictions. We conclude that GRN predictions that match known gene interactions can be produced using gene expression data alone from developmental time series experiments.
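The combination described above (ODE models of expression dynamics plus Elastic Net) can be illustrated with a generic sketch that is not PEAK itself: approximate dx_i/dt by finite differences, regress it on the expression of all genes with an elastic-net penalty solved by coordinate descent, and treat nonzero coefficients as candidate regulators of gene i. The linear ODE form, penalty values and function names are assumptions; PEAK's information-theoretic model selection and handling of priors are omitted.

```python
from typing import List

Matrix = List[List[float]]


def finite_difference(series: Matrix, dt: float) -> Matrix:
    """Forward-difference estimate of dx/dt at every time point except the last."""
    return [[(b - a) / dt for a, b in zip(x0, x1)] for x0, x1 in zip(series, series[1:])]


def soft_threshold(z: float, gamma: float) -> float:
    """Proximal operator of the L1 penalty."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(x: Matrix, y: List[float], alpha: float = 0.01,
                l1_ratio: float = 0.5, n_sweeps: int = 500) -> List[float]:
    """Coordinate descent for
    (1/2n)||y - Xb||^2 + alpha * (l1_ratio * ||b||_1 + (1 - l1_ratio) / 2 * ||b||^2).
    No intercept is fitted, so centre the data first in practice.
    """
    n, p = len(x), len(x[0])
    b = [0.0] * p
    resid = list(y)  # y - X b, with b = 0 initially
    for _ in range(n_sweeps):
        for j in range(p):
            col = [row[j] for row in x]
            sq = sum(c * c for c in col) / n
            if sq == 0.0:
                continue
            # correlation of column j with the residual that excludes j's own contribution
            rho = sum(c * (r + c * b[j]) for c, r in zip(col, resid)) / n
            new = soft_threshold(rho, alpha * l1_ratio) / (sq + alpha * (1 - l1_ratio))
            resid = [r - c * (new - b[j]) for r, c in zip(resid, col)]
            b[j] = new
    return b
```

For gene i, one would regress `[d[i] for d in finite_difference(series, dt)]` on `series[:-1]`; larger penalties push more coefficients to exactly zero, which is how the Elastic Net step yields a sparse set of candidate regulators.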
Affiliation(s)
- Jingyi Zhang
- Department of Computer Science, Virginia Tech, Blacksburg, VA, United States of America
- Farhan Ibrahim
- Department of Computer Science, Virginia Tech, Blacksburg, VA, United States of America
- Emily Najmulski
- Department of Biology, Canisius College, Buffalo, NY, United States of America
- George Katholos
- Department of Biology, Canisius College, Buffalo, NY, United States of America
- Doaa Altarawy
- Department of Computer Science, Virginia Tech, Blacksburg, VA, United States of America
- Computer and Systems Engineering Department, Alexandria University, Alexandria, Egypt
- Lenwood S. Heath
- Department of Computer Science, Virginia Tech, Blacksburg, VA, United States of America
- Sarah L. Tulin
- Department of Biology, Canisius College, Buffalo, NY, United States of America
46
Constructing gene regulatory networks using epigenetic data. NPJ Syst Biol Appl 2021; 7:45. [PMID: 34887443 PMCID: PMC8660777 DOI: 10.1038/s41540-021-00208-3] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 09/30/2020] [Accepted: 11/01/2021] [Indexed: 12/24/2022]
Abstract
The biological processes that drive cellular function can be represented by a complex network of interactions between regulators (transcription factors) and their targets (genes). A cell's epigenetic state plays an important role in mediating these interactions, primarily by influencing chromatin accessibility. However, how to effectively use epigenetic data when constructing a gene regulatory network remains an open question. Almost all existing network reconstruction approaches focus on estimating transcription factor-to-gene connections using transcriptomic data. In contrast, computational approaches for analyzing epigenetic data generally focus on improving transcription factor binding site predictions rather than deducing regulatory network relationships. We bridged this gap by developing SPIDER, a network reconstruction approach that incorporates epigenetic data into a message-passing framework to estimate gene regulatory networks. We validated SPIDER's predictions using ChIP-seq data from ENCODE and found that SPIDER networks are both highly accurate and include cell-line-specific regulatory interactions. Notably, SPIDER can recover ChIP-seq verified transcription factor binding events in the regulatory regions of genes that do not have a corresponding sequence motif. The networks estimated by SPIDER have the potential to identify novel hypotheses that will allow us to better characterize cell-type- and phenotype-specific regulatory mechanisms.
47
Regondi C, Fratelli M, Damia G, Guffanti F, Ganzinelli M, Matteucci M, Masseroli M. Predictive modeling of gene expression regulation. BMC Bioinformatics 2021; 22:571. [PMID: 34837938 PMCID: PMC8626902 DOI: 10.1186/s12859-021-04481-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2021] [Accepted: 11/15/2021] [Indexed: 11/24/2022]
Abstract
Background: In-depth analysis of the regulatory networks of genes aberrantly expressed in cancer is essential for better understanding tumors and identifying key genes that could be therapeutically targeted. Results: We developed a quantitative analysis approach to investigate the main biological relationships among different regulatory elements and target genes; we applied it to ovarian serous cystadenocarcinoma and 177 target genes belonging to three main pathways (DNA REPAIR, STEM CELLS and GLUCOSE METABOLISM) relevant to this tumor. Combining data from the ENCODE and TCGA datasets, we built a predictive linear model for the regulation of each target gene, assessing the relationships between its expression, its promoter methylation, the expression of genes in the same or in the other pathways, and the expression of putative transcription factors. We demonstrated the reliability and significance of our approach in a similar tumor type (basal-like breast cancer) and using a different existing algorithm (ARACNe), and we obtained experimental confirmation of potentially interesting results. Conclusions: Analysis of the proposed models revealed the relations between a gene and its related biological processes, the interconnections between the different gene sets, and the relevant regulatory elements at the single-gene level. This led to the identification of already known regulators and/or gene correlations and unveiled a set of still unknown and potentially interesting biological relationships for pharmacological and clinical use. Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-021-04481-1.
Affiliation(s)
- Chiara Regondi
- Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, 20133, Milan, Italy.
- Maddalena Fratelli
- Pharmacogenomics Unit, Istituto di Ricerche Farmacologiche Mario Negri, IRCCS, 20156, Milan, Italy
- Giovanna Damia
- Laboratory of Molecular Pharmacology, Istituto di Ricerche Farmacologiche Mario Negri, IRCCS, 20156, Milan, Italy
- Federica Guffanti
- Laboratory of Molecular Pharmacology, Istituto di Ricerche Farmacologiche Mario Negri, IRCCS, 20156, Milan, Italy
- Monica Ganzinelli
- Laboratory of Molecular Pharmacology, Istituto di Ricerche Farmacologiche Mario Negri, IRCCS, 20156, Milan, Italy; Department of Medical Oncology, Fondazione IRCCS Istituto Nazionale dei Tumori, 20133, Milan, Italy
- Matteo Matteucci
- Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, 20133, Milan, Italy
- Marco Masseroli
- Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, 20133, Milan, Italy
48
Chen J, Cheong C, Lan L, Zhou X, Liu J, Lyu A, Cheung WK, Zhang L. DeepDRIM: a deep neural network to reconstruct cell-type-specific gene regulatory network using single-cell RNA-seq data. Brief Bioinform 2021; 22:bbab325. [PMID: 34424948 PMCID: PMC8499812 DOI: 10.1093/bib/bbab325] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 05/28/2021] [Revised: 07/12/2021] [Accepted: 07/26/2021] [Indexed: 01/11/2023]
Abstract
Single-cell RNA sequencing has made it possible to capture gene activities at single-cell resolution, thus allowing reconstruction of cell-type-specific gene regulatory networks (GRNs). The available algorithms for reconstructing GRNs are commonly designed for bulk RNA-seq data, and few of them can analyze scRNA-seq data while dealing with dropout events and cellular heterogeneity. In this paper, we represent the joint gene expression distribution of a gene pair as an image and propose a novel supervised deep neural network called DeepDRIM, which uses the image of the target TF-gene pair and those of its potential neighbors to reconstruct GRNs from scRNA-seq data. Because it considers the neighborhood context of a TF-gene pair, DeepDRIM can effectively eliminate false positives caused by transitive gene-gene interactions. We compared DeepDRIM with nine GRN reconstruction algorithms designed for either bulk or single-cell RNA-seq data; it achieves clearly better performance on scRNA-seq data collected from eight cell lines. The simulated data show that DeepDRIM is robust to the dropout rate, the cell number and the size of the training data. We further applied DeepDRIM to the scRNA-seq gene expression of B cells from the bronchoalveolar lavage fluid of patients with mild and severe coronavirus disease 2019. We focused on cell-type-specific GRN alterations and observed that targets of TFs that were differentially expressed between the two statuses were enriched in lysosome, apoptosis, response to decreased oxygen level and microtubule, which have been shown to be associated with coronavirus infection.
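DeepDRIM's input representation turns the joint expression distribution of a gene pair into an image. A minimal sketch of that idea, assuming log1p-transformed values binned into a fixed-size 2-D histogram normalised to sum to 1 (the bin count, transform and normalisation are assumptions, and the neighbour images and the network itself are not constructed here; the function name is hypothetical):

```python
from math import log1p
from typing import List


def pair_image(x: List[float], y: List[float], bins: int = 8) -> List[List[float]]:
    """Normalised 2-D histogram of (log1p x, log1p y) over cells.

    Rows index bins of gene x, columns index bins of gene y; entries sum to 1.
    """
    lx = [log1p(v) for v in x]
    ly = [log1p(v) for v in y]
    x_lo, x_hi, y_lo, y_hi = min(lx), max(lx), min(ly), max(ly)

    def bin_index(v: float, lo: float, hi: float) -> int:
        if hi == lo:  # constant expression (e.g. all dropouts): single bin
            return 0
        return min(int((v - lo) / (hi - lo) * bins), bins - 1)

    img = [[0.0] * bins for _ in range(bins)]
    for a, b in zip(lx, ly):
        img[bin_index(a, x_lo, x_hi)][bin_index(b, y_lo, y_hi)] += 1.0
    return [[c / len(lx) for c in row] for row in img]
```

Each cell contributes one count to the pixel at its pair of expression bins, so dependence between the two genes shows up as structure in the image, which a convolutional network can then learn from.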
Affiliation(s)
- Jiaxing Chen
- Department of Computer Science, Hong Kong Baptist University, Waterloo Road, Kowloon Tong, Hong Kong
- ChinWang Cheong
- Department of Computer Science, Hong Kong Baptist University, Waterloo Road, Kowloon Tong, Hong Kong
- Liang Lan
- Department of Computer Science, Hong Kong Baptist University, Waterloo Road, Kowloon Tong, Hong Kong
- Xin Zhou
- Department of Biomedical Engineering, Vanderbilt University, Vanderbilt Place Nashville, 37235, TN, USA
- Jiming Liu
- Department of Computer Science, Hong Kong Baptist University, Waterloo Road, Kowloon Tong, Hong Kong
- Aiping Lyu
- School of Chinese Medicine, Hong Kong Baptist University, Waterloo Road, Kowloon Tong, Hong Kong
- William K Cheung
- Department of Computer Science, Hong Kong Baptist University, Waterloo Road, Kowloon Tong, Hong Kong
- Lu Zhang
- Department of Computer Science, Hong Kong Baptist University, Waterloo Road, Kowloon Tong, Hong Kong
49
Raharinirina NA, Peppert F, von Kleist M, Schütte C, Sunkara V. Inferring gene regulatory networks from single-cell RNA-seq temporal snapshot data requires higher-order moments. PATTERNS 2021; 2:100332. [PMID: 34553172 PMCID: PMC8441581 DOI: 10.1016/j.patter.2021.100332] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/03/2021] [Revised: 02/23/2021] [Accepted: 07/22/2021] [Indexed: 11/30/2022]
Abstract
Single-cell RNA sequencing (scRNA-seq) has become ubiquitous in biology. Recently, there has been a push to use scRNA-seq snapshot data to infer the underlying gene regulatory networks (GRNs) steering cellular function. To date, this aspiration remains unrealized due to technical and computational challenges. In this work we focus on the latter, which are under-represented in the literature. We took a systemic approach by subdividing GRN inference into three fundamental components: data pre-processing, feature extraction, and inference. We observed that the regulatory signature is captured in the statistical moments of scRNA-seq data and requires computationally intensive minimization solvers to extract it. Furthermore, current data pre-processing might not conserve these statistical moments. Although our moment-based approach is a didactic tool for understanding the different compartments of GRN inference, this line of thinking (finding computationally feasible multi-dimensional statistics of data) is imperative for designing GRN inference methods.
Highlights:
- Single-cell RNA-seq temporal snapshot data for detecting regulation
- Challenges in data pre-processing, feature extraction, and network inference for GRNs
- Encoding of regulatory information in higher-order raw moments
- Non-linear least-squares inference for temporal scRNA-seq snapshot data
Single-cell RNA sequencing (scRNA-seq) has become ubiquitous in biology. Recently, there has been a push for using scRNA-seq snapshot data to infer the underlying gene regulatory networks (GRNs) steering cellular function. A recent benchmark of 12 GRN methods demonstrated that the algorithms struggled to predict the ground-truth GRNs and speculated that the low performance was due to the insufficient resolution in the scRNA-seq data. Rather than proposing another method, this paper focuses on how to decompose a GRN problem into three subproblems (pre-processing, feature extraction, and inference), so that the gene regulatory information is preserved in each step. Subsequently, we discuss how to best approach each of the three subproblems.
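The paper locates the regulatory signal in the statistical moments of snapshot data, including higher-order raw moments. A minimal sketch of the feature-extraction step only, assuming one snapshot given as a cells-by-genes matrix and raw moments E[x_i x_j ...] up to a chosen order (the choice of orders and the function name are assumptions; the paper's pre-processing and non-linear least-squares inference are not reproduced):

```python
from itertools import combinations_with_replacement
from typing import Dict, List, Tuple


def raw_moments(cells: List[List[float]], max_order: int = 2) -> Dict[Tuple[int, ...], float]:
    """E[x_i x_j ...] over cells for every multiset of gene indices up to max_order."""
    n_genes = len(cells[0])
    moments: Dict[Tuple[int, ...], float] = {}
    for order in range(1, max_order + 1):
        for idx in combinations_with_replacement(range(n_genes), order):
            total = 0.0
            for cell in cells:
                prod = 1.0
                for g in idx:  # product of the selected genes' expression in this cell
                    prod *= cell[g]
                total += prod
            moments[idx] = total / len(cells)
    return moments
```

The number of moments grows combinatorially with the order, which is one reason the paper stresses finding computationally feasible multi-dimensional statistics; computing them once per time point gives the features that an inference step would then fit.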
Affiliation(s)
- Felix Peppert
- Explainable A.I. for Biology, Zuse Institute Berlin, 14195 Berlin, Germany
- Max von Kleist
- MF1 Bioinformatics, Methods Development and Research Infrastructure, Robert Koch Institute, 13353 Berlin, Germany
- Christof Schütte
- Mathematics of Complex Systems, Zuse Institute Berlin, 14195 Berlin, Germany; Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany
- Vikram Sunkara
- Mathematics of Complex Systems, Zuse Institute Berlin, 14195 Berlin, Germany; Explainable A.I. for Biology, Zuse Institute Berlin, 14195 Berlin, Germany
50
Srinivasan A, Bain M, Baskar A. Learning explanations for biological feedback with delays using an event calculus. Mach Learn 2021. [DOI: 10.1007/s10994-021-06038-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]