1
Jia C, Grima R. Holimap: an accurate and efficient method for solving stochastic gene network dynamics. Nat Commun 2024; 15:6557. [PMID: 39095346 PMCID: PMC11297302 DOI: 10.1038/s41467-024-50716-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2024] [Accepted: 07/13/2024] [Indexed: 08/04/2024] Open
Abstract
Gene-gene interactions are crucial to the control of sub-cellular processes but our understanding of their stochastic dynamics is hindered by the lack of simulation methods that can accurately and efficiently predict how the distributions of gene product numbers vary across parameter space. To overcome these difficulties, here we present Holimap (high-order linear-mapping approximation), an approach that approximates the protein or mRNA number distributions of a complex gene regulatory network by the distributions of a much simpler reaction system. We demonstrate Holimap's computational advantages over conventional methods by applying it to predict the stochastic time-dependent dynamics of various gene networks, including transcriptional networks ranging from simple autoregulatory loops to complex randomly connected networks, post-transcriptional networks, and post-translational networks. Holimap is ideally suited to study how the intricate network of gene-gene interactions results in precise coordination and control of gene expression.
Affiliation(s)
- Chen Jia
  - Applied and Computational Mathematics Division, Beijing Computational Science Research Center, Beijing, China
- Ramon Grima
  - School of Biological Sciences, University of Edinburgh, Edinburgh, UK
2
Chang Z, Xu Y, Dong X, Gao Y, Wang C. Single-cell and spatial multiomic inference of gene regulatory networks using SCRIPro. Bioinformatics 2024; 40:btae466. [PMID: 39024032 PMCID: PMC11288411 DOI: 10.1093/bioinformatics/btae466] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Revised: 06/05/2024] [Accepted: 07/17/2024] [Indexed: 07/20/2024] Open
Abstract
MOTIVATION: The burgeoning generation of single-cell or spatial multiomic data allows for the characterization of gene regulation networks (GRNs) at an unprecedented resolution. However, the accurate reconstruction of GRNs from sparse and noisy single-cell or spatial multiomic data remains challenging.
RESULTS: Here, we present SCRIPro, a comprehensive computational framework that robustly infers GRNs for both single-cell and spatial multiomic data. SCRIPro first improves sample coverage through a density clustering approach based on multiomic and spatial similarities. Additionally, SCRIPro scans transcriptional regulator (TR) importance by performing chromatin reconstruction and in silico deletion analyses using a comprehensive reference covering 1,292 human and 994 mouse TRs. Finally, SCRIPro combines TR-target importance scores derived from multiomic data with TR-target expression levels to ensure precise GRN reconstruction. We benchmarked SCRIPro on various datasets, including single-cell multiomic data from human B-cell lymphoma, mouse hair follicle development, Stereo-seq of mouse embryos, and Spatial-ATAC-RNA from mouse brain. SCRIPro outperforms existing motif-based methods and accurately reconstructs cell type-specific, stage-specific, and region-specific GRNs. Overall, SCRIPro emerges as a streamlined and fast method capable of reconstructing TR activities and GRNs for both single-cell and spatial multiomic data.
AVAILABILITY: SCRIPro is available at https://github.com/wanglabtongji/SCRIPro.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zhanhe Chang
  - Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration of Ministry of Education, Department of Orthopedics, Tongji Hospital, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
  - Frontier Science Center for Stem Cell Research, Tongji University, Shanghai, China
  - Institute for Regenerative Medicine, Shanghai East Hospital, Shanghai Key Laboratory of Signaling and Disease Research, School of Life Sciences and Technology, Tongji University, Shanghai, China
- Yunfan Xu
  - Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration of Ministry of Education, Department of Orthopedics, Tongji Hospital, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
  - Frontier Science Center for Stem Cell Research, Tongji University, Shanghai, China
- Xin Dong
  - Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration of Ministry of Education, Department of Orthopedics, Tongji Hospital, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
  - Frontier Science Center for Stem Cell Research, Tongji University, Shanghai, China
- Yawei Gao
  - Frontier Science Center for Stem Cell Research, Tongji University, Shanghai, China
  - Institute for Regenerative Medicine, Shanghai East Hospital, Shanghai Key Laboratory of Signaling and Disease Research, School of Life Sciences and Technology, Tongji University, Shanghai, China
- Chenfei Wang
  - Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration of Ministry of Education, Department of Orthopedics, Tongji Hospital, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
  - Frontier Science Center for Stem Cell Research, Tongji University, Shanghai, China
  - National Key Laboratory of Autonomous Intelligent Unmanned Systems, Tongji University, Shanghai 200120, China
  - Frontier Science Center for Intelligent Autonomous Systems, Tongji University, Shanghai 200120, China
3
Segura-Ortiz A, García-Nieto J, Aldana-Montes JF, Navas-Delgado I. Multi-objective context-guided consensus of a massive array of techniques for the inference of Gene Regulatory Networks. Comput Biol Med 2024; 179:108850. [PMID: 39013340 DOI: 10.1016/j.compbiomed.2024.108850] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2024] [Revised: 07/03/2024] [Accepted: 07/03/2024] [Indexed: 07/18/2024]
Abstract
BACKGROUND AND OBJECTIVE: Gene Regulatory Network (GRN) inference is a fundamental task in biology and medicine, as it enables a deeper understanding of the intricate mechanisms of gene expression present in organisms. This bioinformatics problem has been addressed in the literature through multiple computational approaches. Techniques developed for inferring from expression data have employed Bayesian networks, ordinary differential equations (ODEs), machine learning, information theory measures and neural networks, among others. The diversity of implementations and their respective customization have led to the emergence of many tools and multiple specialized domains derived from them, understood as subsets of networks with specific characteristics that are challenging to detect a priori. This specialization has introduced significant uncertainty when choosing the most appropriate technique for a particular dataset. This proposal, named MO-GENECI, builds upon the basic idea of the previous proposal GENECI and optimizes consensus among different inference techniques, through a carefully refined multi-objective evolutionary algorithm guided by various objective functions, linked to the biological context at hand.
METHODS: MO-GENECI has been tested on an extensive and diverse academic benchmark of 106 gene regulatory networks from multiple sources and sizes. The evaluation compared the performance of MO-GENECI with that of individual techniques using key metrics (AUROC and AUPR) for gene regulatory network inference. Friedman's statistical ranking provided an ordered classification, followed by non-parametric Holm tests to determine statistical significance.
RESULTS: MO-GENECI's Pareto front approximation facilitates easy selection of an appropriate solution based on generic input data characteristics. The best solution consistently emerged as the winner in all statistical tests, and in many cases, the median precision solution showed no statistically significant difference compared to the winner.
CONCLUSIONS: MO-GENECI has not only achieved more accurate results than individual techniques, but has also overcome the uncertainty associated with the initial choice due to its flexibility and adaptability. It is shown to intelligently select the most suitable techniques for each case. The source code is hosted in a public repository at GitHub under MIT license: https://github.com/AdrianSeguraOrtiz/MO-GENECI. Moreover, to facilitate its installation and use, the software associated with this implementation has been encapsulated in a Python package available at PyPI: https://pypi.org/project/geneci/.
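The abstract above names AUROC and AUPR as its evaluation metrics. As a rough illustration of what those two numbers measure for a ranked list of candidate edges, here is a minimal sketch in plain Python. It is not taken from MO-GENECI; the function names and the toy edge list are assumptions made for this example, and average precision is used as a common step-wise stand-in for AUPR.

```python
def auroc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one true edge and one non-edge")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # true edge ranked above a non-edge
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision: mean of the precision values at each true edge."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one true edge")
    return total / hits


# Hypothetical toy data: 1 = edge present in the reference network.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.1]
toy_auroc = auroc(toy_labels, toy_scores)
toy_ap = average_precision(toy_labels, toy_scores)
```

Because true GRNs are sparse, the precision-based score is usually the more demanding of the two, which is one reason benchmarks tend to report both.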
Affiliation(s)
- Adrián Segura-Ortiz
  - Department de Lenguajes y Ciencias de la Computación, ITIS Software, Universidad de Málaga, Málaga, 29071, Spain
- José García-Nieto
  - Department de Lenguajes y Ciencias de la Computación, ITIS Software, Universidad de Málaga, Málaga, 29071, Spain; Biomedical Research Institute of Málaga (IBIMA), Universidad de Málaga, Málaga, Spain
- José F Aldana-Montes
  - Department de Lenguajes y Ciencias de la Computación, ITIS Software, Universidad de Málaga, Málaga, 29071, Spain; Biomedical Research Institute of Málaga (IBIMA), Universidad de Málaga, Málaga, Spain
- Ismael Navas-Delgado
  - Department de Lenguajes y Ciencias de la Computación, ITIS Software, Universidad de Málaga, Málaga, 29071, Spain; Biomedical Research Institute of Málaga (IBIMA), Universidad de Málaga, Málaga, Spain
4
Wei PJ, Bao JJ, Gao Z, Tan JY, Cao RF, Su Y, Zheng CH, Deng L. MEFFGRN: Matrix enhancement and feature fusion-based method for reconstructing the gene regulatory network of epithelioma papulosum cyprini cells by spring viremia of carp virus infection. Comput Biol Med 2024; 179:108835. [PMID: 38996550 DOI: 10.1016/j.compbiomed.2024.108835] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2024] [Revised: 06/05/2024] [Accepted: 06/29/2024] [Indexed: 07/14/2024]
Abstract
Gene regulatory networks (GRNs) are crucial for understanding organismal molecular mechanisms and processes. Constructing the GRN of epithelioma papulosum cyprini (EPC) cells of cyprinid fish under spring viremia of carp virus (SVCV) infection helps to understand the immune regulatory mechanisms that enhance the survival capabilities of cyprinid fish. Although many computational methods have been used to infer GRNs, specialized approaches for predicting the GRN of EPC cells following SVCV infection are lacking. In addition, most existing methods focus primarily on gene expression features, neglecting the valuable network structural information in known GRNs. In this study, we propose a novel supervised deep neural network, named MEFFGRN (Matrix Enhancement- and Feature Fusion-based method for Gene Regulatory Network inference), to accurately predict the GRN of EPC cells following SVCV infection. MEFFGRN considers both gene expression data and the network structure information of known GRNs, and introduces a matrix enhancement method to address the sparsity of known GRNs, extracting richer network structure information. To exploit the strengths of CNNs (Convolutional Neural Networks) in image processing, gene expression and enhanced GRN data were each transformed into histogram images for every gene pair. Subsequently, these histograms were separately fed into CNNs for training to obtain the corresponding gene expression and network structural features. Furthermore, a feature fusion mechanism was introduced to comprehensively integrate the gene expression and network structural features. This integration considers the specificity of each feature and their interactive information, resulting in a more comprehensive and precise feature representation during the fusion process. Experimental results from both real-world and benchmark datasets demonstrate that MEFFGRN achieves competitive performance compared with state-of-the-art computational methods. Furthermore, study findings from SVCV-infected EPC cells suggest that MEFFGRN can predict novel gene regulatory relationships.
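The abstract above describes turning the expression of each gene pair into a histogram image before it is passed to a CNN. The following sketch shows one plausible way to build such an image as a plain count matrix. It is only an illustration under stated assumptions: the function name `gene_pair_histogram`, the equal-width binning, and the default of 8 bins are hypothetical choices, not details reported for MEFFGRN.

```python
def gene_pair_histogram(expr_a, expr_b, bins=8):
    """Bin paired expression values of two genes into a bins x bins count matrix."""
    if len(expr_a) != len(expr_b) or not expr_a:
        raise ValueError("need two non-empty vectors of equal length")

    def bin_index(value, low, high):
        if high == low:
            return 0  # constant gene: everything lands in the first bin
        idx = int((value - low) / (high - low) * bins)
        return min(idx, bins - 1)  # the maximum value falls into the last bin

    lo_a, hi_a = min(expr_a), max(expr_a)
    lo_b, hi_b = min(expr_b), max(expr_b)
    image = [[0] * bins for _ in range(bins)]
    # Each cell (sample) contributes one count at (bin of gene A, bin of gene B).
    for a, b in zip(expr_a, expr_b):
        image[bin_index(a, lo_a, hi_a)][bin_index(b, lo_b, hi_b)] += 1
    return image


# Hypothetical toy pair: perfectly co-expressed genes fill the diagonal.
toy_image = gene_pair_histogram([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 3.0], bins=2)
```

A co-expressed pair concentrates counts along the diagonal, while an unrelated pair spreads them across the matrix, which is the kind of pattern a convolutional network can pick up.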
Affiliation(s)
- Pi-Jing Wei
  - Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
- Jin-Jin Bao
  - Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
- Zhen Gao
  - Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
- Jing-Yun Tan
  - Shenzhen Key Laboratory of Microbial Genetic Engineering, College of Life Sciences and Oceanology, Shenzhen University, Shenzhen, 518055, Guangdong, China
- Rui-Fen Cao
  - Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
- Yansen Su
  - School of Artificial Intelligence, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
- Chun-Hou Zheng
  - School of Artificial Intelligence, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
- Li Deng
  - Shenzhen Key Laboratory of Microbial Genetic Engineering, College of Life Sciences and Oceanology, Shenzhen University, Shenzhen, 518055, Guangdong, China
5
Graham JP, Zhang Y, He L, Gonzalez-Fernandez T. CRISPR-GEM: A Novel Machine Learning Model for CRISPR Genetic Target Discovery and Evaluation. bioRxiv 2024:2024.07.01.601587. [PMID: 39005295 PMCID: PMC11244939 DOI: 10.1101/2024.07.01.601587] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/16/2024]
Abstract
CRISPR gene editing strategies are shaping cell therapies through precise and tunable control over gene expression. However, achieving reliable therapeutic effects with improved safety and efficacy requires informed target gene selection. This depends on a thorough understanding of the involvement of target genes in gene regulatory networks (GRNs) that regulate cell phenotype and function. Machine learning models have been previously used for GRN reconstruction using RNA-seq data, but current techniques are limited to single cell types and focus mainly on transcription factors. This restriction overlooks many potential CRISPR target genes, such as those encoding extracellular matrix components, growth factors, and signaling molecules, thus limiting the applicability of these models for CRISPR strategies. To address these limitations, we have developed CRISPR-GEM, a multi-layer perceptron (MLP)-based synthetic GRN constructed to accurately predict the downstream effects of CRISPR gene editing. First, input and output nodes are identified as differentially expressed genes between defined experimental and target cell/tissue types respectively. Then, MLP training learns regulatory relationships in a black-box approach allowing accurate prediction of output gene expression using only input gene expression. Finally, CRISPR-mimetic perturbations are made to each input gene individually and the resulting model predictions are compared to those for the target group to score and assess each input gene as a CRISPR candidate. The top scoring genes provided by CRISPR-GEM therefore best modulate experimental group GRNs to motivate transcriptomic shifts towards a target group phenotype. This machine learning model is the first of its kind for predicting optimal CRISPR target genes and serves as a powerful tool for enhanced CRISPR strategies across a range of cell therapies.
Affiliation(s)
- Josh P Graham
  - Department of Bioengineering, Lehigh University, Bethlehem, PA, USA
- Yu Zhang
  - Department of Bioengineering, Lehigh University, Bethlehem, PA, USA
  - Department of Electrical and Computer Engineering, Lehigh University, Bethlehem, PA, USA
- Lifang He
  - Department of Computer Science and Engineering, Lehigh University, Bethlehem, PA, USA
6
Chee FT, Harun S, Mohd Daud K, Sulaiman S, Nor Muhammad NA. Exploring gene regulation and biological processes in insects: Insights from omics data using gene regulatory network models. Prog Biophys Mol Biol 2024; 189:1-12. [PMID: 38604435 DOI: 10.1016/j.pbiomolbio.2024.04.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Revised: 12/18/2023] [Accepted: 04/03/2024] [Indexed: 04/13/2024]
Abstract
A gene regulatory network (GRN) comprises complicated, intertwined gene-regulator relationships. Understanding GRN dynamics helps to unravel the complexity behind observed gene expression. Insect gene regulation is often complicated by complex life cycles and diverse ecological adaptations. The main aim of this review is to provide an update on current mathematical modelling methods for GRNs as applied to insect science. Several popular GRN architecture models are discussed, together with examples of applications in insect science. In the last part of this review, the models are compared across several aspects, including network scalability, computational complexity, robustness to noise and biological relevance.
Affiliation(s)
- Fong Ting Chee
  - Institute of Systems Biology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor, Malaysia
- Sarahani Harun
  - Institute of Systems Biology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor, Malaysia
- Kauthar Mohd Daud
  - Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, 43600, UKM Bangi, Selangor, Malaysia
- Suhaila Sulaiman
  - FGV R&D Sdn Bhd, FGV Innovation Center, PT23417 Lengkuk Teknologi, Bandar Baru Enstek, 71760 Nilai, Negeri Sembilan, Malaysia
- Nor Azlan Nor Muhammad
  - Institute of Systems Biology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor, Malaysia
7
Liu Y, Zhang SY, Kleijn IT, Stumpf MPH. Approximate Bayesian computation for inferring Waddington landscapes from single-cell data. R Soc Open Sci 2024; 11:231697. [PMID: 39076359 PMCID: PMC11285904 DOI: 10.1098/rsos.231697] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2023] [Accepted: 05/01/2024] [Indexed: 07/31/2024]
Abstract
Single-cell technologies allow us to gain insights into cellular processes at unprecedented resolution. In stem cell and developmental biology, snapshot data allow us to characterize how the transcriptional states of cells change between successive cell types. Here, we show how approximate Bayesian computation (ABC) can be employed to calibrate mathematical models against single-cell data. In our simulation study, we demonstrate the pivotal role of choosing distance measures appropriate for single-cell data. We show that for good distance measures, notably optimal transport with the Sinkhorn divergence, we can infer parameters for mathematical models from simulated single-cell data. We show that the ABC posteriors can be used (i) to characterize parameter sensitivity and identify dependencies between different parameters and (ii) to construct representations of the Waddington or epigenetic landscape, which forms a popular and interpretable representation of the developmental dynamics. In summary, these results pave the way for fitting mechanistic models of stem cell differentiation to single-cell data.
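To make the role of the distance measure in ABC concrete, the sketch below runs plain ABC rejection sampling on a toy one-parameter model. Everything in it is an assumption for illustration: the Gaussian simulator, the uniform prior, the tolerance, and the use of the exact one-dimensional Wasserstein-1 distance as a simple optimal-transport stand-in for the Sinkhorn divergence mentioned in the abstract. It is not the authors' pipeline.

```python
import random


def wasserstein_1d(xs, ys):
    """Exact 1-D optimal-transport (Wasserstein-1) distance for equal-size samples."""
    if len(xs) != len(ys):
        raise ValueError("samples must have equal size")
    # In one dimension the optimal coupling matches sorted values pairwise.
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def simulate(theta, n, rng):
    """Toy simulator (hypothetical): n cells with expression ~ Normal(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def abc_rejection(observed, prior_low, prior_high, n_draws, epsilon, rng):
    """Keep prior draws whose simulated data lie within epsilon of the observed data."""
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(prior_low, prior_high)
        simulated = simulate(theta, len(observed), rng)
        if wasserstein_1d(observed, simulated) <= epsilon:
            accepted.append(theta)
    return accepted


rng = random.Random(0)                      # fixed seed for reproducibility
observed = simulate(2.0, 200, rng)          # synthetic "data" with true theta = 2
posterior = abc_rejection(observed, -5.0, 5.0, 2000, 0.3, rng)
posterior_mean = sum(posterior) / len(posterior)
```

The accepted values of `theta` form an approximate posterior sample; shrinking `epsilon` tightens the approximation at the cost of fewer acceptances, and a poorly chosen distance would accept parameters that reproduce the data badly.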
Affiliation(s)
- Yujing Liu
  - School of Mathematics and Statistics, University of Melbourne, Melbourne, Australia
- Stephen Y. Zhang
  - School of Mathematics and Statistics, University of Melbourne, Melbourne, Australia
- Michael P. H. Stumpf
  - School of Mathematics and Statistics, University of Melbourne, Melbourne, Australia
  - School of BioScience, University of Melbourne, Melbourne, Australia
8
Luppi AI, Rosas FE, Mediano PAM, Demertzi A, Menon DK, Stamatakis EA. Unravelling consciousness and brain function through the lens of time, space, and information. Trends Neurosci 2024; 47:551-568. [PMID: 38824075 DOI: 10.1016/j.tins.2024.05.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2024] [Revised: 04/29/2024] [Accepted: 05/09/2024] [Indexed: 06/03/2024]
Abstract
Disentangling how cognitive functions emerge from the interplay of brain dynamics and network architecture is among the major challenges that neuroscientists face. Pharmacological and pathological perturbations of consciousness provide a lens to investigate these complex challenges. Here, we review how recent advances about consciousness and the brain's functional organisation have been driven by a common denominator: decomposing brain function into fundamental constituents of time, space, and information. Whereas unconsciousness increases structure-function coupling across scales, psychedelics may decouple brain function from structure. Convergent effects also emerge: anaesthetics, psychedelics, and disorders of consciousness can exhibit similar reconfigurations of the brain's unimodal-transmodal functional axis. Decomposition approaches reveal the potential to translate discoveries across species, with computational modelling providing a path towards mechanistic integration.
Affiliation(s)
- Andrea I Luppi
  - Division of Anaesthesia, University of Cambridge, Cambridge, UK; Department of Clinical Neurosciences, University of Cambridge, Cambridge, UK; Montreal Neurological Institute, McGill University, Montreal, QC, Canada; St John's College, University of Cambridge, Cambridge, UK; Center for Eudaimonia and Human Flourishing, Linacre College, University of Oxford, Oxford, UK
- Fernando E Rosas
  - Center for Eudaimonia and Human Flourishing, Linacre College, University of Oxford, Oxford, UK; Department of Informatics, University of Sussex, Brighton, UK; Center for Psychedelic Research, Imperial College London, London, UK
- Athena Demertzi
  - Physiology of Cognition Lab, GIGA-Cyclotron Research Center In Vivo Imaging, University of Liège, Liège 4000, Belgium; Psychology and Neuroscience of Cognition Research Unit, University of Liège, Liège 4000, Belgium; National Fund for Scientific Research (FNRS), Brussels 1000, Belgium
- David K Menon
  - Division of Anaesthesia, University of Cambridge, Cambridge, UK
- Emmanuel A Stamatakis
  - Division of Anaesthesia, University of Cambridge, Cambridge, UK; Department of Clinical Neurosciences, University of Cambridge, Cambridge, UK
9
Wang Y, Zhou F, Guan J. SFINN: inferring gene regulatory network from single-cell and spatial transcriptomic data with shared factor neighborhood and integrated neural network. Bioinformatics 2024; 40:btae433. [PMID: 38950180 PMCID: PMC11236097 DOI: 10.1093/bioinformatics/btae433] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2024] [Revised: 06/18/2024] [Accepted: 06/28/2024] [Indexed: 07/03/2024]
Abstract
MOTIVATION: The rise of single-cell RNA sequencing (scRNA-seq) technology presents new opportunities for constructing detailed cell type-specific gene regulatory networks (GRNs) to study cell heterogeneity. However, noise, technical errors, and dropout phenomena in scRNA-seq data pose significant obstacles to GRN inference, so the design of accurate GRN inference algorithms remains essential. The recent growth of both single-cell and spatial transcriptomic sequencing data enables the development of supervised deep learning methods to infer GRNs on these diverse single-cell datasets.
RESULTS: In this study, we introduce a novel deep learning framework based on shared factor neighborhood and integrated neural network (SFINN) for inferring potential interactions and causalities between transcription factors and target genes from single-cell and spatial transcriptomic data. SFINN utilizes shared factor neighborhood to construct a cellular neighborhood network based on gene expression data and additionally integrates a cellular network generated from spatial location information. Subsequently, the cell adjacency matrix and gene pair expression are fed into an integrated neural network framework consisting of a graph convolutional neural network and a fully-connected neural network to determine whether the genes interact. Performance evaluation in the tasks of gene interaction and causality prediction against existing GRN reconstruction algorithms demonstrates the usability and competitiveness of SFINN across different kinds of data. SFINN can be applied to infer GRNs from conventional single-cell sequencing data and spatial transcriptomic data.
AVAILABILITY AND IMPLEMENTATION: SFINN can be accessed at GitHub: https://github.com/JGuan-lab/SFINN.
Affiliation(s)
- Yongjie Wang
  - Department of Automation, Xiamen University, Xiamen, Fujian 361102, China
- Fengfan Zhou
  - Department of Automation, Xiamen University, Xiamen, Fujian 361102, China
- Jinting Guan
  - Department of Automation, Xiamen University, Xiamen, Fujian 361102, China
  - Key Laboratory of System Control and Information Processing, Ministry of Education, Shanghai 200240, China
  - National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian 361102, China
10
Zhang L, Sagan A, Qin B, Hu B, Osmanbeyoglu HU. STAN, a computational framework for inferring spatially informed transcription factor activity across cellular contexts. bioRxiv 2024:2024.06.26.600782. [PMID: 38979296 PMCID: PMC11230390 DOI: 10.1101/2024.06.26.600782] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/10/2024]
Abstract
Transcription factors (TFs) drive significant cellular changes in response to environmental cues and intercellular signaling. Neighboring cells influence TF activity and, consequently, cellular fate and function. Spatial transcriptomics (ST) captures mRNA expression patterns across tissue samples, enabling characterization of the local microenvironment. However, these datasets have not been fully leveraged to systematically estimate TF activity governing cell identity. Here, we present STAN (Spatially informed Transcription factor Activity Network), a linear mixed-effects computational method that predicts spot-specific, spatially informed TF activities by integrating curated TF-target gene priors, mRNA expression, spatial coordinates, and morphological features from corresponding imaging data. We tested STAN using lymph node, breast cancer, and glioblastoma ST datasets to demonstrate its applicability by identifying TFs associated with specific cell types, spatial domains, pathological regions, and ligand-receptor pairs. STAN augments the utility of ST to reveal the intricate interplay between TFs and spatial organization across a spectrum of cellular contexts.
11
Zhou X, Pan J, Chen L, Zhang S, Chen Y. DeepIMAGER: Deeply Analyzing Gene Regulatory Networks from scRNA-seq Data. Biomolecules 2024; 14:766. [PMID: 39062480 PMCID: PMC11274664 DOI: 10.3390/biom14070766] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2024] [Revised: 06/22/2024] [Accepted: 06/25/2024] [Indexed: 07/28/2024] Open
Abstract
Understanding the dynamics of gene regulatory networks (GRNs) across diverse cell types poses a challenge yet holds immense value in unraveling the molecular mechanisms governing cellular processes. Current computational methods, which rely solely on expression changes from bulk RNA-seq and/or scRNA-seq data, often result in high rates of false positives and low precision. Here, we introduce an advanced computational tool, DeepIMAGER, for inferring cell-specific GRNs through deep learning and data integration. DeepIMAGER employs a supervised approach that transforms the co-expression patterns of gene pairs into image-like representations and leverages transcription factor (TF) binding information for model training. It is trained using comprehensive datasets that encompass scRNA-seq profiles and ChIP-seq data, capturing TF-gene pair information across various cell types. Comprehensive validations on six cell lines show that DeepIMAGER achieves superior performance compared with ten popular GRN inference tools and is remarkably robust against dropout-zero events. DeepIMAGER was applied to scRNA-seq datasets of multiple myeloma (MM) and detected potential GRNs for the TFs RORC, MITF, and FOXD2 in MM dendritic cells. This technical innovation, combined with its capability to accurately decode GRNs from scRNA-seq, establishes DeepIMAGER as a valuable tool for unraveling complex regulatory networks in various cell types.
Affiliation(s)
- Xiguo Zhou
  - College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
- Jingyi Pan
  - College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
- Liang Chen
  - College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
- Shaoqiang Zhang
  - College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
- Yong Chen
  - Department of Biological and Biomedical Sciences, Rowan University, Glassboro, NJ 08028, USA
12
Koçillari L, Lorenz GM, Engel NM, Celotto M, Curreli S, Malerba SB, Engel AK, Fellin T, Panzeri S. Sampling bias corrections for accurate neural measures of redundant, unique, and synergistic information. bioRxiv 2024:2024.06.04.597303. [PMID: 38895197 PMCID: PMC11185652 DOI: 10.1101/2024.06.04.597303] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/21/2024]
Abstract
Shannon information theory has long been a tool of choice to measure empirically how populations of neurons in the brain encode information about cognitive variables. Recently, Partial Information Decomposition (PID) has emerged as a principled way to break down this information into components identifying not only the unique information carried by each neuron, but also whether relationships between neurons generate synergistic or redundant information. While it has long been recognized that Shannon information measures on neural activity suffer from a (mostly upward) limited sampling estimation bias, this issue has largely been ignored in the burgeoning field of PID analysis of neural activity. We used simulations to investigate the limited sampling bias of PID computed from discrete probabilities (suited to describe neural spiking activity). We found that PID suffers from a large bias that is uneven across components, with synergy by far the most biased. Using approximate analytical expansions, we found that the bias of synergy increases quadratically with the number of discrete responses of each neuron, whereas the bias of unique and redundant information increases only linearly or sub-linearly. Based on this understanding of the PID bias properties, we developed simple yet effective procedures that correct for the bias and that greatly improve PID estimation with respect to current state-of-the-art procedures. We apply these PID bias correction procedures to datasets of 53117 pairs of neurons in auditory cortex, posterior parietal cortex and hippocampus of mice performing cognitive tasks, deriving precise estimates and bounds of how synergy and redundancy vary across these brain regions.
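The limited sampling bias mentioned in the abstract can be seen already for ordinary Shannon mutual information, before any decomposition. The sketch below computes the plug-in estimate from a joint count table together with the classical first-order bias term based on the number of occupied bins. This is a textbook-style illustration and an assumption of this example, not the PID correction procedure developed in the paper; the function names are hypothetical.

```python
import math


def plugin_mutual_information(counts):
    """Plug-in (maximum-likelihood) mutual information in bits from a joint count table."""
    n = sum(sum(row) for row in counts)
    if n == 0:
        raise ValueError("empty count table")
    row_tot = [sum(row) for row in counts]
    col_tot = [sum(col) for col in zip(*counts)]
    mi = 0.0
    for i, row in enumerate(counts):
        for j, c in enumerate(row):
            if c > 0:
                # p(i,j) * log2( p(i,j) / (p(i) p(j)) ), written with raw counts
                mi += (c / n) * math.log2(c * n / (row_tot[i] * col_tot[j]))
    return mi


def first_order_bias(counts):
    """First-order upward bias estimate (bits) from the numbers of occupied bins."""
    n = sum(sum(row) for row in counts)
    occupied_joint = sum(1 for row in counts for c in row if c > 0)
    occupied_rows = sum(1 for row in counts if sum(row) > 0)
    occupied_cols = sum(1 for col in zip(*counts) if sum(col) > 0)
    return (occupied_joint - occupied_rows - occupied_cols + 1) / (2 * n * math.log(2))


# Rows: stimulus values; columns: discrete neural responses (toy counts).
independent = [[10, 10], [10, 10]]
coupled = [[5, 0], [0, 5]]
mi_independent = plugin_mutual_information(independent)
mi_coupled = plugin_mutual_information(coupled)
bias_independent = first_order_bias(independent)
```

Subtracting `first_order_bias` from the plug-in value is the simplest correction; the bias term grows with the number of response bins and shrinks with the number of trials, which matches the qualitative scaling the abstract describes for Shannon quantities.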
Affiliation(s)
- Loren Koçillari
- Institute for Neural Information Processing, Center for Molecular Neurobiology, University Medical Center Hamburg-Eppendorf (UKE), Hamburg, Germany
- Department of Neurophysiology and Pathophysiology, University Medical Center Hamburg-Eppendorf (UKE), Hamburg, Germany
| | - Gabriel Matías Lorenz
- Institute for Neural Information Processing, Center for Molecular Neurobiology, University Medical Center Hamburg-Eppendorf (UKE), Hamburg, Germany
- Istituto Italiano di Tecnologia, Genova, Italy
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| | - Nicola Marie Engel
- Institute for Neural Information Processing, Center for Molecular Neurobiology, University Medical Center Hamburg-Eppendorf (UKE), Hamburg, Germany
| | - Marco Celotto
- Institute for Neural Information Processing, Center for Molecular Neurobiology, University Medical Center Hamburg-Eppendorf (UKE), Hamburg, Germany
- Istituto Italiano di Tecnologia, Genova, Italy
| | | | - Simone Blanco Malerba
- Institute for Neural Information Processing, Center for Molecular Neurobiology, University Medical Center Hamburg-Eppendorf (UKE), Hamburg, Germany
| | - Andreas K. Engel
- Department of Neurophysiology and Pathophysiology, University Medical Center Hamburg-Eppendorf (UKE), Hamburg, Germany
| | | | - Stefano Panzeri
- Institute for Neural Information Processing, Center for Molecular Neurobiology, University Medical Center Hamburg-Eppendorf (UKE), Hamburg, Germany
- Istituto Italiano di Tecnologia, Genova, Italy
| |
|
13
|
Huo Q, Song R, Ma Z. Recent advances in exploring transcriptional regulatory landscape of crops. Front Plant Sci 2024; 15:1421503. [PMID: 38903438 PMCID: PMC11188431 DOI: 10.3389/fpls.2024.1421503] [Received: 04/22/2024] [Accepted: 05/23/2024] [Indexed: 06/22/2024]
Abstract
Crop breeding entails developing and selecting plant varieties with improved agronomic traits. Modern molecular techniques, such as genome editing, enable more efficient manipulation of plant phenotype by altering the expression of particular regulatory or functional genes. Hence, it is essential to thoroughly comprehend the transcriptional regulatory mechanisms that underpin these traits. In the multi-omics era, a large amount of omics data has been generated for diverse crop species, including genomics, epigenomics, transcriptomics, proteomics, and single-cell omics. The abundant data resources and the emergence of advanced computational tools offer unprecedented opportunities for obtaining a holistic view and profound understanding of the regulatory processes linked to desirable traits. This review focuses on integrated network approaches that utilize multi-omics data to investigate gene expression regulation. Various types of regulatory networks and their inference methods are discussed, focusing on recent advancements in crop plants. The integration of multi-omics data has proven crucial for the construction of high-confidence regulatory networks. As these methodologies are refined, they will significantly enhance crop breeding efforts and contribute to global food security.
Affiliation(s)
| | | | - Zeyang Ma
- State Key Laboratory of Maize Bio-breeding, Frontiers Science Center for Molecular Design Breeding, Joint International Research Laboratory of Crop Molecular Breeding, National Maize Improvement Center, College of Agronomy and Biotechnology, China Agricultural University, Beijing, China
| |
|
14
|
Peng D, Cahan P. OneSC: A computational platform for recapitulating cell state transitions. bioRxiv 2024:2024.05.31.596831. [PMID: 38895453 PMCID: PMC11185539 DOI: 10.1101/2024.05.31.596831] [Indexed: 06/21/2024]
Abstract
Computational modelling of cell state transitions has been of great interest in developmental biology, cancer biology and cell fate engineering because it enables perturbation experiments to be performed in silico more rapidly and cheaply than could be achieved in a wet lab. Recent advancements in single-cell RNA sequencing (scRNA-seq) allow the capture of high-resolution snapshots of cell states as they transition along temporal trajectories. Using these high-throughput datasets, we can train computational models to generate in silico 'synthetic' cells that faithfully mimic the temporal trajectories. Here we present OneSC, a platform that can simulate synthetic cells across developmental trajectories using systems of stochastic differential equations governed by a core transcription factor (TF) regulatory network. Unlike current network inference methods, OneSC prioritizes generating a Boolean network that produces faithful cell state transitions and steady cell states that mimic real biological systems. Applying OneSC to real data, we inferred a core TF network using a mouse myeloid progenitor scRNA-seq dataset and showed that the dynamical simulations of that network generate synthetic single-cell expression profiles that faithfully recapitulate the four myeloid differentiation trajectories going into differentiated cell states (erythrocytes, megakaryocytes, granulocytes and monocytes). Finally, through in silico perturbations of the mouse myeloid progenitor core network, we showed that OneSC can accurately predict cell fate decision biases of TF perturbations that closely match previous experimental observations.
Affiliation(s)
- Da Peng
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, 21205, USA
| | - Patrick Cahan
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, 21205, USA
- Institute for Cell Engineering, Johns Hopkins University, Baltimore, Maryland, 21205, USA
- Department of Molecular Biology and Genetics, Johns Hopkins University, Baltimore, Maryland, 21205, USA
| |
|
15
|
Wan R, Zhang Y, Peng Y, Tian F, Gao G, Tang F, Jia J, Ge H. Unveiling gene regulatory networks during cellular state transitions without linkage across time points. Sci Rep 2024; 14:12355. [PMID: 38811747 PMCID: PMC11137113 DOI: 10.1038/s41598-024-62850-1] [Received: 01/24/2024] [Accepted: 05/22/2024] [Indexed: 05/31/2024] Open
Abstract
Time-stamped cross-sectional data, which lack linkage across time points, are commonly generated in single-cell transcriptional profiling. Many previous methods for inferring gene regulatory networks (GRNs) driving cell-state transitions relied on constructing single-cell temporal ordering. Introducing COSLIR (COvariance restricted Sparse LInear Regression), we presented a direct approach to reconstructing GRNs that govern cell-state transitions, utilizing only the first and second moments of samples between two consecutive time points. Simulations validated COSLIR's perfect accuracy in the oracle case and demonstrated its robust performance in real-world scenarios. When applied to single-cell RT-PCR and RNAseq datasets in developmental biology, COSLIR competed favorably with existing methods. Notably, its running time remained nearly independent of the number of cells. Therefore, COSLIR emerges as a promising addition to GRN reconstruction methods under cell-state transitions, bypassing the single-cell temporal ordering to enhance accuracy and efficiency in single-cell transcriptional profiling.
Affiliation(s)
- Ruosi Wan
- Beijing International Center for Mathematical Research, Peking University, Beijing, China
| | - Yuhao Zhang
- Biomedical Pioneering Innovation Center, Peking University, Beijing, China
| | - Yongli Peng
- Beijing International Center for Mathematical Research, Peking University, Beijing, China
| | - Feng Tian
- Biomedical Pioneering Innovation Center, Peking University, Beijing, China
| | - Ge Gao
- Biomedical Pioneering Innovation Center, Peking University, Beijing, China
- Beijing Advanced Innovation Center for Genomics, Peking University, Beijing, China
| | - Fuchou Tang
- Biomedical Pioneering Innovation Center, Peking University, Beijing, China
- Beijing Advanced Innovation Center for Genomics, Peking University, Beijing, China
| | - Jinzhu Jia
- School of Public Health and Center for Statistical Science, Peking University, Beijing, China.
| | - Hao Ge
- Beijing International Center for Mathematical Research, Peking University, Beijing, China.
- Biomedical Pioneering Innovation Center, Peking University, Beijing, China.
| |
|
16
|
Zhang D, Gao S, Liu ZP, Gao R. LogicGep: Boolean networks inference using symbolic regression from time-series transcriptomic profiling data. Brief Bioinform 2024; 25:bbae286. [PMID: 38886006 PMCID: PMC11182660 DOI: 10.1093/bib/bbae286] [Received: 03/30/2024] [Revised: 05/09/2024] [Accepted: 06/06/2024] [Indexed: 06/20/2024] Open
Abstract
Reconstructing the topology of gene regulatory networks from gene expression data has been extensively studied. With the abundance of functional transcriptomic data available, it is now feasible to systematically decipher regulatory interaction dynamics in a logic form such as a Boolean network (BN) framework, which qualitatively indicates how multiple regulators aggregate to affect a common target gene. However, inferring both the network topology and gene interaction dynamics simultaneously is still a challenging problem since gene expression data are typically noisy and data discretization is prone to information loss. We propose a new method for BN inference from time-series transcriptional profiles, called LogicGep. LogicGep formulates the identification of Boolean functions as a symbolic regression problem that learns the Boolean function expression and solves it efficiently through multi-objective optimization using an improved gene expression programming algorithm. To avoid overly emphasizing dynamic characteristics at the expense of topological structure, as traditional methods often do, a set of promising Boolean formulas for each target gene is evolved first, and a feed-forward neural network trained with continuous expression data is subsequently employed to pick out the final solution. We validated the efficacy of LogicGep using multiple datasets including both synthetic and real-world experimental data. The results show that LogicGep infers accurate BN models, outperforming other representative BN inference algorithms in both network topology reconstruction and the identification of Boolean functions. Moreover, the execution of LogicGep is hundreds of times faster than other methods, especially in the case of large network inference.
Affiliation(s)
- Dezhen Zhang
- Center of Intelligent Medicine, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
| | - Shuhua Gao
- Center of Intelligent Medicine, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
| | - Zhi-Ping Liu
- Center of Intelligent Medicine, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
| | - Rui Gao
- Center of Intelligent Medicine, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
| |
|
17
|
Lei Y, Huang XT, Guo X, Chan KHK, Gao L. DeepGRNCS: deep learning-based framework for jointly inferring gene regulatory networks across cell subpopulations. Brief Bioinform 2024; 25:bbae334. [PMID: 38980373 PMCID: PMC11232306 DOI: 10.1093/bib/bbae334] [Received: 03/11/2024] [Revised: 06/03/2024] [Accepted: 07/01/2024] [Indexed: 07/10/2024] Open
Abstract
Inferring gene regulatory networks (GRNs) allows us to obtain a deeper understanding of cellular function and disease pathogenesis. Recent advances in single-cell RNA sequencing (scRNA-seq) technology have improved the accuracy of GRN inference. However, many methods for inferring individual GRNs from scRNA-seq data are limited because they overlook intercellular heterogeneity and similarities between different cell subpopulations, which are often present in the data. Here, we propose a deep learning-based framework, DeepGRNCS, for jointly inferring GRNs across cell subpopulations. We follow the commonly accepted hypothesis that the expression of a target gene can be predicted based on the expression of transcription factors (TFs) due to underlying regulatory relationships. We initially processed scRNA-seq data by discretizing data scattering using the equal-width method. Then, we trained deep learning models to predict target gene expression from TFs. By individually removing each TF from the expression matrix, we used pre-trained deep model predictions to infer regulatory relationships between TFs and genes, thereby constructing the GRN. Our method outperforms existing GRN inference methods for various simulated and real scRNA-seq datasets. Finally, we applied DeepGRNCS to non-small cell lung cancer scRNA-seq data to identify key genes in each cell subpopulation and analyzed their biological relevance. In conclusion, DeepGRNCS effectively predicts cell subpopulation-specific GRNs. The source code is available at https://github.com/Nastume777/DeepGRNCS.
Affiliation(s)
- Yahui Lei
- School of Computer Science and Technology, Xidian University, Xi’an 710071, Shaanxi, China
| | - Xiao-Tai Huang
- School of Computer Science and Technology, Xidian University, Xi’an 710071, Shaanxi, China
| | - Xingli Guo
- School of Computer Science and Technology, Xidian University, Xi’an 710071, Shaanxi, China
| | - Kei Hang Katie Chan
- Department of Electrical Engineering, City University of Hong Kong, Hong Kong SAR, China
- Department of Biomedical Sciences, City University of Hong Kong, Hong Kong SAR, China
- Department of Epidemiology and Center for Global Cardiometabolic Health, Brown University, Providence, RI, United States
| | - Lin Gao
- School of Computer Science and Technology, Xidian University, Xi’an 710071, Shaanxi, China
| |
|
18
|
Singh R, Wu AP, Mudide A, Berger B. Causal gene regulatory analysis with RNA velocity reveals an interplay between slow and fast transcription factors. Cell Syst 2024; 15:462-474.e5. [PMID: 38754366 DOI: 10.1016/j.cels.2024.04.005] [Received: 04/14/2023] [Revised: 08/25/2023] [Accepted: 04/18/2024] [Indexed: 05/18/2024]
Abstract
Single-cell expression dynamics, from differentiation trajectories or RNA velocity, have the potential to reveal causal links between transcription factors (TFs) and their target genes in gene regulatory networks (GRNs). However, existing methods either overlook these expression dynamics or necessitate that cells be ordered along a linear pseudotemporal axis, which is incompatible with branching trajectories. We introduce Velorama, an approach to causal GRN inference that represents single-cell differentiation dynamics as a directed acyclic graph of cells, constructed from pseudotime or RNA velocity measurements. Additionally, Velorama enables the estimation of the speed at which TFs influence target genes. Applying Velorama, we uncover evidence that the speed of a TF's interactions is tied to its regulatory function. For human corticogenesis, we find that slow TFs are linked to gliomas, while fast TFs are associated with neuropsychiatric diseases. We expect Velorama to become a critical part of the RNA velocity toolkit for investigating the causal drivers of differentiation and disease.
Affiliation(s)
- Rohit Singh
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA 02139, USA.
| | - Alexander P Wu
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA 02139, USA
| | - Anish Mudide
- Phillips Exeter Academy, Exeter, NH 03883, USA; Computer Science and Artificial Intelligence Laboratory and Department of Mathematics, MIT, Cambridge, MA 02139, USA
| | - Bonnie Berger
- Computer Science and Artificial Intelligence Laboratory and Department of Mathematics, MIT, Cambridge, MA 02139, USA.
| |
|
19
|
Zinati Y, Takiddeen A, Emad A. GRouNdGAN: GRN-guided simulation of single-cell RNA-seq data using causal generative adversarial networks. Nat Commun 2024; 15:4055. [PMID: 38744843 DOI: 10.1038/s41467-024-48516-6] [Received: 07/31/2023] [Accepted: 05/01/2024] [Indexed: 05/16/2024] Open
Abstract
We introduce GRouNdGAN, a gene regulatory network (GRN)-guided reference-based causal implicit generative model for simulating single-cell RNA-seq data, in silico perturbation experiments, and benchmarking GRN inference methods. Through the imposition of a user-defined GRN in its architecture, GRouNdGAN simulates steady-state and transient-state single-cell datasets where genes are causally expressed under the control of their regulating transcription factors (TFs). Training on six experimental reference datasets, we show that our model captures non-linear TF-gene dependencies and preserves gene identities, cell trajectories, pseudo-time ordering, and technical and biological noise, with no user manipulation and only implicit parameterization. GRouNdGAN can synthesize cells under new conditions to perform in silico TF knockout experiments. Benchmarking various GRN inference algorithms reveals that GRouNdGAN effectively bridges the existing gap between simulated and biological data benchmarks of GRN inference algorithms, providing gold standard ground truth GRNs and realistic cells corresponding to the biological system of interest.
Affiliation(s)
- Yazdan Zinati
- Department of Electrical and Computer Engineering, McGill University, Montreal, QC, Canada
| | - Abdulrahman Takiddeen
- Department of Electrical and Computer Engineering, McGill University, Montreal, QC, Canada
| | - Amin Emad
- Department of Electrical and Computer Engineering, McGill University, Montreal, QC, Canada.
- Mila, Quebec AI Institute, Montreal, QC, Canada.
- The Rosalind and Morris Goodman Cancer Institute, Montreal, QC, Canada.
| |
|
20
|
Lee J, Kim N, Cho KH. Decoding the principle of cell-fate determination for its reverse control. NPJ Syst Biol Appl 2024; 10:47. [PMID: 38710700 PMCID: PMC11074314 DOI: 10.1038/s41540-024-00372-2] [Received: 12/11/2023] [Accepted: 04/16/2024] [Indexed: 05/08/2024] Open
Abstract
Understanding and manipulating cell fate determination is pivotal in biology. Cell fate is determined by intricate and nonlinear interactions among molecules, making mathematical model-based quantitative analysis indispensable for its elucidation. Nevertheless, obtaining the essential dynamic experimental data for model development has been a significant obstacle. However, recent advancements in large-scale omics data technology are providing the necessary foundation for developing such models. Based on accumulated experimental evidence, we can postulate that cell fate is governed by a limited number of core regulatory circuits. Following this concept, we present a conceptual control framework that leverages single-cell RNA-seq data for dynamic molecular regulatory network modeling, aiming to identify and manipulate core regulatory circuits and their master regulators to drive desired cellular state transitions. We illustrate the proposed framework by applying it to the reversion of lung cancer cell states, although it is more broadly applicable to understanding and controlling a wide range of cell-fate determination processes.
Affiliation(s)
- Jonghoon Lee
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea
| | - Namhee Kim
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea
- biorevert, Inc., Daejeon, Republic of Korea
| | - Kwang-Hyun Cho
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea.
| |
|
21
|
Guo C, Huang Z, Chen J, Yu G, Wang Y, Wang X. Identification of Novel Regulators of Leaf Senescence Using a Deep Learning Model. Plants (Basel) 2024; 13:1276. [PMID: 38732491 PMCID: PMC11085074 DOI: 10.3390/plants13091276] [Received: 03/27/2024] [Revised: 04/26/2024] [Accepted: 04/29/2024] [Indexed: 05/13/2024]
Abstract
Deep learning has emerged as a powerful tool for investigating intricate biological processes in plants by harnessing the potential of large-scale data. Gene regulation is a complex process in which transcription factors (TFs), together with their target genes, participate in various aspects of biological processes. Despite its significance, the study of gene regulation has primarily focused on a limited number of notable instances, leaving numerous aspects and interactions yet to be explored comprehensively. Here, we developed DEGRN (Deep learning on Expression for Gene Regulatory Network), an innovative deep learning model designed to decipher gene interactions by leveraging high-dimensional expression data obtained from bulk RNA-Seq and scRNA-Seq data in the model plant Arabidopsis. DEGRN exhibited a comparable level of predictive power when applied to various datasets. Using DEGRN, we identified an extensive set of 3,053,363 high-quality interactions, encompassing 1430 TFs and 13,739 non-TF genes. Notably, DEGRN's predictive capabilities allowed us to uncover novel regulators involved in a range of complex biological processes, including development, metabolism, and stress responses. Using leaf senescence as an example, we revealed a complex network underpinning this process composed of diverse TF families, including bHLH, ERF, and MYB. We also identified a novel TF, named MAF5, whose expression showed a strong linear relationship with the progression of senescence. The mutant maf5 showed early leaf decay compared to the wild type, indicating a potential role in the regulation of leaf senescence. This hypothesis was further supported by the expression patterns observed across four stages of leaf development, as well as transcriptomics analysis. Overall, the comprehensive coverage provided by DEGRN expands our understanding of gene regulatory networks and paves the way for further investigations into their functional implications.
Affiliation(s)
| | | | | | | | | | - Xu Wang
- Shanghai Collaborative Innovation Center of Agri-Seeds, Joint Center for Single Cell Biology, School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai 200240, China; (C.G.); (Z.H.); (J.C.); (G.Y.); (Y.W.)
| |
|
22
|
Gan Y, Yu J, Xu G, Yan C, Zou G. Inferring gene regulatory networks from single-cell transcriptomics based on graph embedding. Bioinformatics 2024; 40:btae291. [PMID: 38810116 PMCID: PMC11142726 DOI: 10.1093/bioinformatics/btae291] [Received: 01/12/2024] [Revised: 03/06/2024] [Accepted: 05/28/2024] [Indexed: 05/31/2024] Open
Abstract
MOTIVATION: Gene regulatory networks (GRNs) encode gene regulation in living organisms, and have become a critical tool to understand complex biological processes. However, due to the dynamic and complex nature of gene regulation, inferring GRNs from scRNA-seq data is still a challenging task. Existing computational methods usually focus on the close connections between genes, and ignore the global structure and distal regulatory relationships. RESULTS: In this study, we develop a supervised deep learning framework, IGEGRNS, to infer GRNs from scRNA-seq data based on graph embedding. In the framework, contextual information of genes is captured by GraphSAGE, which aggregates gene features and neighborhood structures to generate low-dimensional embeddings for genes. Then, the k most influential nodes in the whole graph are filtered through Top-k pooling. Finally, potential regulatory relationships between genes are predicted by stacking CNNs. Compared with nine competing supervised and unsupervised methods, our method achieves better performance on six time-series scRNA-seq datasets. AVAILABILITY AND IMPLEMENTATION: Our method IGEGRNS is implemented in Python using the PyTorch machine learning library, and it is freely available at https://github.com/DHUDBlab/IGEGRNS.
Affiliation(s)
- Yanglan Gan
- School of Computer Science and Technology, Donghua University, Shanghai 201620, China
| | - Jiacheng Yu
- School of Computer Science and Technology, Donghua University, Shanghai 201620, China
| | - Guangwei Xu
- School of Computer Science and Technology, Donghua University, Shanghai 201620, China
| | - Cairong Yan
- School of Computer Science and Technology, Donghua University, Shanghai 201620, China
| | - Guobing Zou
- School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
| |
|
23
|
Stock M, Popp N, Fiorentino J, Scialdone A. Topological benchmarking of algorithms to infer gene regulatory networks from single-cell RNA-seq data. Bioinformatics 2024; 40:btae267. [PMID: 38627250 PMCID: PMC11096270 DOI: 10.1093/bioinformatics/btae267] [Received: 05/16/2023] [Revised: 02/28/2024] [Accepted: 04/16/2024] [Indexed: 05/18/2024] Open
Abstract
MOTIVATION: In recent years, many algorithms for inferring gene regulatory networks from single-cell transcriptomic data have been published. Several studies have evaluated their accuracy in estimating the presence of an interaction between pairs of genes. However, these benchmarking analyses do not quantify the algorithms' ability to capture structural properties of networks, which are fundamental, e.g., for studying the robustness of a gene network to external perturbations. Here, we devise a three-step benchmarking pipeline called STREAMLINE that quantifies the ability of algorithms to capture topological properties of networks and identify hubs. RESULTS: To this aim, we use data simulated from different types of networks as well as experimental data from three different organisms. We apply our benchmarking pipeline to four inference algorithms and provide guidance on which algorithm should be used depending on the global network property of interest. AVAILABILITY AND IMPLEMENTATION: STREAMLINE is available at https://github.com/ScialdoneLab/STREAMLINE. The data generated in this study are available at https://doi.org/10.5281/zenodo.10710444.
Affiliation(s)
- Marco Stock
- Institute of Epigenetics and Stem Cells, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 81377, Germany
- Institute of Functional Epigenetics, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 85764, Germany
- Institute of Computational Biology, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 85764, Germany
- TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich 85354, Germany
| | - Niclas Popp
- Institute of Epigenetics and Stem Cells, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 81377, Germany
- Institute of Functional Epigenetics, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 85764, Germany
- Institute of Computational Biology, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 85764, Germany
| | - Jonathan Fiorentino
- Institute of Epigenetics and Stem Cells, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 81377, Germany
- Institute of Functional Epigenetics, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 85764, Germany
- Institute of Computational Biology, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 85764, Germany
| | - Antonio Scialdone
- Institute of Epigenetics and Stem Cells, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 81377, Germany
- Institute of Functional Epigenetics, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 85764, Germany
- Institute of Computational Biology, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 85764, Germany
| |
|
24
|
Kim H, Chang W, Chae SJ, Park JE, Seo M, Kim JK. scLENS: data-driven signal detection for unbiased scRNA-seq data analysis. Nat Commun 2024; 15:3575. [PMID: 38678050 DOI: 10.1038/s41467-024-47884-3] [Received: 10/18/2023] [Accepted: 04/14/2024] [Indexed: 04/29/2024] Open
Abstract
High dimensionality and noise have limited the new biological insights that can be discovered in scRNA-seq data. While dimensionality reduction tools have been developed to extract biological signals from the data, they often require manual determination of signal dimension, introducing user bias. Furthermore, a common data preprocessing method, log normalization, can unintentionally distort signals in the data. Here, we develop scLENS, a dimensionality reduction tool that circumvents the long-standing issues of signal distortion and manual input. Specifically, we identify the primary cause of signal distortion during log normalization and effectively address it by uniformizing cell vector lengths with L2 normalization. Furthermore, we utilize random matrix theory-based noise filtering and a signal robustness test to enable data-driven determination of the threshold for signal dimensions. Our method outperforms 11 widely used dimensionality reduction tools and performs particularly well for challenging scRNA-seq datasets with high sparsity and variability. To facilitate the use of scLENS, we provide a user-friendly package that automates accurate signal detection in scRNA-seq data without time-consuming manual tuning.
Affiliation(s)
- Hyun Kim
- Biomedical Mathematics Group, Pioneer Research Center for Mathematical and Computational Sciences, Institute for Basic Science, Daejeon, 34126, Republic of Korea
| | - Won Chang
- Division of Statistics and Data Science, University of Cincinnati, Cincinnati, OH, 45221, USA
| | - Seok Joo Chae
- Biomedical Mathematics Group, Pioneer Research Center for Mathematical and Computational Sciences, Institute for Basic Science, Daejeon, 34126, Republic of Korea
- Department of Mathematical Sciences, KAIST, Daejeon, 34141, Republic of Korea
| | - Jong-Eun Park
- Graduate School of Medical Science and Engineering, KAIST, Daejeon, 34141, Republic of Korea
| | - Minseok Seo
- Department of Computer and Information Science, Korea University, Sejong, 30019, Republic of Korea
| | - Jae Kyoung Kim
- Biomedical Mathematics Group, Pioneer Research Center for Mathematical and Computational Sciences, Institute for Basic Science, Daejeon, 34126, Republic of Korea.
- Department of Mathematical Sciences, KAIST, Daejeon, 34141, Republic of Korea.
| |
|
25
|
Yuan Q, Duren Z. Inferring gene regulatory networks from single-cell multiome data using atlas-scale external data. Nat Biotechnol 2024:10.1038/s41587-024-02182-7. [PMID: 38609714 DOI: 10.1038/s41587-024-02182-7] [Received: 08/04/2023] [Accepted: 02/26/2024] [Indexed: 04/14/2024]
Abstract
Existing methods for gene regulatory network (GRN) inference rely on gene expression data alone or on lower resolution bulk data. Despite the recent integration of chromatin accessibility and RNA sequencing data, learning complex mechanisms from limited independent data points still presents a daunting challenge. Here we present LINGER (Lifelong neural network for gene regulation), a machine-learning method to infer GRNs from single-cell paired gene expression and chromatin accessibility data. LINGER incorporates atlas-scale external bulk data across diverse cellular contexts and prior knowledge of transcription factor motifs as a manifold regularization. LINGER achieves a fourfold to sevenfold relative increase in accuracy over existing methods and reveals a complex regulatory landscape of genome-wide association studies, enabling enhanced interpretation of disease-associated variants and genes. Following the GRN inference from reference single-cell multiome data, LINGER enables the estimation of transcription factor activity solely from bulk or single-cell gene expression data, leveraging the abundance of available gene expression data to identify driver regulators from case-control studies.
Affiliation(s)
- Qiuyue Yuan
- Center for Human Genetics, Department of Genetics and Biochemistry, Clemson University, Greenwood, SC, USA
| | - Zhana Duren
- Center for Human Genetics, Department of Genetics and Biochemistry, Clemson University, Greenwood, SC, USA.
| |
|
26
|
Fiorentino J, Armaos A, Colantoni A, Tartaglia G. Prediction of protein-RNA interactions from single-cell transcriptomic data. Nucleic Acids Res 2024; 52:e31. [PMID: 38364867 PMCID: PMC11014251 DOI: 10.1093/nar/gkae076] [Received: 07/17/2023] [Revised: 01/12/2024] [Accepted: 01/26/2024] [Indexed: 02/18/2024]
Abstract
Proteins are crucial in regulating every aspect of RNA life, yet understanding their interactions with coding and noncoding RNAs remains limited. Experimental studies are typically restricted to a small number of cell lines and a limited set of RNA-binding proteins (RBPs). Although computational methods based on physico-chemical principles can predict protein-RNA interactions accurately, they often lack the ability to consider cell-type-specific gene expression and the broader context of gene regulatory networks (GRNs). Here, we assess the performance of several GRN inference algorithms in predicting protein-RNA interactions from single-cell transcriptomic data, and propose a pipeline, called scRAPID (single-cell transcriptomic-based RnA Protein Interaction Detection), that integrates these methods with the catRAPID algorithm, which can identify direct physical interactions between RBPs and RNA molecules. Our approach demonstrates that RBP-RNA interactions can be predicted from single-cell transcriptomic data, with performances comparable or superior to those achieved for the well-established task of inferring transcription factor-target interactions. The incorporation of catRAPID significantly enhances the accuracy of identifying interactions, particularly with long noncoding RNAs, and enables the identification of hub RBPs and RNAs. Additionally, we show that interactions between RBPs can be detected based on their inferred RNA targets. The software is freely available at https://github.com/tartaglialabIIT/scRAPID.
Affiliation(s)
- Jonathan Fiorentino
- Center for Life Nano- and Neuro-Science, RNA Systems Biology Lab, Fondazione Istituto Italiano di Tecnologia (IIT), 00161 Rome, Italy
| | - Alexandros Armaos
- Centre for Human Technologies (CHT), RNA Systems Biology Lab, Fondazione Istituto Italiano di Tecnologia (IIT), 16152 Genova, Italy
| | - Alessio Colantoni
- Center for Life Nano- and Neuro-Science, RNA Systems Biology Lab, Fondazione Istituto Italiano di Tecnologia (IIT), 00161 Rome, Italy
- Department of Biology and Biotechnologies “Charles Darwin”, Sapienza University of Rome, 00185 Rome, Italy
| | - Gian Gaetano Tartaglia
- Center for Life Nano- and Neuro-Science, RNA Systems Biology Lab, Fondazione Istituto Italiano di Tecnologia (IIT), 00161 Rome, Italy
- Centre for Human Technologies (CHT), RNA Systems Biology Lab, Fondazione Istituto Italiano di Tecnologia (IIT), 16152 Genova, Italy
| |
|
27
|
Brückner DB, Broedersz CP. Learning dynamical models of single and collective cell migration: a review. Rep Prog Phys 2024; 87:056601. [PMID: 38518358 DOI: 10.1088/1361-6633/ad36d2] [Received: 10/07/2023] [Accepted: 03/22/2024] [Indexed: 03/24/2024]
Abstract
Single and collective cell migration are fundamental processes critical for physiological phenomena ranging from embryonic development and immune response to wound healing and cancer metastasis. To understand cell migration from a physical perspective, a broad variety of models for the underlying physical mechanisms that govern cell motility have been developed. A key challenge in the development of such models is how to connect them to experimental observations, which often exhibit complex stochastic behaviours. In this review, we discuss recent advances in data-driven theoretical approaches that directly connect with experimental data to infer dynamical models of stochastic cell migration. Leveraging advances in nanofabrication, image analysis, and tracking technology, experimental studies now provide unprecedented large datasets on cellular dynamics. In parallel, theoretical efforts have been directed towards integrating such datasets into physical models from the single cell to the tissue scale with the aim of conceptualising the emergent behaviour of cells. We first review how this inference problem has been addressed in both freely migrating and confined cells. Next, we discuss why these dynamics typically take the form of underdamped stochastic equations of motion, and how such equations can be inferred from data. We then review applications of data-driven inference and machine learning approaches to heterogeneity in cell behaviour, subcellular degrees of freedom, and to the collective dynamics of multicellular systems. Across these applications, we emphasise how data-driven methods can be integrated with physical active matter models of migrating cells, and help reveal how underlying molecular mechanisms control cell behaviour. Together, these data-driven approaches are a promising avenue for building physical models of cell migration directly from experimental data, and for providing conceptual links between different length-scales of description.
Affiliation(s)
- David B Brückner
- Institute of Science and Technology Austria, Am Campus 1, 3400 Klosterneuburg, Austria
| | - Chase P Broedersz
- Department of Physics and Astronomy, Vrije Universiteit Amsterdam, 1081 HV Amsterdam, The Netherlands
- Arnold Sommerfeld Center for Theoretical Physics and Center for NanoScience, Department of Physics, Ludwig-Maximilian-University Munich, Theresienstr. 37, D-80333 Munich, Germany
| |
|
28
|
Zhu L, Wang J. Quantifying Landscape-Flux via Single-Cell Transcriptomics Uncovers the Underlying Mechanism of Cell Cycle. Adv Sci (Weinh) 2024; 11:e2308879. [PMID: 38353329 DOI: 10.1002/advs.202308879] [Received: 11/18/2023] [Revised: 01/23/2024] [Indexed: 04/25/2024]
Abstract
Recent developments in single-cell sequencing technology enable the acquisition of entire transcriptome data. Understanding the underlying mechanism and identifying the driving force of transcriptional regulation governing cell function directly from these data remains challenging. This study reconstructs a continuous vector field of the cell cycle based on discrete single-cell RNA velocity to quantify the single-cell global nonequilibrium dynamic landscape-flux. It reveals that large fluctuations disrupt the global landscape and genetic perturbations alter landscape-flux, thus identifying key genes in maintaining cell cycle dynamics and predicting associated functional effects. Additionally, it quantifies the fundamental energy cost of cell cycle initiation and shows that sustaining the cell cycle requires curl flux and dissipation to maintain the oscillatory phase coherence. This study enables the inference of the cell cycle gene regulatory networks directly from the single-cell transcriptomic data, including the feedback mechanisms and interaction intensity. This provides a golden opportunity to experimentally verify the landscape-flux theory and also obtain its associated quantifications. It also offers a unique framework for combining the landscape-flux theory and single-cell high-throughput sequencing experiments for understanding the underlying mechanisms of the cell cycle and can be extended to other nonequilibrium biological processes, such as differentiation, development, and disease pathogenesis.
Affiliation(s)
- Ligang Zhu
- College of Physics, Jilin University, Changchun, 130021, P. R. China
- State Key Laboratory of Electroanalytical Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun, 130022, P. R. China
| | - Jin Wang
- Center for Theoretical Interdisciplinary Sciences, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, 325001, P. R. China
- Department of Chemistry, Physics and Astronomy, Stony Brook University, Stony Brook, NY, 11794, USA
| |
|
29
|
Malekpour SA, Haghverdi L, Sadeghi M. Single-cell multi-omics analysis identifies context-specific gene regulatory gates and mechanisms. Brief Bioinform 2024; 25:bbae180. [PMID: 38653489 PMCID: PMC11036345 DOI: 10.1093/bib/bbae180] [Received: 09/19/2023] [Revised: 01/29/2024] [Accepted: 04/02/2024] [Indexed: 04/25/2024]
Abstract
There is growing interest in inferring context-specific gene regulatory networks from single-cell RNA sequencing (scRNA-seq) data. This involves identifying the regulatory relationships between transcription factors (TFs) and genes in individual cells, and then characterizing these relationships at the level of specific cell types or cell states. In this study, we introduce scGATE (single-cell gene regulatory gate) as a novel computational tool for inferring TF-gene interaction networks and reconstructing Boolean logic gates involving regulatory TFs using scRNA-seq data. In contrast to current Boolean models, scGATE eliminates the need for individual formulations and likelihood calculations for each Boolean rule (e.g. AND, OR, XOR). By employing a Bayesian framework, scGATE infers the Boolean rule after fitting the model to the data, resulting in significant reductions in time complexity for logic-based studies. We have applied single-cell assay for transposase-accessible chromatin with sequencing (scATAC-seq) data and TF DNA binding motifs to filter out non-relevant TFs in gene regulation. By integrating single-cell clustering with these external cues, scGATE is able to infer context-specific networks. The performance of scGATE is evaluated using synthetic and real single-cell multi-omics data from mouse tissues and human blood, demonstrating its superiority over existing tools for reconstructing TF-gene networks. Additionally, scGATE provides a flexible framework for understanding the complex combinatorial and cooperative relationships among TFs regulating target genes by inferring Boolean logic gates among them.
Affiliation(s)
- Seyed Amir Malekpour
- School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), 19395-5746, Tehran, Iran
| | - Laleh Haghverdi
- Berlin Institute for Medical Systems Biology, Max Delbrück Center (BIMSB-MDC) in the Helmholtz Association, Berlin, Germany
| | - Mehdi Sadeghi
- Department of Medical Genetics, National Institute of Genetic Engineering and Biotechnology, 1497716316, Tehran, Iran
| |
|
30
|
Balachander GM, Nilawar S, Meka SRK, Ghosh LD, Chatterjee K. Unravelling microRNA regulation and miRNA-mRNA regulatory networks in osteogenesis driven by 3D nanotopographical cues. Biomater Sci 2024; 12:978-989. [PMID: 38189225 DOI: 10.1039/d3bm01597a] [Indexed: 01/09/2024]
Abstract
Three-dimensional (3D) culturing of cells is being adopted for developing tissues for various applications such as mechanistic studies, drug testing, tissue regeneration, and animal-free meat. These approaches often involve cost-effective differentiation of stem or progenitor cells. One approach is to exploit architectural cues on a 3D substrate to drive cellular differentiation, which has been shown to be effective in various studies. Although extensive gene expression data from such studies have shown that gene expression patterns might differ, the gene regulatory networks controlling the expression of genes are rarely studied. In this study, we profiled genes and microRNAs (miRNAs) via next-generation sequencing (NGS) in human mesenchymal stem cells (hMSCs) driven toward osteogenesis via architectural cues in 3D matrices (3D conditions) and compared with cells in two-dimensional (2D) culture driven toward osteogenesis via soluble osteoinductive factors (OF conditions). The total number of differentially expressed genes was smaller in 3D compared to OF conditions. A distinct set of genes was observed under these conditions that have been shown to control osteogenic differentiation via different pathways. Small RNA sequencing revealed a core set of miRNAs to be differentially expressed under these conditions, similar to those that have been previously implicated in osteogenesis. We also observed a distinct regulation of miRNAs in these samples that can modulate gene expression, suggesting supplementary gene regulatory networks operative under different stimuli. This study provides insights into studying gene regulatory networks for identifying critical nodes to target for enhanced cellular differentiation and reveal the differences in physical and biochemical cues to drive cell fates.
Affiliation(s)
- Gowri Manohari Balachander
- School of Biomedical Engineering, Indian Institute of Technology (BHU) Varanasi, Varanasi-221005, India.
| | - Sagar Nilawar
- Department of Materials Engineering, Indian Institute of Science, Bangalore-560012, India.
| | - Sai Rama Krishna Meka
- Department of Materials Engineering, Indian Institute of Science, Bangalore-560012, India.
| | - Lopamudra Das Ghosh
- Department of Materials Engineering, Indian Institute of Science, Bangalore-560012, India.
| | - Kaushik Chatterjee
- Department of Materials Engineering, Indian Institute of Science, Bangalore-560012, India.
| |
|
31
|
Wu Z, Sinha S. SPREd: a simulation-supervised neural network tool for gene regulatory network reconstruction. Bioinform Adv 2024; 4:vbae011. [PMID: 38444538 PMCID: PMC10913396 DOI: 10.1093/bioadv/vbae011] [Received: 06/21/2023] [Revised: 11/08/2023] [Accepted: 01/18/2024] [Indexed: 03/07/2024]
Abstract
Summary: Reconstruction of gene regulatory networks (GRNs) from expression data is a significant open problem. Common approaches train a machine learning (ML) model to predict a gene's expression using transcription factors' (TFs') expression as features and designate important features/TFs as regulators of the gene. Here, we present an entirely different paradigm, where GRN edges are directly predicted by the ML model. The new approach, named "SPREd," is a simulation-supervised neural network for GRN inference. Its inputs comprise expression relationships (e.g. correlation, mutual information) between the target gene and each TF and between pairs of TFs. The output includes binary labels indicating whether each TF regulates the target gene. We train the neural network model using synthetic expression data generated by a biophysics-inspired simulation model that incorporates linear as well as non-linear TF-gene relationships and diverse GRN configurations. We show SPREd to outperform state-of-the-art GRN reconstruction tools GENIE3, ENNET, PORTIA, and TIGRESS on synthetic datasets with high co-expression among TFs, similar to that seen in real data. A key advantage of the new approach is its robustness to relatively small numbers of conditions (columns) in the expression matrix, which is a common problem faced by existing methods. Finally, we evaluate SPREd on real data sets in yeast that represent gold-standard benchmarks of GRN reconstruction and show it to perform significantly better than or comparably to existing methods. In addition to its high accuracy and speed, SPREd marks a first step toward incorporating biophysics principles of gene regulation into ML-based approaches to GRN reconstruction. Availability and implementation: Data and code are available from https://github.com/iiiime/SPREd.
Affiliation(s)
- Zijun Wu
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA 30332, United States
| | - Saurabh Sinha
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA 30332, United States
- H. Milton Stewart School of Industrial & Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332, United States
| |
|
32
|
Li S, Liu Y, Shen LC, Yan H, Song J, Yu DJ. GMFGRN: a matrix factorization and graph neural network approach for gene regulatory network inference. Brief Bioinform 2024; 25:bbad529. [PMID: 38261340 PMCID: PMC10805180 DOI: 10.1093/bib/bbad529] [Received: 09/29/2023] [Revised: 12/08/2023] [Accepted: 12/19/2023] [Indexed: 01/24/2024]
Abstract
Recent advances in single-cell RNA sequencing (scRNA-seq) have enabled reliable profiling of gene expression at the single-cell level, providing opportunities for accurate inference of gene regulatory networks (GRNs) from scRNA-seq data. Most methods for inferring GRNs either cannot eliminate transitive interactions or require expensive computational resources. To address these limitations, we present a novel method, termed GMFGRN, for accurate graph neural network (GNN)-based GRN inference from scRNA-seq data. GMFGRN employs a GNN for matrix factorization and learns representative embeddings for genes. For transcription factor-gene pairs, it utilizes the learned embeddings to determine whether they interact with each other. An extensive suite of benchmarking experiments, encompassing eight static scRNA-seq datasets and several state-of-the-art methods, demonstrated mean improvements of 1.9 and 2.5% over the runner-up in area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve (AUPRC), respectively. In addition, across four time-series datasets, maximum enhancements of 2.4 and 1.3% in AUROC and AUPRC were observed in comparison to the runner-up. Moreover, GMFGRN requires significantly less training time and memory, consuming less than 10% of the time and memory of the second-best method. These findings underscore the substantial potential of GMFGRN in the inference of GRNs. It is publicly available at https://github.com/Lishuoyy/GMFGRN.
Affiliation(s)
- Shuo Li
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing, 210094, China
| | - Yan Liu
- School of Information Engineering, Yangzhou University, 196 West Huayang, Yangzhou, 225000, China
| | - Long-Chen Shen
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing, 210094, China
| | - He Yan
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing, 210094, China
| | - Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia
- Monash Data Futures Institute, Monash University, Melbourne, Victoria 3800, Australia
| | - Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing, 210094, China
| |
|
33
|
Xu F, Hu H, Lin H, Lu J, Cheng F, Zhang J, Li X, Shuai J. scGIR: deciphering cellular heterogeneity via gene ranking in single-cell weighted gene correlation networks. Brief Bioinform 2024; 25:bbae091. [PMID: 38487851 PMCID: PMC10940817 DOI: 10.1093/bib/bbae091] [Received: 01/04/2024] [Revised: 02/08/2024] [Accepted: 02/15/2024] [Indexed: 03/18/2024]
Abstract
Single-cell RNA sequencing (scRNA-seq) has emerged as a powerful tool for investigating cellular heterogeneity through high-throughput analysis of individual cells. Nevertheless, challenges arise from prevalent sequencing dropout events and noise effects, impacting subsequent analyses. Here, we introduce a novel algorithm, Single-cell Gene Importance Ranking (scGIR), which utilizes a single-cell gene correlation network to evaluate gene importance. The algorithm transforms single-cell sequencing data into a robust gene correlation network through statistical independence, with correlation edges weighted by gene expression levels. We then constructed a random walk model on the resulting weighted gene correlation network to rank the importance of genes. Our analysis of gene importance using the PageRank algorithm across nine authentic scRNA-seq datasets indicates that scGIR can effectively surmount technical noise, enabling the identification of cell types and inference of developmental trajectories. We demonstrated that the edges of gene correlation, weighted by expression, play a critical role in enhancing the algorithm's performance. Our findings emphasize that scGIR excels at enhancing the clustering of cell subtypes, reverse identifying differentially expressed marker genes, and uncovering genes with potential differential importance. Overall, we proposed a promising method capable of extracting more information from single-cell RNA sequencing datasets, potentially shedding new light on cellular processes and disease mechanisms.
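The random-walk ranking mentioned in the abstract can be sketched as PageRank by power iteration on a weighted graph. This is a generic illustration, not the scGIR code: the function name `pagerank_weighted`, the damping factor of 0.85, and the toy weight matrix are assumptions, and the construction of the correlation network from scRNA-seq data is not reproduced.

```python
import numpy as np

def pagerank_weighted(weights, damping=0.85, tol=1e-10, max_iter=1000):
    """PageRank scores for a weighted graph given as an n x n matrix.

    weights[i, j] is the weight of the edge from node j to node i
    (for an undirected network the matrix is symmetric).
    """
    w = np.asarray(weights, dtype=float)
    n = w.shape[0]
    col_sums = w.sum(axis=0)
    safe_sums = np.where(col_sums > 0, col_sums, 1.0)
    # Column-stochastic transition matrix; nodes without edges jump uniformly.
    transition = np.where(col_sums > 0, w / safe_sums, 1.0 / n)
    rank = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new_rank = damping * (transition @ rank) + (1.0 - damping) / n
        converged = np.abs(new_rank - rank).sum() < tol
        rank = new_rank
        if converged:
            break
    return rank

# Toy network of 3 genes: gene 0 is linked to gene 1 (weight 2) and gene 2 (weight 1).
toy_weights = np.array([[0, 2, 1],
                        [2, 0, 0],
                        [1, 0, 0]])
scores = pagerank_weighted(toy_weights)
gene_order = [int(i) for i in np.argsort(-scores)]  # most important gene first
```

In this toy network the hub gene receives the highest score, and the more strongly connected neighbour ranks above the weakly connected one.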
Affiliation(s)
- Fei Xu
- Department of Physics, Anhui Normal University, Wuhu 241002, China
- Wenzhou Institute and Wenzhou Key Laboratory of Biophysics, University of Chinese Academy of Sciences, Wenzhou 325001, China
| | - Huan Hu
- Institute of Applied Genomics, Fuzhou University, Fuzhou 350108, China
| | - Hai Lin
- Wenzhou Institute and Wenzhou Key Laboratory of Biophysics, University of Chinese Academy of Sciences, Wenzhou 325001, China
| | - Jun Lu
- Department of Physics, Anhui Normal University, Wuhu 241002, China
- School of Medical Imageology, Wannan Medical College, Wuhu 241002, China
| | - Feng Cheng
- Department of Physics, and Fujian Provincial Key Lab for Soft Functional Materials Research, Xiamen University, Xiamen 361005, China
| | - Jiqian Zhang
- Department of Physics, Anhui Normal University, Wuhu 241002, China
| | - Xiang Li
- Department of Physics, and Fujian Provincial Key Lab for Soft Functional Materials Research, Xiamen University, Xiamen 361005, China
| | - Jianwei Shuai
- Wenzhou Institute and Wenzhou Key Laboratory of Biophysics, University of Chinese Academy of Sciences, Wenzhou 325001, China
- Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Wenzhou 325001, China
| |
|
34
|
Lodi MK, Chernikov A, Ghosh P. COFFEE: Consensus Single Cell-Type Specific Inference for Gene Regulatory Networks. bioRxiv 2024:2024.01.05.574445. [PMID: 38260386 PMCID: PMC10802453 DOI: 10.1101/2024.01.05.574445] [Indexed: 01/24/2024]
Abstract
The inference of gene regulatory networks (GRNs) is crucial to understanding the regulatory mechanisms that govern biological processes. GRNs may be represented as edges in a graph, and hence have been inferred computationally for scRNA-seq data. A wisdom of crowds approach to integrate edges from several GRNs to create one composite GRN has demonstrated improved performance when compared to individual algorithm implementations on bulk RNA-seq and microarray data. In an effort to extend this approach to scRNA-seq data, we present COFFEE (COnsensus single cell-type speciFic inFerence for gEnE regulatory networks), a Borda voting based consensus algorithm that integrates information from 10 established GRN inference methods. We conclude that COFFEE has improved performance across synthetic, curated and experimental datasets when compared to baseline methods. Additionally, we show that a modified version of COFFEE can be leveraged to improve performance on newer cell-type specific GRN inference methods. Overall, our results demonstrate that consensus based methods with pertinent modifications continue to be valuable for GRN inference at the single cell level.
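A Borda-count consensus over ranked edge lists can be sketched as follows. This is plain Borda voting under assumed conventions, not the COFFEE implementation: the function name `borda_consensus`, the choice to give unranked edges zero points, and the toy rankings are illustrative assumptions, and the modifications mentioned in the abstract are not reproduced.

```python
from collections import defaultdict

def borda_consensus(rankings):
    """Combine several ranked edge lists (best edge first) by Borda count.

    Each method awards n points to its top edge, n - 1 to the next, and so on,
    where n is the number of distinct edges overall. Edges a method does not
    report receive 0 points from that method (an assumption of this sketch).
    """
    all_edges = {edge for ranking in rankings for edge in ranking}
    n = len(all_edges)
    points = defaultdict(int)
    for ranking in rankings:
        for position, edge in enumerate(ranking):
            points[edge] += n - position
    # Sort by descending points; ties are broken alphabetically for determinism.
    ordered = sorted(all_edges, key=lambda edge: (-points[edge], edge))
    return ordered, dict(points)

# Toy example: three hypothetical inference methods ranking regulator-target edges.
method_rankings = [
    [("A", "B"), ("A", "C"), ("B", "C")],
    [("A", "C"), ("A", "B")],
    [("A", "B"), ("B", "C"), ("A", "C")],
]
consensus, totals = borda_consensus(method_rankings)
```

Here the edge ranked first by two of the three methods ends up at the top of the consensus list, which is the behaviour a wisdom-of-crowds scheme is meant to provide.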
Affiliation(s)
- Musaddiq K Lodi
- Integrative Life Sciences, Virginia Commonwealth University, Richmond, VA 23284
| | - Anna Chernikov
- Center for Biological Data Science, Virginia Commonwealth University, Richmond, VA 23284
| | - Preetam Ghosh
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284
| |
|
35
|
Alanis-Lobato G, Bartlett TE, Huang Q, Simon CS, McCarthy A, Elder K, Snell P, Christie L, Niakan KK. MICA: a multi-omics method to predict gene regulatory networks in early human embryos. Life Sci Alliance 2024; 7:e202302415. [PMID: 37879938 PMCID: PMC10599980 DOI: 10.26508/lsa.202302415] [Received: 10/04/2023] [Revised: 10/12/2023] [Accepted: 10/13/2023] [Indexed: 10/27/2023]
Abstract
Recent advances in single-cell omics have transformed characterisation of cell types in challenging-to-study biological contexts. In contexts with limited single-cell samples, such as the early human embryo, inference of transcription factor-gene regulatory network (GRN) interactions is especially difficult. Here, we assessed application of different linear or non-linear GRN predictions to single-cell simulated and human embryo transcriptome datasets. We also compared how expression normalisation affects GRN predictions, finding that transcripts per million reads outperformed alternative methods. GRN inferences were more reproducible using a non-linear method based on mutual information (MI) applied to single-cell transcriptome datasets refined with chromatin accessibility (CA) (called MICA), compared with alternative network prediction methods tested. MICA captures complex non-monotonic dependencies and feedback loops. Using MICA, we generated the first GRN inferences in early human development. MICA predicted co-localisation of the AP-1 transcription factor subunit proto-oncogene JUND and the TFAP2C transcription factor AP-2γ in early human embryos. Overall, our comparative analysis of GRN prediction methods defines a pipeline that can be applied to single-cell multi-omics datasets in especially challenging contexts to infer interactions between transcription factor expression and target gene regulation.
Affiliation(s)
- Qiulin Huang
- Human Embryo and Stem Cell Laboratory, The Francis Crick Institute, London, UK
- Department of Physiology, Development and Neuroscience, The Centre for Trophoblast Research, University of Cambridge, Cambridge, UK
| | - Claire S Simon
- Human Embryo and Stem Cell Laboratory, The Francis Crick Institute, London, UK
| | - Afshan McCarthy
- Human Embryo and Stem Cell Laboratory, The Francis Crick Institute, London, UK
| | - Kathy K Niakan
- Human Embryo and Stem Cell Laboratory, The Francis Crick Institute, London, UK
- Department of Physiology, Development and Neuroscience, The Centre for Trophoblast Research, University of Cambridge, Cambridge, UK
- Wellcome - Medical Research Council Cambridge Stem Cell Institute, Jeffrey Cheah Biomedical Centre, University of Cambridge, Cambridge, UK
- Epigenetics Programme, Babraham Institute, Cambridge, UK
| |
|
36
|
Croydon-Veleslavov IA, Stumpf MPH. Repeated Decision Stumping Distils Simple Rules from Single-Cell Data. J Comput Biol 2024; 31:21-40. [PMID: 38170180 DOI: 10.1089/cmb.2021.0613] [Indexed: 01/05/2024]
Abstract
Single-cell data afford unprecedented insights into molecular processes. But the complexity and size of these data sets have proved challenging and given rise to a large armory of statistical and machine learning approaches. The majority of approaches focus on either describing features of these data, or making predictions and classifying unlabeled samples. In this study, we introduce repeated decision stumping (ReDX) as a method to distill simple models from single-cell data. We develop decision trees of depth one (hence "stumps") to identify, in an inductive manner, gene products involved in driving cell fate transitions, and in applications to published data we are able to discover the key players involved in these processes in an unbiased manner without prior knowledge. Our algorithm deliberately targets the simplest possible candidate hypotheses that can be extracted from complex high-dimensional data. There are three reasons for this: (1) the predictions become straightforwardly testable hypotheses; (2) the identified candidates form the basis for further mechanistic model development, for example, for engineering and synthetic biology interventions; and (3) this approach complements existing descriptive modeling approaches and frameworks. The approach is computationally efficient, has remarkable predictive power, including in simulation studies where the ground truth is known, and yields robust and statistically stable predictors; the same set of candidates is generated by applying the algorithm to different subsamples of experimental data.
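A single decision stump, the depth-one tree that the abstract builds on, can be sketched as an exhaustive search for the one gene and threshold that best separate two cell states. This is an illustrative sketch only: the function name `fit_stump`, the use of classification accuracy as the split criterion, and the toy data are assumptions, and the repetition scheme that gives ReDX its name is not described in the abstract and is not reproduced here.

```python
import numpy as np

def fit_stump(expression, labels):
    """Find the single gene and threshold that best split cells into two classes.

    expression: cells x genes matrix; labels: binary class label per cell.
    Returns (gene_index, threshold, accuracy). Accuracy is used as the split
    criterion here purely for simplicity (an assumption of this sketch).
    """
    x = np.asarray(expression, dtype=float)
    y = np.asarray(labels).astype(bool)
    best_gene, best_threshold, best_accuracy = -1, 0.0, 0.0
    for gene in range(x.shape[1]):
        values = np.unique(x[:, gene])
        # Candidate thresholds are midpoints between consecutive observed values.
        for threshold in (values[:-1] + values[1:]) / 2:
            predicted = x[:, gene] > threshold
            # A stump may predict either class on the high side of the threshold.
            accuracy = max(np.mean(predicted == y), np.mean(predicted != y))
            if accuracy > best_accuracy:
                best_gene, best_threshold, best_accuracy = gene, float(threshold), float(accuracy)
    return best_gene, best_threshold, best_accuracy

# Toy data: 4 cells x 2 genes; only gene 0 separates the two cell states.
toy_expression = np.array([[1, 5], [2, 1], [8, 4], [9, 2]])
toy_labels = np.array([0, 0, 1, 1])
stump_gene, stump_threshold, stump_accuracy = fit_stump(toy_expression, toy_labels)
```

The result is a rule of the form "gene g above threshold t", which is the kind of directly testable hypothesis the abstract argues for.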
Affiliation(s)
- Ivan A Croydon-Veleslavov
- Department of Life Sciences, Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London, United Kingdom
| | - Michael P H Stumpf
- Department of Life Sciences, Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London, United Kingdom
- School of BioSciences, University of Melbourne, Parkville, Australia
- School of Mathematics and Statistics, University of Melbourne, Parkville, Australia
| |
|
37
|
Kim H, Choi H, Lee D, Kim J. A review on gene regulatory network reconstruction algorithms based on single cell RNA sequencing. Genes Genomics 2024; 46:1-11. [PMID: 38032470 DOI: 10.1007/s13258-023-01473-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2023] [Accepted: 10/24/2023] [Indexed: 12/01/2023]
Abstract
BACKGROUND Understanding gene regulatory networks (GRNs) is essential for unraveling the molecular mechanisms governing cellular behavior. With the advent of high-throughput transcriptome measurement technology, researchers have aimed to reverse engineer biological systems, extracting gene regulatory rules from their outputs, which are represented by gene expression data. Bulk RNA sequencing, a widely used method for measuring gene expression, has been employed for GRN reconstruction. However, it falls short in capturing dynamic changes in gene expression at the level of individual cells since it averages gene expression across mixed cell populations. OBJECTIVE In this review, we provide an overview of 15 GRN reconstruction tools and discuss their respective strengths and limitations, particularly in the context of single cell RNA sequencing (scRNA-seq). METHODS Recent advancements in scRNA-seq break new ground in GRN reconstruction. They offer snapshots of individual cell transcriptomes and capture dynamic changes. We emphasize how these technological breakthroughs have enhanced GRN reconstruction. CONCLUSION GRN reconstructors can be classified based on their requirement for cellular trajectory, which represents a dynamical cellular process including differentiation, aging, or disease progression. Benchmarking studies support the superiority of GRN reconstructors that do not require trajectory analysis in identifying regulator-target relationships. However, methods equipped with trajectory analysis demonstrate better performance in identifying key regulatory factors. In conclusion, researchers should select a suitable GRN reconstructor based on their specific research objectives.
Affiliation(s)
- Hyeonkyu Kim
- School of Systems Biomedical Science, Soongsil University, 369 Sangdo-Ro, Dongjak-Gu, Seoul, 06978, Republic of Korea
| | - Hwisoo Choi
- School of Systems Biomedical Science, Soongsil University, 369 Sangdo-Ro, Dongjak-Gu, Seoul, 06978, Republic of Korea
| | - Daewon Lee
- School of Art and Technology, Chung-Ang University, 4726 Seodong-Daero, Anseong-Si, Gyeonggi-Do, 17546, Republic of Korea.
| | - Junil Kim
- School of Systems Biomedical Science, Soongsil University, 369 Sangdo-Ro, Dongjak-Gu, Seoul, 06978, Republic of Korea.
| |
|
38
|
Wu Z, Sinha S. SPREd: A simulation-supervised neural network tool for gene regulatory network reconstruction. bioRxiv 2023:2023.11.09.566399. [PMID: 38014297 PMCID: PMC10680606 DOI: 10.1101/2023.11.09.566399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
Reconstruction of gene regulatory networks (GRNs) from expression data is a significant open problem. Common approaches train a machine learning (ML) model to predict a gene's expression using transcription factors' (TFs') expression as features and designate important features/TFs as regulators of the gene. Here, we present an entirely different paradigm, where GRN edges are directly predicted by the ML model. The new approach, named "SPREd" is a simulation-supervised neural network for GRN inference. Its inputs comprise expression relationships (e.g., correlation, mutual information) between the target gene and each TF and between pairs of TFs. The output includes binary labels indicating whether each TF regulates the target gene. We train the neural network model using synthetic expression data generated by a biophysics-inspired simulation model that incorporates linear as well as non-linear TF-gene relationships and diverse GRN configurations. We show SPREd to outperform state-of-the-art GRN reconstruction tools GENIE3, ENNET, PORTIA and TIGRESS on synthetic datasets with high co-expression among TFs, similar to that seen in real data. A key advantage of the new approach is its robustness to relatively small numbers of conditions (columns) in the expression matrix, which is a common problem faced by existing methods. Finally, we evaluate SPREd on real data sets in yeast that represent gold standard benchmarks of GRN reconstruction and show it to perform significantly better than or comparably to existing methods. In addition to its high accuracy and speed, SPREd marks a first step towards incorporating biophysics principles of gene regulation into ML-based approaches to GRN reconstruction.
Affiliation(s)
- Zijun Wu
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA, 30332, USA
| | - Saurabh Sinha
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA, 30332, USA
- H. Milton Stewart School of Industrial & Systems Engineering, Georgia Institute of Technology, Atlanta, GA, 30318, USA
| |
|
39
|
Cingiz MÖ. k-Strong Inference Algorithm: A Hybrid Information Theory Based Gene Network Inference Algorithm. Mol Biotechnol 2023:10.1007/s12033-023-00929-2. [PMID: 37950851 DOI: 10.1007/s12033-023-00929-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2023] [Accepted: 10/05/2023] [Indexed: 11/13/2023]
Abstract
Gene networks allow researchers to understand the underlying mechanisms between diseases and genes while reducing the need for wet lab experiments. Numerous gene network inference (GNI) algorithms have been presented in the literature to infer accurate gene networks. We proposed a hybrid GNI algorithm, k-Strong Inference Algorithm (ksia), to infer more reliable and robust gene networks from omics datasets. To increase reliability, ksia integrates Pearson correlation coefficient (PCC) and Spearman rank correlation coefficient (SCC) scores to determine mutual information scores between molecules to increase diversity of relation predictions. To infer a more robust gene network, ksia applies three different elimination steps to remove redundant and spurious relations between genes. The performance of ksia was evaluated on microbe microarrays database in the overlap analysis with other GNI algorithms, namely ARACNE, C3NET, CLR, and MRNET. ksia inferred fewer relations due to its strict elimination steps. However, ksia generally performed better on Escherichia coli (E. coli) and Saccharomyces cerevisiae (yeast) gene expression datasets according to F-measure and precision values. The integration of association estimator scores and three elimination stages slightly increases the performance of ksia-based gene networks. Users can access the ksia R package and its user manual via https://github.com/ozgurcingiz/ksia.
Affiliation(s)
- Mustafa Özgür Cingiz
- Computer Engineering Department, Faculty of Engineering and Natural Sciences, Bursa Technical University, Mimar Sinan Campus, Yildirim, 16310, Bursa, Turkey.
| |
|
40
|
Paas-Oliveros E, Hernández-Lemus E, de Anda-Jáuregui G. Computational single cell oncology: state of the art. Front Genet 2023; 14:1256991. [PMID: 38028624 PMCID: PMC10663273 DOI: 10.3389/fgene.2023.1256991] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/11/2023] [Accepted: 10/24/2023] [Indexed: 12/01/2023] Open
Abstract
Single cell computational analysis has emerged as a powerful tool in the field of oncology, enabling researchers to decipher the complex cellular heterogeneity that characterizes cancer. By leveraging computational algorithms and bioinformatics approaches, this methodology provides insights into the underlying genetic, epigenetic and transcriptomic variations among individual cancer cells. In this paper, we present a comprehensive overview of single cell computational analysis in oncology, discussing the key computational techniques employed for data processing, analysis, and interpretation. We explore the challenges associated with single cell data, including data quality control, normalization, dimensionality reduction, clustering, and trajectory inference. Furthermore, we highlight the applications of single cell computational analysis, including the identification of novel cell states, the characterization of tumor subtypes, the discovery of biomarkers, and the prediction of therapy response. Finally, we address the future directions and potential advancements in the field, including the development of machine learning and deep learning approaches for single cell analysis. Overall, this paper aims to provide a roadmap for researchers interested in leveraging computational methods to unlock the full potential of single cell analysis in understanding cancer biology with the goal of advancing precision oncology. For this purpose, we also include a notebook that instructs on how to apply the recommended tools in the Preprocessing and Quality Control section.
Affiliation(s)
- Ernesto Paas-Oliveros
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico
| | - Enrique Hernández-Lemus
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico
- Center for Complexity Sciences, Universidad Nacional Autónoma de México, Mexico City, Mexico
| | - Guillermo de Anda-Jáuregui
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico
- Center for Complexity Sciences, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Investigadores por Mexico, Conahcyt, Mexico City, Mexico
| |
|
41
|
Kim D, Tran A, Kim HJ, Lin Y, Yang JYH, Yang P. Gene regulatory network reconstruction: harnessing the power of single-cell multi-omic data. NPJ Syst Biol Appl 2023; 9:51. [PMID: 37857632 PMCID: PMC10587078 DOI: 10.1038/s41540-023-00312-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Accepted: 10/02/2023] [Indexed: 10/21/2023] Open
Abstract
Inferring gene regulatory networks (GRNs) is a fundamental challenge in biology that aims to unravel the complex relationships between genes and their regulators. Deciphering these networks plays a critical role in understanding the underlying regulatory crosstalk that drives many cellular processes and diseases. Recent advances in sequencing technology have led to the development of state-of-the-art GRN inference methods that exploit matched single-cell multi-omic data. By employing diverse mathematical and statistical methodologies, these methods aim to reconstruct more comprehensive and precise gene regulatory networks. In this review, we give a brief overview on the statistical and methodological foundations commonly used in GRN inference methods. We then compare and contrast the latest state-of-the-art GRN inference methods for single-cell matched multi-omics data, and discuss their assumptions, limitations and opportunities. Finally, we discuss the challenges and future directions that hold promise for further advancements in this rapidly developing field.
Affiliation(s)
- Daniel Kim
- School of Mathematics and Statistics, University of Sydney, Camperdown, NSW, Australia
- Computational Systems Biology Unit, Children's Medical Research Institute, University of Sydney, Camperdown, NSW, Australia
- Sydney Precision Data Science Centre, University of Sydney, Camperdown, NSW, Australia
| | - Andy Tran
- School of Mathematics and Statistics, University of Sydney, Camperdown, NSW, Australia
- Sydney Precision Data Science Centre, University of Sydney, Camperdown, NSW, Australia
- Charles Perkins Centre, University of Sydney, Camperdown, NSW, Australia
| | - Hani Jieun Kim
- Computational Systems Biology Unit, Children's Medical Research Institute, University of Sydney, Camperdown, NSW, Australia
- Sydney Precision Data Science Centre, University of Sydney, Camperdown, NSW, Australia
| | - Yingxin Lin
- School of Mathematics and Statistics, University of Sydney, Camperdown, NSW, Australia
- Sydney Precision Data Science Centre, University of Sydney, Camperdown, NSW, Australia
- Charles Perkins Centre, University of Sydney, Camperdown, NSW, Australia
| | - Jean Yee Hwa Yang
- School of Mathematics and Statistics, University of Sydney, Camperdown, NSW, Australia.
- Sydney Precision Data Science Centre, University of Sydney, Camperdown, NSW, Australia.
- Charles Perkins Centre, University of Sydney, Camperdown, NSW, Australia.
| | - Pengyi Yang
- School of Mathematics and Statistics, University of Sydney, Camperdown, NSW, Australia.
- Computational Systems Biology Unit, Children's Medical Research Institute, University of Sydney, Camperdown, NSW, Australia.
- Sydney Precision Data Science Centre, University of Sydney, Camperdown, NSW, Australia.
- Charles Perkins Centre, University of Sydney, Camperdown, NSW, Australia.
| |
|
42
|
Jin Q, Greenstein JL, Winslow RL. Estimating the probability of early afterdepolarizations and predicting arrhythmic risk associated with long QT syndrome type 1 mutations. Biophys J 2023; 122:4042-4056. [PMID: 37705243 PMCID: PMC10598291 DOI: 10.1016/j.bpj.2023.09.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2023] [Revised: 08/29/2023] [Accepted: 09/08/2023] [Indexed: 09/15/2023] Open
Abstract
Early afterdepolarizations (EADs) are action potential (AP) repolarization abnormalities that can trigger lethal arrhythmias. Simulations using biophysically detailed cardiac myocyte models can reveal how model parameters influence the probability of these cellular arrhythmias; however, such analyses can pose a huge computational burden. We have previously developed a highly simplified approach in which logistic regression models (LRMs) map parameters of complex cell models to the probability of ectopic beats. Here, we extend this approach to predict the probability of EADs (P(EAD)) as a mechanistic metric of arrhythmic risk. We use the LRM to investigate how changes in parameters of the slow-activating delayed rectifier current (IKs) affect P(EAD) for 17 different long QT syndrome type 1 (LQTS1) mutations. In this LQTS1 clinical arrhythmic risk prediction task, we compared P(EAD) for these 17 mutations with two other recently published model-based arrhythmia risk metrics (AP morphology metric across populations of myocyte models and transmural repolarization prolongation based on a one-dimensional [1D] tissue-level model). These model-based risk metrics yield similar prediction performance; however, each fails to stratify clinical risk for a significant number of the 17 studied LQTS1 mutations. Nevertheless, an interpretable ensemble model using multivariate linear regression built by combining all of these model-based risk metrics successfully predicts the clinical risk of 17 mutations. These results illustrate the potential of computational approaches in arrhythmia risk prediction.
Affiliation(s)
- Qingchu Jin
- Department of Biomedical Engineering and Institute for Computational Medicine, Johns Hopkins University, Baltimore, Maryland
| | - Joseph L Greenstein
- Department of Biomedical Engineering and Institute for Computational Medicine, Johns Hopkins University, Baltimore, Maryland
| | - Raimond L Winslow
- Department of Biomedical Engineering and Institute for Computational Medicine, Johns Hopkins University, Baltimore, Maryland.
| |
|
43
|
Shi Q, Chen X, Zhang Z. Decoding Human Biology and Disease Using Single-cell Omics Technologies. Genomics, Proteomics & Bioinformatics 2023; 21:926-949. [PMID: 37739168 PMCID: PMC10928380 DOI: 10.1016/j.gpb.2023.06.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2023] [Revised: 05/22/2023] [Accepted: 06/08/2023] [Indexed: 09/24/2023]
Abstract
Over the past decade, advances in single-cell omics (SCO) technologies have enabled the investigation of cellular heterogeneity at an unprecedented resolution and scale, opening a new avenue for understanding human biology and disease. In this review, we summarize the developments of sequencing-based SCO technologies and computational methods, and focus on considerable insights acquired from SCO sequencing studies to understand normal and diseased properties, with a particular emphasis on cancer research. We also discuss the technological improvements of SCO and its possible contribution to fundamental research of the human, as well as its great potential in clinical diagnoses and personalized therapies of human disease.
Affiliation(s)
- Qiang Shi
- Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing 100871, China
| | - Xueyan Chen
- Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing 100871, China
| | - Zemin Zhang
- Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing 100871, China; Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China; Changping Laboratory, Beijing 102206, China.
| |
|
44
|
Shojaee A, Huang SSC. Robust discovery of gene regulatory networks from single-cell gene expression data by Causal Inference Using Composition of Transactions. Brief Bioinform 2023; 24:bbad370. [PMID: 37897702 PMCID: PMC10612495 DOI: 10.1093/bib/bbad370] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/18/2023] [Revised: 09/06/2023] [Accepted: 09/29/2023] [Indexed: 10/30/2023] Open
Abstract
Gene regulatory networks (GRNs) drive organism structure and functions, so the discovery and characterization of GRNs is a major goal in biological research. However, accurate identification of causal regulatory connections and inference of GRNs using gene expression datasets, more recently from single-cell RNA-seq (scRNA-seq), has been challenging. Here we employ the innovative method of Causal Inference Using Composition of Transactions (CICT) to uncover GRNs from scRNA-seq data. The basis of CICT is that if all gene expressions were random, a non-random regulatory gene should induce its targets at levels different from the background random process, resulting in distinct patterns in the whole relevance network of gene-gene associations. CICT proposes novel network features derived from a relevance network, which enable any machine learning algorithm to predict causal regulatory edges and infer GRNs. We evaluated CICT using simulated and experimental scRNA-seq data in a well-established benchmarking pipeline and showed that CICT outperformed existing network inference methods representing diverse approaches with many-fold higher accuracy. Furthermore, we demonstrated that GRN inference with CICT was robust to different levels of sparsity in scRNA-seq data, the characteristics of data and ground truth, the choice of association measure and the complexity of the supervised machine learning algorithm. Our results suggest aiming at directly predicting causality to recover regulatory relationships in complex biological networks substantially improves accuracy in GRN inference.
Affiliation(s)
- Abbas Shojaee
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY 10003, USA
| | - Shao-shan Carol Huang
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY 10003, USA
| |
|
45
|
Zeng Y, He Y, Zheng R, Li M. Inferring single-cell gene regulatory network by non-redundant mutual information. Brief Bioinform 2023; 24:bbad326. [PMID: 37715282 DOI: 10.1093/bib/bbad326] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 04/19/2023] [Revised: 07/12/2023] [Accepted: 08/08/2023] [Indexed: 09/17/2023] Open
Abstract
Gene regulatory networks play a crucial role in controlling the biological processes of living creatures. Deciphering the complex gene regulatory networks from experimental data remains a major challenge in systems biology. Recent advances in single-cell RNA sequencing technology bring massive high-resolution data, enabling computational inference of cell-specific gene regulatory networks (GRNs). Many relevant algorithms have been developed to achieve this goal in the past years. However, GRN inference is still less than ideal due to the extra noise involved in pseudo-time information and large amounts of dropouts in datasets. Here, we present a novel GRN inference method named Normi, which is based on non-redundant mutual information. Normi addresses these problems by employing a sliding size-fixed window approach on the entire trajectory and applies an average smoothing strategy to the gene expression of the cells in each window to obtain representative cells. To further alleviate the impact of dropouts, we utilize the mixed KSG estimator to quantify the high-order time-delayed mutual information among genes, then filter out the redundant edges by adopting the Max-Relevance and Min-Redundancy algorithm. Moreover, we determined the optimal time delay for each gene pair by distance correlation. Normi outperforms other state-of-the-art GRN inference methods on both simulated data and single-cell RNA sequencing (scRNA-seq) datasets, demonstrating its superiority in robustness. The performance of Normi in real scRNA-seq data further reveals its ability to identify the key regulators and crucial biological processes.
Affiliation(s)
- Yanping Zeng
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Yongxin He
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Ruiqing Zheng
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Min Li
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
| |
|
46
|
Li H, Zhang Z, Squires M, Chen X, Zhang X. scMultiSim: simulation of single cell multi-omics and spatial data guided by gene regulatory networks and cell-cell interactions. Research Square 2023:rs.3.rs-3301625. [PMID: 37790516 PMCID: PMC10543280 DOI: 10.21203/rs.3.rs-3301625/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/05/2023]
Abstract
Simulated single-cell data is essential for designing and evaluating computational methods in the absence of experimental ground truth. Existing simulators typically focus on modeling one or two specific biological factors or mechanisms that affect the output data, which limits their capacity to simulate the complexity and multi-modality in real data. Here, we present scMultiSim, an in silico simulator that generates multi-modal single-cell data, including gene expression, chromatin accessibility, RNA velocity, and spatial cell locations while accounting for the relationships between modalities. scMultiSim jointly models various biological factors that affect the output data, including cell identity, within-cell gene regulatory networks (GRNs), cell-cell interactions (CCIs), and chromatin accessibility, while also incorporating technical noise. Moreover, it allows users to adjust each factor's effect easily. We validated scMultiSim's simulated biological effects and demonstrated its applications by benchmarking a wide range of computational tasks, including multi-modal and multi-batch data integration, RNA velocity estimation, GRN inference and CCI inference using spatially resolved gene expression data, many of which were not benchmarked before due to the lack of proper tools. Compared to existing simulators, scMultiSim can benchmark a much broader range of existing computational problems and even new potential tasks.
Affiliation(s)
- Hechen Li
- Georgia Institute of Technology, Atlanta, USA
| | - Ziqi Zhang
- Georgia Institute of Technology, Atlanta, USA
| | | | - Xi Chen
- Southern University of Science and Technology, Shenzhen, China
| | | |
|
47
|
Groves SM, Quaranta V. Quantifying cancer cell plasticity with gene regulatory networks and single-cell dynamics. Frontiers in Network Physiology 2023; 3:1225736. [PMID: 37731743 PMCID: PMC10507267 DOI: 10.3389/fnetp.2023.1225736] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2023] [Accepted: 08/25/2023] [Indexed: 09/22/2023]
Abstract
Phenotypic plasticity of cancer cells can lead to complex cell state dynamics during tumor progression and acquired resistance. Highly plastic stem-like states may be inherently drug-resistant. Moreover, cell state dynamics in response to therapy allow a tumor to evade treatment. In both scenarios, quantifying plasticity is essential for identifying high-plasticity states or elucidating transition paths between states. Currently, methods to quantify plasticity tend to focus on 1) quantification of quasi-potential based on the underlying gene regulatory network dynamics of the system; or 2) inference of cell potency based on trajectory inference or lineage tracing in single-cell dynamics. Here, we explore both of these approaches and associated computational tools. We then discuss implications of each approach to plasticity metrics, and relevance to cancer treatment strategies.
Affiliation(s)
- Sarah M. Groves
- Department of Pharmacology, Vanderbilt University, Nashville, TN, United States
| | - Vito Quaranta
- Department of Pharmacology, Vanderbilt University, Nashville, TN, United States
- Department of Biochemistry, Vanderbilt University, Nashville, TN, United States
| |
|
48
|
Zilbauer M, James KR, Kaur M, Pott S, Li Z, Burger A, Thiagarajah JR, Burclaff J, Jahnsen FL, Perrone F, Ross AD, Matteoli G, Stakenborg N, Sujino T, Moor A, Bartolome-Casado R, Bækkevold ES, Zhou R, Xie B, Lau KS, Din S, Magness ST, Yao Q, Beyaz S, Arends M, Denadai-Souza A, Coburn LA, Gaublomme JT, Baldock R, Papatheodorou I, Ordovas-Montanes J, Boeckxstaens G, Hupalowska A, Teichmann SA, Regev A, Xavier RJ, Simmons A, Snyder MP, Wilson KT. A Roadmap for the Human Gut Cell Atlas. Nat Rev Gastroenterol Hepatol 2023; 20:597-614. [PMID: 37258747 PMCID: PMC10527367 DOI: 10.1038/s41575-023-00784-1] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Accepted: 04/14/2023] [Indexed: 06/02/2023]
Abstract
The number of studies investigating the human gastrointestinal tract using various single-cell profiling methods has increased substantially in the past few years. Although this increase provides a unique opportunity for the generation of the first comprehensive Human Gut Cell Atlas (HGCA), there remains a range of major challenges ahead. Above all, the ultimate success will largely depend on a structured and coordinated approach that aligns global efforts undertaken by a large number of research groups. In this Roadmap, we discuss a comprehensive forward-thinking direction for the generation of the HGCA on behalf of the Gut Biological Network of the Human Cell Atlas. Based on the consensus opinion of experts from across the globe, we outline the main requirements for the first complete HGCA by summarizing existing data sets and highlighting anatomical regions and/or tissues with limited coverage. We provide recommendations for future studies and discuss key methodologies and the importance of integrating the healthy gut atlas with related diseases and gut organoids. Importantly, we critically overview the computational tools available and provide recommendations to overcome key challenges.
Affiliation(s)
- Matthias Zilbauer
- Wellcome-MRC Cambridge Stem Cell Institute, University of Cambridge, Cambridge, UK.
- University Department of Paediatrics, University of Cambridge, Cambridge, UK.
- Department of Paediatric Gastroenterology, Hepatology and Nutrition, Cambridge University Hospitals, Cambridge, UK.
| | - Kylie R James
- Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
- School of Biomedical Sciences, University of New South Wales, Sydney, NSW, Australia
| | - Mandeep Kaur
- School of Molecular and Cell Biology, University of the Witwatersrand, Johannesburg, South Africa
| | - Sebastian Pott
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA
| | - Zhixin Li
- Dana-Farber Cancer Institute, Boston, MA, USA
| | - Albert Burger
- Department of Computer Science, Heriot-Watt University, Edinburgh, UK
| | - Jay R Thiagarajah
- Division of Gastroenterology, Hepatology and Nutrition, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
| | - Joseph Burclaff
- Joint Department of Biomedical Engineering, University of North Carolina at Chapel Hill and North Carolina State University, Chapel Hill, NC, USA
- Center for Gastrointestinal Biology and Disease, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Frode L Jahnsen
- Department of Pathology, Oslo University Hospital and University of Oslo, Oslo, Norway
| | - Francesca Perrone
- Wellcome-MRC Cambridge Stem Cell Institute, University of Cambridge, Cambridge, UK
- University Department of Paediatrics, University of Cambridge, Cambridge, UK
| | - Alexander D Ross
- Wellcome-MRC Cambridge Stem Cell Institute, University of Cambridge, Cambridge, UK
- University Department of Paediatrics, University of Cambridge, Cambridge, UK
- University Department of Medical Genetics, University of Cambridge, Cambridge, UK
| | - Gianluca Matteoli
- Translational Research Center for Gastrointestinal Disorders (TARGID), Department of Chronic Diseases, Metabolism and Ageing, KU Leuven, Leuven, Belgium
| | - Nathalie Stakenborg
- Translational Research Center for Gastrointestinal Disorders (TARGID), Department of Chronic Diseases, Metabolism and Ageing, KU Leuven, Leuven, Belgium
| | - Tomohisa Sujino
- Center for the Diagnostic and Therapeutic Endoscopy, School of Medicine, Keio University, Tokyo, Japan
| | - Andreas Moor
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
| | - Raquel Bartolome-Casado
- Department of Pathology, Oslo University Hospital and University of Oslo, Oslo, Norway
- Wellcome Sanger Institute, Hinxton, UK
| | - Espen S Bækkevold
- Department of Pathology, Oslo University Hospital and University of Oslo, Oslo, Norway
| | - Ran Zhou
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA
| | - Bingqing Xie
- Department of Medicine, University of Chicago, Chicago, IL, USA
| | - Ken S Lau
- Epithelial Biology Center and Department of Cell and Developmental Biology, Vanderbilt University School of Medicine, Nashville, TN, USA
| | - Shahida Din
- Edinburgh IBD Unit, Western General Hospital, NHS Lothian, Edinburgh, UK
| | - Scott T Magness
- Joint Department of Biomedical Engineering, University of North Carolina at Chapel Hill and North Carolina State University, Chapel Hill, NC, USA
- Center for Gastrointestinal Biology and Disease, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Qiuming Yao
- Department of Computer Science and Engineering, University of Nebraska Lincoln, Lincoln, NE, USA
| | - Semir Beyaz
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, NY, USA
| | - Mark Arends
- Division of Pathology, Centre for Comparative Pathology, Cancer Research UK Edinburgh Centre, Institute of Cancer and Genetics, University of Edinburgh, Edinburgh, UK
| | - Alexandre Denadai-Souza
- Laboratory of Mucosal Biology, Department of Chronic Diseases, Metabolism and Ageing, KU Leuven, Leuven, Belgium
| | - Lori A Coburn
- Vanderbilt University Medical Center, Nashville, TN, USA
- Veterans Affairs Tennessee Valley Healthcare System, Nashville, TN, USA
| | - Irene Papatheodorou
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Hinxton, UK
| | - Jose Ordovas-Montanes
- Division of Gastroenterology, Hepatology and Nutrition, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
| | - Guy Boeckxstaens
- Translational Research Center for Gastrointestinal Disorders (TARGID), Department of Chronic Diseases, Metabolism and Ageing, KU Leuven, Leuven, Belgium
| | - Sarah A Teichmann
- Wellcome Sanger Institute, Hinxton, UK
- Theory of Condensed Matter Group, Cavendish Laboratory/Department of Physics, University of Cambridge, Cambridge, UK
| | - Aviv Regev
- Genentech, San Francisco, CA, USA
- Klarman Cell Observatory, Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Ramnik J Xavier
- Broad Institute and Department of Molecular Biology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
| | - Alison Simmons
- MRC Human Immunology Unit, MRC Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, UK
| | - Keith T Wilson
- Vanderbilt University Medical Center, Nashville, TN, USA
- Veterans Affairs Tennessee Valley Healthcare System, Nashville, TN, USA
| |
|
49
|
Xue L, Wu Y, Lin Y. Dissecting and improving gene regulatory network inference using single-cell transcriptome data. Genome Res 2023; 33:1609-1621. [PMID: 37580132 PMCID: PMC10620053 DOI: 10.1101/gr.277488.122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2022] [Accepted: 08/07/2023] [Indexed: 08/16/2023]
Abstract
Single-cell transcriptome data has been widely used to reconstruct gene regulatory networks (GRNs) controlling critical biological processes such as development and differentiation. Although a growing list of algorithms has been developed to infer GRNs using such data, achieving an inference accuracy consistently higher than random guessing has remained challenging. To address this, it is essential to delineate how the accuracy of regulatory inference is limited. Here, we systematically characterized factors limiting the accuracy of inferred GRNs and demonstrated that using pre-mRNA information can help improve regulatory inference compared to the typically used information (i.e., mature mRNA). Using kinetic modeling and simulated single-cell data sets, we showed that target genes' mature mRNA levels often fail to accurately report upstream regulatory activities because of gene-level and network-level factors, a limitation that can be mitigated by using pre-mRNA levels. We tested this finding on public single-cell RNA-seq data sets using intronic reads as proxies of pre-mRNA levels and indeed achieved a higher inference accuracy than with exonic reads (corresponding to mature mRNAs). Using experimental data sets, we further validated findings from the simulated data sets and identified factors, such as transcription factor activity dynamics, that influence the accuracy of pre-mRNA-based inference. This work delineates the fundamental limitations of gene regulatory inference and helps improve GRN inference using single-cell RNA-seq data.
Affiliation(s)
- Lingfeng Xue
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China, 100871
| | - Yan Wu
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China, 100871
- The MOE Key Laboratory of Cell Proliferation and Differentiation, School of Life Sciences, Peking University, Beijing, China, 100871
| | - Yihan Lin
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China, 100871
- The MOE Key Laboratory of Cell Proliferation and Differentiation, School of Life Sciences, Peking University, Beijing, China, 100871
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China, 100871
| |
|
50
|
Wang J, Chen Y, Zou Q. Inferring gene regulatory network from single-cell transcriptomes with graph autoencoder model. PLoS Genet 2023; 19:e1010942. [PMID: 37703293 PMCID: PMC10519590 DOI: 10.1371/journal.pgen.1010942] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 04/11/2023] [Revised: 09/25/2023] [Accepted: 08/29/2023] [Indexed: 09/15/2023] Open
Abstract
The gene regulatory structure of cells involves not only the regulatory relationship between two genes, but also the cooperative associations of multiple genes. However, most gene regulatory network inference methods for single-cell data focus only on the regulatory relationships between pairs of genes, ignoring the global regulatory structure, which is crucial for identifying regulation in complex biological systems. Here, we propose a graph-based Deep learning model for Regulatory networks Inference among Genes (DeepRIG) from single-cell RNA-seq data. To learn the global regulatory structure, DeepRIG builds a prior regulatory graph by transforming the gene expression data into the co-expression mode. It then uses a graph autoencoder model to embed the global regulatory information contained in the graph into gene latent embeddings and to reconstruct the gene regulatory network. Extensive benchmarking results demonstrate that DeepRIG can accurately reconstruct gene regulatory networks and outperforms existing methods on multiple simulated networks and real-cell regulatory networks. Additionally, we applied DeepRIG to samples of human peripheral blood mononuclear cells and triple-negative breast cancer, and showed that DeepRIG can provide accurate cell-type-specific gene regulatory network inference and identify novel regulators of progression and inhibition.
Affiliation(s)
- Jiacheng Wang
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
| | - Yaojia Chen
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
| |
|