1
Musilova J, Vafek Z, Puniya BL, Zimmer R, Helikar T, Sedlar K. Augusta: From RNA-Seq to gene regulatory networks and Boolean models. Comput Struct Biotechnol J 2024; 23:783-790. [PMID: 38312198 PMCID: PMC10837063 DOI: 10.1016/j.csbj.2024.01.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2023] [Revised: 01/17/2024] [Accepted: 01/19/2024] [Indexed: 02/06/2024] Open
Abstract
Computational models of gene regulation help to understand regulatory mechanisms and are extensively used in a wide range of areas, e.g., biotechnology or medicine, with significant benefits. Unfortunately, there are only a few computational gene regulatory models of whole genomes allowing static and dynamic analysis due to the lack of sophisticated tools for their reconstruction. Here, we describe Augusta, an open-source Python package for Gene Regulatory Network (GRN) and Boolean Network (BN) inference from high-throughput gene expression data. Augusta can reconstruct genome-wide models suitable for static and dynamic analyses. Augusta uses a unique approach where the first estimation of a GRN inferred from expression data is further refined by predicting transcription factor binding motifs in promoters of regulated genes and by incorporating verified interactions obtained from databases. Moreover, a refined GRN is transformed into a draft BN by searching in the curated model database and setting logical rules to incoming edges of target genes, which can be further manually edited as the model is provided in the SBML file format. The approach is applicable even if information about the organism under study is not available in the databases, which is typically the case for non-model organisms including most microbes. Augusta can be operated from the command line and, thus, is easy to use for automated prediction of models for various genomes. The Augusta package is freely available at github.com/JanaMus/Augusta. Documentation and tutorials are available at augusta.readthedocs.io.
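A minimal sketch of what a Boolean network looks like once logical rules are attached to the incoming edges of each target gene, as the abstract describes. The three genes, their rules, and the synchronous update scheme are hypothetical choices made for illustration; none of this is taken from Augusta or its output.

```python
from typing import Callable, Dict, List

State = Dict[str, bool]

# Hypothetical three-gene network; the rules are illustrative only.
RULES: Dict[str, Callable[[State], bool]] = {
    "geneA": lambda s: not s["geneC"],             # C represses A
    "geneB": lambda s: s["geneA"],                 # A activates B
    "geneC": lambda s: s["geneA"] and s["geneB"],  # A AND B activate C
}


def step(state: State) -> State:
    """Synchronous update: every gene reads the same previous state."""
    return {gene: rule(state) for gene, rule in RULES.items()}


def trajectory(state: State, n_steps: int) -> List[State]:
    """Return the initial state followed by n_steps successive updates."""
    states = [state]
    for _ in range(n_steps):
        states.append(step(states[-1]))
    return states


if __name__ == "__main__":
    start = {"geneA": True, "geneB": False, "geneC": False}
    for t, s in enumerate(trajectory(start, 5)):
        print(t, s)
```

Static analysis works on the wiring alone (who regulates whom), whereas dynamic analysis iterates `step` to look for steady states or cycles.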
Affiliation(s)
- Jana Musilova
- Department of Biomedical Engineering, Faculty of Electrical Engineering and Communication, Brno University of Technology, Brno 61600, Czech Republic
- Department of Biochemistry, University of Nebraska-Lincoln, Lincoln 68588, NE, USA
- Zdenek Vafek
- Department of Biochemistry, University of Nebraska-Lincoln, Lincoln 68588, NE, USA
- Institute of Forensic Engineering, Brno University of Technology, Brno 61200, Czech Republic
- Bhanwar Lal Puniya
- Department of Biochemistry, University of Nebraska-Lincoln, Lincoln 68588, NE, USA
- Ralf Zimmer
- Department of Informatics, Ludwig-Maximilians-Universität München, Munich 80539, Germany
- Tomas Helikar
- Department of Biochemistry, University of Nebraska-Lincoln, Lincoln 68588, NE, USA
- Karel Sedlar
- Department of Biomedical Engineering, Faculty of Electrical Engineering and Communication, Brno University of Technology, Brno 61600, Czech Republic
- Department of Informatics, Ludwig-Maximilians-Universität München, Munich 80539, Germany
2
Peng H, Xu J, Liu K, Liu F, Zhang A, Zhang X. EIEPCF: accurate inference of functional gene regulatory networks by eliminating indirect effects from confounding factors. Brief Funct Genomics 2024; 23:373-383. [PMID: 37642217 DOI: 10.1093/bfgp/elad040] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/05/2023] [Revised: 07/07/2023] [Accepted: 08/14/2023] [Indexed: 08/31/2023] Open
Abstract
Reconstructing functional gene regulatory networks (GRNs) is a primary prerequisite for understanding pathogenic mechanisms and curing diseases in animals, and it also provides an important foundation for cultivating vegetable and fruit varieties that are resistant to diseases and corrosion in plants. Many computational methods have been developed to infer GRNs, but most of the regulatory relationships between genes obtained by these methods are biased. Eliminating indirect effects in GRNs remains a significant challenge for researchers. In this work, we propose a novel approach for inferring functional GRNs, named EIEPCF (eliminating indirect effects produced by confounding factors), which eliminates indirect effects caused by confounding factors. This method eliminates the influence of confounding factors on regulatory factors and target genes by measuring the similarity between their residuals. The validation results of the EIEPCF method on simulation studies, the gold-standard networks provided by the DREAM3 Challenge and the real gene networks of Escherichia coli demonstrate that it achieves significantly higher accuracy compared to other popular computational methods for inferring GRNs. As a case study, we utilized the EIEPCF method to reconstruct the cold-resistant specific GRN from gene expression data of cold-resistant in Arabidopsis thaliana. The source code and data are available at https://github.com/zhanglab-wbgcas/EIEPCF.
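The idea of comparing residuals after removing a confounder can be sketched as follows. This is not the EIEPCF implementation: the abstract does not specify how residuals are computed or which similarity measure is used, so ordinary least squares on a single confounder and Pearson correlation are stand-ins, and all names and data are hypothetical.

```python
import math
from typing import List, Sequence


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residuals(values: Sequence[float], confounder: Sequence[float]) -> List[float]:
    """Residuals of a least-squares fit values ~ intercept + slope * confounder."""
    mv, mc = mean(values), mean(confounder)
    slope = (sum((c - mc) * (v - mv) for c, v in zip(confounder, values))
             / sum((c - mc) ** 2 for c in confounder))
    intercept = mv - slope * mc
    return [v - (intercept + slope * c) for v, c in zip(values, confounder)]


def residual_similarity(regulator, target, confounder) -> float:
    """Correlation between regulator and target after removing the confounder."""
    return pearson(residuals(regulator, confounder), residuals(target, confounder))


# Toy data: regulator and target are both driven by the confounder
# and have no direct relationship with each other.
confounder = [0, 1, 2, 3, 4, 5, 6, 7]
noise_reg = [1, -1, -1, 1, 1, -1, -1, 1]
noise_tgt = [1, 1, -1, -1, -1, -1, 1, 1]
regulator = [2 * c + e for c, e in zip(confounder, noise_reg)]
target = [-c + e for c, e in zip(confounder, noise_tgt)]

if __name__ == "__main__":
    print("raw correlation:     ", pearson(regulator, target))
    print("residual correlation:", residual_similarity(regulator, target, confounder))
```

In this toy case the raw correlation is strongly negative although the two genes share only a common driver, and the residual correlation is essentially zero, which is the kind of indirect effect the method aims to eliminate.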
Affiliation(s)
- Huixiang Peng
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- University of Chinese Academy of Sciences, Beijing 100049 China
- Jing Xu
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- University of Chinese Academy of Sciences, Beijing 100049 China
- Kangchen Liu
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- University of Chinese Academy of Sciences, Beijing 100049 China
- Fang Liu
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- Aidi Zhang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- Xiujun Zhang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan 430074, China
3
Wei PJ, Bao JJ, Gao Z, Tan JY, Cao RF, Su Y, Zheng CH, Deng L. MEFFGRN: Matrix enhancement and feature fusion-based method for reconstructing the gene regulatory network of epithelioma papulosum cyprini cells by spring viremia of carp virus infection. Comput Biol Med 2024; 179:108835. [PMID: 38996550 DOI: 10.1016/j.compbiomed.2024.108835] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2024] [Revised: 06/05/2024] [Accepted: 06/29/2024] [Indexed: 07/14/2024]
Abstract
Gene regulatory networks (GRNs) are crucial for understanding organismal molecular mechanisms and processes. Construction of GRN in the epithelioma papulosum cyprini (EPC) cells of cyprinid fish by spring viremia of carp virus (SVCV) infection helps understand the immune regulatory mechanisms that enhance the survival capabilities of cyprinid fish. Although many computational methods have been used to infer GRNs, specialized approaches for predicting the GRN of EPC cells following SVCV infection are lacking. In addition, most existing methods focus primarily on gene expression features, neglecting the valuable network structural information in known GRNs. In this study, we propose a novel supervised deep neural network, named MEFFGRN (Matrix Enhancement- and Feature Fusion-based method for Gene Regulatory Network inference), to accurately predict the GRN of EPC cells following SVCV infection. MEFFGRN considers both gene expression data and network structure information of known GRN and introduces a matrix enhancement method to address the sparsity issue of known GRN, extracting richer network structure information. To optimize the benefits of CNN (Convolutional Neural Network) in image processing, gene expression and enhanced GRN data were transformed into histogram images for each gene pair respectively. Subsequently, these histograms were separately fed into CNNs for training to obtain the corresponding gene expression and network structural features. Furthermore, a feature fusion mechanism was introduced to comprehensively integrate the gene expression and network structural features. This integration considers the specificity of each feature and their interactive information, resulting in a more comprehensive and precise feature representation during the fusion process. Experimental results from both real-world and benchmark datasets demonstrate that MEFFGRN achieves competitive performance compared with state-of-the-art computational methods. 
Furthermore, study findings from SVCV-infected EPC cells suggest that MEFFGRN can predict novel gene regulatory relationships.
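Turning the expression values of a gene pair into a histogram image can be sketched as a small 2D binning routine. The bin count, the min-max binning, and the function name below are assumptions for illustration; the abstract does not give MEFFGRN's actual binning or normalisation.

```python
from typing import List, Sequence


def pair_histogram(expr_a: Sequence[float], expr_b: Sequence[float],
                   bins: int = 4) -> List[List[int]]:
    """Count samples falling into each cell of a bins x bins grid.

    Rows index the binned expression of gene A, columns that of gene B.
    """
    lo_a, hi_a = min(expr_a), max(expr_a)
    lo_b, hi_b = min(expr_b), max(expr_b)

    def bin_index(value: float, lo: float, hi: float) -> int:
        if hi == lo:  # constant expression: everything lands in bin 0
            return 0
        idx = int((value - lo) / (hi - lo) * bins)
        return min(idx, bins - 1)  # the maximum value belongs to the last bin

    grid = [[0] * bins for _ in range(bins)]
    for a, b in zip(expr_a, expr_b):
        grid[bin_index(a, lo_a, hi_a)][bin_index(b, lo_b, hi_b)] += 1
    return grid


if __name__ == "__main__":
    # Hypothetical expression values for one gene pair across six samples.
    gene_a = [0.1, 0.4, 0.5, 0.9, 1.2, 1.3]
    gene_b = [2.0, 2.2, 2.1, 3.5, 3.9, 4.0]
    for row in pair_histogram(gene_a, gene_b, bins=3):
        print(row)
```

A grid like this can be treated as a single-channel image, which is what makes a convolutional network applicable to a gene pair.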
Affiliation(s)
- Pi-Jing Wei
- Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
- Jin-Jin Bao
- Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
- Zhen Gao
- Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
- Jing-Yun Tan
- Shenzhen Key Laboratory of Microbial Genetic Engineering, College of Life Sciences and Oceanology, Shenzhen University, Shenzhen, 518055, Guangdong, China
- Rui-Fen Cao
- Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
- Yansen Su
- School of Artificial Intelligence, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
- Chun-Hou Zheng
- School of Artificial Intelligence, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China.
- Li Deng
- Shenzhen Key Laboratory of Microbial Genetic Engineering, College of Life Sciences and Oceanology, Shenzhen University, Shenzhen, 518055, Guangdong, China.
4
Chee FT, Harun S, Mohd Daud K, Sulaiman S, Nor Muhammad NA. Exploring gene regulation and biological processes in insects: Insights from omics data using gene regulatory network models. Prog Biophys Mol Biol 2024; 189:1-12. [PMID: 38604435 DOI: 10.1016/j.pbiomolbio.2024.04.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Revised: 12/18/2023] [Accepted: 04/03/2024] [Indexed: 04/13/2024]
Abstract
Gene regulatory network (GRN) comprises complicated yet intertwined gene-regulator relationships. Understanding the GRN dynamics will unravel the complexity behind the observed gene expressions. Insect gene regulation is often complicated due to their complex life cycles and diverse ecological adaptations. The main interest of this review is to have an update on the current mathematical modelling methods of GRNs to explain insect science. Several popular GRN architecture models are discussed, together with examples of applications in insect science. In the last part of this review, each model is compared from different aspects, including network scalability, computation complexity, robustness to noise and biological relevancy.
Affiliation(s)
- Fong Ting Chee
- Institute of Systems Biology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor, Malaysia
- Sarahani Harun
- Institute of Systems Biology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor, Malaysia
- Kauthar Mohd Daud
- Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, 43600, UKM Bangi, Selangor, Malaysia
- Suhaila Sulaiman
- FGV R&D Sdn Bhd, FGV Innovation Center, PT23417 Lengkuk Teknologi, Bandar Baru Enstek, 71760 Nilai, Negeri Sembilan, Malaysia
- Nor Azlan Nor Muhammad
- Institute of Systems Biology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor, Malaysia.
5
Moeckel C, Mouratidis I, Chantzi N, Uzun Y, Georgakopoulos-Soares I. Advances in computational and experimental approaches for deciphering transcriptional regulatory networks: Understanding the roles of cis-regulatory elements is essential, and recent research utilizing MPRAs, STARR-seq, CRISPR-Cas9, and machine learning has yielded valuable insights. Bioessays 2024; 46:e2300210. [PMID: 38715516 DOI: 10.1002/bies.202300210] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Revised: 04/22/2024] [Accepted: 04/23/2024] [Indexed: 05/16/2024]
Abstract
Understanding the influence of cis-regulatory elements on gene regulation poses numerous challenges given complexities stemming from variations in transcription factor (TF) binding, chromatin accessibility, structural constraints, and cell-type differences. This review discusses the role of gene regulatory networks in enhancing understanding of transcriptional regulation and covers construction methods ranging from expression-based approaches to supervised machine learning. Additionally, key experimental methods, including MPRAs and CRISPR-Cas9-based screening, which have significantly contributed to understanding TF binding preferences and cis-regulatory element functions, are explored. Lastly, the potential of machine learning and artificial intelligence to unravel cis-regulatory logic is analyzed. These computational advances have far-reaching implications for precision medicine, therapeutic target discovery, and the study of genetic variations in health and disease.
Affiliation(s)
- Camille Moeckel
- Department of Biochemistry and Molecular Biology, Institute for Personalized Medicine, The Pennsylvania State University College of Medicine, Hershey, Pennsylvania, USA
- Ioannis Mouratidis
- Department of Biochemistry and Molecular Biology, Institute for Personalized Medicine, The Pennsylvania State University College of Medicine, Hershey, Pennsylvania, USA
- Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania, USA
- Nikol Chantzi
- Department of Biochemistry and Molecular Biology, Institute for Personalized Medicine, The Pennsylvania State University College of Medicine, Hershey, Pennsylvania, USA
- Yasin Uzun
- Department of Biochemistry and Molecular Biology, Institute for Personalized Medicine, The Pennsylvania State University College of Medicine, Hershey, Pennsylvania, USA
- Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania, USA
- Department of Pediatrics, The Pennsylvania State University College of Medicine, Hershey, Pennsylvania, USA
- Ilias Georgakopoulos-Soares
- Department of Biochemistry and Molecular Biology, Institute for Personalized Medicine, The Pennsylvania State University College of Medicine, Hershey, Pennsylvania, USA
- Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania, USA
6
Lee Y, Xu Y, Gao P, Chen J. TENET: Triple-enhancement based graph neural network for cell-cell interaction network reconstruction from spatial transcriptomics. J Mol Biol 2024; 436:168543. [PMID: 38508302 DOI: 10.1016/j.jmb.2024.168543] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2024] [Revised: 03/03/2024] [Accepted: 03/13/2024] [Indexed: 03/22/2024]
Abstract
Cellular communication relies on the intricate interplay of signaling molecules, forming the Cell-cell Interaction network (CCI) that coordinates tissue behavior. Researchers have shown the capability of shallow neural networks in reconstructing CCI, given molecules' abundance in the Spatial Transcriptomics (ST) data. When encountering situations such as sparse connections in CCI and excessive noise, the susceptibility of shallow networks to these factors significantly impacts the accuracy of CCI reconstruction, resulting in subpar results. To reconstruct a more comprehensive and accurate CCI, we propose a novel method named Triple-Enhancement based Graph Neural Network (TENET). In TENET, three progressive enhancement mechanisms build upon each other, creating a cumulative effect. This approach can ensure the ability to capture valuable features in limited data and amplify the noise signal to facilitate the denoising effect. Additionally, the whole architecture guides the decoding reconstruction phase with integrated knowledge, which leverages the accumulated insights from each stage of enhancement to ensure a refined and comprehensive CCI reconstruction. The presented TENET has been implemented and tested on both real and synthetic ST datasets. On average, CCI reconstruction using TENET achieves a 9.61% improvement in Average Precision (AP) and a 7.32% improvement in Area Under the Receiver Operating Characteristic (AUROC) compared to the existing state-of-the-art (SOTA) method. The source code and data are available at https://github.com/Yujian-Lee/TENET.
Affiliation(s)
- Yujian Lee
- Guangdong Provincial Key Laboratory IRADS, Beijing Normal University-Hong Kong Baptist University United International College, Zhuhai, China; Department of Computer Science, Hong Kong Baptist University, Hong Kong Special Administrative Region; Beijing Normal University-Hong Kong Baptist University United International College, Zhuhai, China
- Yongqi Xu
- Department of Computer Science and Technology, Guangdong University of Technology, Guangzhou, China
- Peng Gao
- Department of Computer Science, Hong Kong Baptist University, Hong Kong Special Administrative Region; Beijing Normal University-Hong Kong Baptist University United International College, Zhuhai, China
- Jiaxing Chen
- Guangdong Provincial Key Laboratory IRADS, Beijing Normal University-Hong Kong Baptist University United International College, Zhuhai, China; Beijing Normal University-Hong Kong Baptist University United International College, Zhuhai, China.
7
Wang Y, Chen X, Zheng Z, Huang L, Xie W, Wang F, Zhang Z, Wong KC. scGREAT: Transformer-based deep-language model for gene regulatory network inference from single-cell transcriptomics. iScience 2024; 27:109352. [PMID: 38510148 PMCID: PMC10951644 DOI: 10.1016/j.isci.2024.109352] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Revised: 12/29/2023] [Accepted: 02/23/2024] [Indexed: 03/22/2024] Open
Abstract
Gene regulatory networks (GRNs) involve complex and multi-layer regulatory interactions between regulators and their target genes. Precise knowledge of GRNs is important in understanding cellular processes and molecular functions. Recent breakthroughs in single-cell sequencing technology made it possible to infer GRNs at single-cell level. Existing methods, however, are limited by expensive computations, and sometimes simplistic assumptions. To overcome these obstacles, we propose scGREAT, a framework to infer GRN using gene embeddings and transformer from single-cell transcriptomics. scGREAT starts by constructing gene expression and gene biotext dictionaries from scRNA-seq data and gene text information. The representation of TF gene pairs is learned through optimizing embedding space by transformer-based engine. Results illustrated scGREAT outperformed other contemporary methods on benchmarks. Besides, gene representations from scGREAT provide valuable gene regulation insights, and external validation on spatial transcriptomics illuminated the mechanism behind scGREAT annotation. Moreover, scGREAT identified several TF target regulations corroborated in studies.
Affiliation(s)
- Yuchen Wang
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Xingjian Chen
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Cutaneous Biology Research Center, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Zetian Zheng
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Lei Huang
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Weidun Xie
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Fuzhou Wang
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Zhaolei Zhang
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Ka-Chun Wong
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Shenzhen Research Institute, City University of Hong Kong, Shenzhen, China
8
Skok Gibbs C, Mahmood O, Bonneau R, Cho K. PMF-GRN: a variational inference approach to single-cell gene regulatory network inference using probabilistic matrix factorization. Genome Biol 2024; 25:88. [PMID: 38589899 PMCID: PMC11003171 DOI: 10.1186/s13059-024-03226-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2023] [Accepted: 03/26/2024] [Indexed: 04/10/2024] Open
Abstract
Inferring gene regulatory networks (GRNs) from single-cell data is challenging due to heuristic limitations. Existing methods also lack estimates of uncertainty. Here we present Probabilistic Matrix Factorization for Gene Regulatory Network Inference (PMF-GRN). Using single-cell expression data, PMF-GRN infers latent factors capturing transcription factor activity and regulatory relationships. Using variational inference allows hyperparameter search for principled model selection and direct comparison to other generative models. We extensively test and benchmark our method using real single-cell datasets and synthetic data. We show that PMF-GRN infers GRNs more accurately than current state-of-the-art single-cell GRN inference methods, offering well-calibrated uncertainty estimates.
Affiliation(s)
- Omar Mahmood
- Center for Data Science, New York University, New York, NY, 10011, USA
- Richard Bonneau
- Center for Data Science, New York University, New York, NY, 10011, USA
- Prescient Design, Genentech, New York, NY, 10010, USA
- Center for Genomics and Systems Biology, New York University, New York, NY, 10003, USA
- Kyunghyun Cho
- Center for Data Science, New York University, New York, NY, 10011, USA.
- Prescient Design, Genentech, New York, NY, 10010, USA.
9
Lu Z, Xiao X, Zheng Q, Wang X, Xu L. Assessing NGS-based computational methods for predicting transcriptional regulators with query gene sets. bioRxiv 2024:2024.02.01.578316. [PMID: 38562775 PMCID: PMC10983863 DOI: 10.1101/2024.02.01.578316] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
This article provides an in-depth review of computational methods for predicting transcriptional regulators with query gene sets. Identification of transcriptional regulators is of utmost importance in many biological applications, including but not limited to elucidating biological development mechanisms, identifying key disease genes, and predicting therapeutic targets. Various computational methods based on next-generation sequencing (NGS) data have been developed in the past decade, yet no systematic evaluation of NGS-based methods has been offered. We classified these methods into two categories based on shared characteristics, namely library-based and region-based methods. We further conducted benchmark studies to evaluate the accuracy, sensitivity, coverage, and usability of NGS-based methods with molecular experimental datasets. Results show that BART, ChIP-Atlas, and Lisa have relatively better performance. Besides, we point out the limitations of NGS-based methods and explore potential directions for further improvement.
Key points
- An introduction to available computational methods for predicting functional TRs from a query gene set.
- A detailed walk-through along with practical concerns and limitations.
- A systematic benchmark of NGS-based methods in terms of accuracy, sensitivity, coverage, and usability, using 570 TR perturbation-derived gene sets.
- NGS-based methods outperform motif-based methods. Among NGS methods, those utilizing larger databases and adopting region-centric approaches demonstrate favorable performance. BART, ChIP-Atlas, and Lisa are recommended as these methods have overall better performance in evaluated scenarios.
10
Ishikawa M, Sugino S, Masuda Y, Tarumoto Y, Seto Y, Taniyama N, Wagai F, Yamauchi Y, Kojima Y, Kiryu H, Yusa K, Eiraku M, Mochizuki A. RENGE infers gene regulatory networks using time-series single-cell RNA-seq data with CRISPR perturbations. Commun Biol 2023; 6:1290. [PMID: 38155269 PMCID: PMC10754834 DOI: 10.1038/s42003-023-05594-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2022] [Accepted: 11/15/2023] [Indexed: 12/30/2023] Open
Abstract
Single-cell RNA-seq analysis coupled with CRISPR-based perturbation has enabled the inference of gene regulatory networks with causal relationships. However, a snapshot of single-cell CRISPR data may not lead to an accurate inference, since a gene knockout can influence multi-layered downstream over time. Here, we developed RENGE, a computational method that infers gene regulatory networks using a time-series single-cell CRISPR dataset. RENGE models the propagation process of the effects elicited by a gene knockout on its regulatory network. It can distinguish between direct and indirect regulations, which allows for the inference of regulations by genes that are not knocked out. RENGE therefore outperforms current methods in the accuracy of inferring gene regulatory networks. When used on a dataset we derived from human-induced pluripotent stem cells, RENGE yielded a network consistent with multiple databases and literature. Accurate inference of gene regulatory networks by RENGE would enable the identification of key factors for various biological systems.
Affiliation(s)
- Masato Ishikawa
- Institute for Life and Medical Sciences, Kyoto University, Kyoto, 606-8507, Japan.
- Seiichi Sugino
- Institute for Life and Medical Sciences, Kyoto University, Kyoto, 606-8507, Japan
- Yoshie Masuda
- Institute for Life and Medical Sciences, Kyoto University, Kyoto, 606-8507, Japan
- Yusuke Tarumoto
- Institute for Life and Medical Sciences, Kyoto University, Kyoto, 606-8507, Japan
- Yusuke Seto
- Institute for Life and Medical Sciences, Kyoto University, Kyoto, 606-8507, Japan
- Nobuko Taniyama
- Institute for Life and Medical Sciences, Kyoto University, Kyoto, 606-8507, Japan
- Fumi Wagai
- Institute for Life and Medical Sciences, Kyoto University, Kyoto, 606-8507, Japan
- Yuhei Yamauchi
- Institute for Life and Medical Sciences, Kyoto University, Kyoto, 606-8507, Japan
- Yasuhiro Kojima
- Laboratory of Computational Life Science, National Cancer Center Research Institute, Tokyo, 104-0045, Japan
- Hisanori Kiryu
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, 277-8561, Japan
- Kosuke Yusa
- Institute for Life and Medical Sciences, Kyoto University, Kyoto, 606-8507, Japan
- Mototsugu Eiraku
- Institute for Life and Medical Sciences, Kyoto University, Kyoto, 606-8507, Japan
- Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Kyoto, 606-8507, Japan
- Atsushi Mochizuki
- Institute for Life and Medical Sciences, Kyoto University, Kyoto, 606-8507, Japan
11
Cingiz MÖ. k-Strong Inference Algorithm: A Hybrid Information Theory Based Gene Network Inference Algorithm. Mol Biotechnol 2023:10.1007/s12033-023-00929-2. [PMID: 37950851 DOI: 10.1007/s12033-023-00929-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2023] [Accepted: 10/05/2023] [Indexed: 11/13/2023]
Abstract
Gene networks allow researchers to understand the underlying mechanisms between diseases and genes while reducing the need for wet lab experiments. Numerous gene network inference (GNI) algorithms have been presented in the literature to infer accurate gene networks. We proposed a hybrid GNI algorithm, k-Strong Inference Algorithm (ksia), to infer more reliable and robust gene networks from omics datasets. To increase reliability, ksia integrates Pearson correlation coefficient (PCC) and Spearman rank correlation coefficient (SCC) scores to determine mutual information scores between molecules, which increases the diversity of relation predictions. To infer a more robust gene network, ksia applies three different elimination steps to remove redundant and spurious relations between genes. The performance of ksia was evaluated on a microbe microarray database in an overlap analysis with other GNI algorithms, namely ARACNE, C3NET, CLR, and MRNET. ksia inferred fewer relations due to its strict elimination steps. However, ksia generally performed better on Escherichia coli (E. coli) and Saccharomyces cerevisiae (yeast) gene expression datasets in terms of F-measure and precision values. The integration of association estimator scores and three elimination stages slightly increases the performance of ksia-based gene networks. Users can access the ksia R package and its user manual via https://github.com/ozgurcingiz/ksia.
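The abstract does not say how ksia combines the two correlation scores, so the sketch below is only one plausible reading. Each coefficient is converted to a mutual information estimate with the Gaussian formula I = -0.5 ln(1 - r^2), and the larger estimate is kept. The `max` combination rule and all function names are assumptions; this is not the ksia implementation, which is distributed as an R package.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient (PCC)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation coefficient (SCC): PCC of the ranks."""
    return pearson(ranks(x), ranks(y))


def gaussian_mi(r: float) -> float:
    """Mutual information implied by correlation r under a Gaussian model."""
    r_squared = min(r * r, 1 - 1e-12)  # keep the logarithm finite when |r| = 1
    return -0.5 * math.log(1 - r_squared)


def association_score(x: Sequence[float], y: Sequence[float]) -> float:
    """Assumed combination rule: keep the stronger of the two MI estimates."""
    return max(gaussian_mi(pearson(x, y)), gaussian_mi(spearman(x, y)))


if __name__ == "__main__":
    # Monotonic but non-linear toy relation: SCC captures it better than PCC.
    x = [1, 2, 3, 4, 5]
    y = [1, 4, 9, 16, 100]
    print("PCC:", pearson(x, y))
    print("SCC:", spearman(x, y))
    print("score:", association_score(x, y))
```

Using both estimators means a monotonic non-linear relation that PCC underrates can still receive a high score through SCC, which is one way to read the "diversity of relation predictions" mentioned above.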
Affiliation(s)
- Mustafa Özgür Cingiz
- Computer Engineering Department, Faculty of Engineering and Natural Sciences, Bursa Technical University, Mimar Sinan Campus, Yildirim, 16310, Bursa, Turkey.
12
Badia-I-Mompel P, Wessels L, Müller-Dott S, Trimbour R, Ramirez Flores RO, Argelaguet R, Saez-Rodriguez J. Gene regulatory network inference in the era of single-cell multi-omics. Nat Rev Genet 2023; 24:739-754. [PMID: 37365273 DOI: 10.1038/s41576-023-00618-5] [Citation(s) in RCA: 48] [Impact Index Per Article: 48.0] [Accepted: 05/12/2023] [Indexed: 06/28/2023]
Abstract
The interplay between chromatin, transcription factors and genes generates complex regulatory circuits that can be represented as gene regulatory networks (GRNs). The study of GRNs is useful to understand how cellular identity is established, maintained and disrupted in disease. GRNs can be inferred from experimental data - historically, bulk omics data - and/or from the literature. The advent of single-cell multi-omics technologies has led to the development of novel computational methods that leverage genomic, transcriptomic and chromatin accessibility information to infer GRNs at an unprecedented resolution. Here, we review the key principles of inferring GRNs that encompass transcription factor-gene interactions from transcriptomics and chromatin accessibility data. We focus on the comparison and classification of methods that use single-cell multimodal data. We highlight challenges in GRN inference, in particular with respect to benchmarking, and potential further developments using additional data modalities.
Affiliation(s)
- Pau Badia-I-Mompel
- Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
| | - Lorna Wessels
- Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Department of Vascular Biology and Tumor Angiogenesis, European Center for Angioscience, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
| | - Sophia Müller-Dott
- Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
| | - Rémi Trimbour
- Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Institut Pasteur, Université Paris Cité, CNRS UMR 3738, Machine Learning for Integrative Genomics Group, Paris, France
| | - Ricardo O Ramirez Flores
- Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
| | | | - Julio Saez-Rodriguez
- Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany.
|
13
|
Kim D, Tran A, Kim HJ, Lin Y, Yang JYH, Yang P. Gene regulatory network reconstruction: harnessing the power of single-cell multi-omic data. NPJ Syst Biol Appl 2023; 9:51. [PMID: 37857632 PMCID: PMC10587078 DOI: 10.1038/s41540-023-00312-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Accepted: 10/02/2023] [Indexed: 10/21/2023] Open
Abstract
Inferring gene regulatory networks (GRNs) is a fundamental challenge in biology that aims to unravel the complex relationships between genes and their regulators. Deciphering these networks plays a critical role in understanding the underlying regulatory crosstalk that drives many cellular processes and diseases. Recent advances in sequencing technology have led to the development of state-of-the-art GRN inference methods that exploit matched single-cell multi-omic data. By employing diverse mathematical and statistical methodologies, these methods aim to reconstruct more comprehensive and precise gene regulatory networks. In this review, we give a brief overview on the statistical and methodological foundations commonly used in GRN inference methods. We then compare and contrast the latest state-of-the-art GRN inference methods for single-cell matched multi-omics data, and discuss their assumptions, limitations and opportunities. Finally, we discuss the challenges and future directions that hold promise for further advancements in this rapidly developing field.
Affiliation(s)
- Daniel Kim
- School of Mathematics and Statistics, University of Sydney, Camperdown, NSW, Australia
- Computational Systems Biology Unit, Children's Medical Research Institute, University of Sydney, Camperdown, NSW, Australia
- Sydney Precision Data Science Centre, University of Sydney, Camperdown, NSW, Australia
| | - Andy Tran
- School of Mathematics and Statistics, University of Sydney, Camperdown, NSW, Australia
- Sydney Precision Data Science Centre, University of Sydney, Camperdown, NSW, Australia
- Charles Perkins Centre, University of Sydney, Camperdown, NSW, Australia
| | - Hani Jieun Kim
- Computational Systems Biology Unit, Children's Medical Research Institute, University of Sydney, Camperdown, NSW, Australia
- Sydney Precision Data Science Centre, University of Sydney, Camperdown, NSW, Australia
| | - Yingxin Lin
- School of Mathematics and Statistics, University of Sydney, Camperdown, NSW, Australia
- Sydney Precision Data Science Centre, University of Sydney, Camperdown, NSW, Australia
- Charles Perkins Centre, University of Sydney, Camperdown, NSW, Australia
| | - Jean Yee Hwa Yang
- School of Mathematics and Statistics, University of Sydney, Camperdown, NSW, Australia.
- Sydney Precision Data Science Centre, University of Sydney, Camperdown, NSW, Australia.
- Charles Perkins Centre, University of Sydney, Camperdown, NSW, Australia.
| | - Pengyi Yang
- School of Mathematics and Statistics, University of Sydney, Camperdown, NSW, Australia.
- Computational Systems Biology Unit, Children's Medical Research Institute, University of Sydney, Camperdown, NSW, Australia.
- Sydney Precision Data Science Centre, University of Sydney, Camperdown, NSW, Australia.
- Charles Perkins Centre, University of Sydney, Camperdown, NSW, Australia.
|
14
|
Wu Y, Qian B, Wang A, Dong H, Zhu E, Ma B. iLSGRN: inference of large-scale gene regulatory networks based on multi-model fusion. Bioinformatics 2023; 39:btad619. [PMID: 37851379 PMCID: PMC10589915 DOI: 10.1093/bioinformatics/btad619] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Revised: 10/04/2023] [Accepted: 10/17/2023] [Indexed: 10/19/2023] Open
Abstract
MOTIVATION: Gene regulatory networks (GRNs) are a way of describing the interactions between genes, which contributes to revealing the different biological mechanisms in the cell. Reconstructing GRNs based on gene expression data has been a central computational problem in systems biology. However, due to the high dimensionality and non-linearity of large-scale GRNs, accurately and efficiently inferring GRNs is still a challenging task. RESULTS: In this article, we propose a new approach, iLSGRN, to reconstruct large-scale GRNs from steady-state and time-series gene expression data based on non-linear ordinary differential equations. Firstly, the regulatory gene recognition algorithm calculates the Maximal Information Coefficient between genes and excludes redundant regulatory relationships to achieve dimensionality reduction. Then, the feature fusion algorithm constructs a model leveraging the feature importance derived from XGBoost (eXtreme Gradient Boosting) and RF (Random Forest) models, which can effectively train the non-linear ordinary differential equations model of GRNs and improve the accuracy and stability of the inference algorithm. Extensive experiments on datasets of different scales show that our method makes a sensible improvement compared with state-of-the-art methods. Furthermore, we perform cross-validation experiments on real gene datasets to validate the robustness and effectiveness of the proposed method. AVAILABILITY AND IMPLEMENTATION: The proposed method is written in Python and is available at: https://github.com/lab319/iLSGRN.
Affiliation(s)
- Yiming Wu
- School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
| | - Bing Qian
- School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
| | - Anqi Wang
- Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong 999077, China
| | - Heng Dong
- School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
| | - Enqiang Zhu
- Institution of Computing Science and Technology, Guangzhou University, Guangzhou 510006, China
| | - Baoshan Ma
- School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
|
15
|
Massonis G, Villaverde AF, Banga JR. Distilling identifiable and interpretable dynamic models from biological data. PLoS Comput Biol 2023; 19:e1011014. [PMID: 37851682 PMCID: PMC10615316 DOI: 10.1371/journal.pcbi.1011014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2023] [Revised: 10/30/2023] [Accepted: 10/03/2023] [Indexed: 10/20/2023] Open
Abstract
Mechanistic dynamical models allow us to study the behavior of complex biological systems. They can provide an objective and quantitative understanding that would be difficult to achieve through other means. However, the systematic development of these models is a non-trivial exercise and an open problem in computational biology. Currently, many research efforts are focused on model discovery, i.e. automating the development of interpretable models from data. One of the main frameworks is sparse regression, where the sparse identification of nonlinear dynamics (SINDy) algorithm and its variants have enjoyed great success. SINDy-PI is an extension which allows the discovery of rational nonlinear terms, thus enabling the identification of kinetic functions common in biochemical networks, such as Michaelis-Menten. SINDy-PI also pays special attention to the recovery of parsimonious models (Occam's razor). Here we focus on biological models composed of sets of deterministic nonlinear ordinary differential equations. We present a methodology that, combined with SINDy-PI, allows the automatic discovery of structurally identifiable and observable models which are also mechanistically interpretable. The lack of structural identifiability and observability makes it impossible to uniquely infer parameter and state variables, which can compromise the usefulness of a model by distorting its mechanistic significance and hampering its ability to produce biological insights. We illustrate the performance of our method with six case studies. We find that, despite enforcing sparsity, SINDy-PI sometimes yields models that are unidentifiable. In these cases we show how our method transforms their equations in order to obtain a structurally identifiable and observable model which is also interpretable.
Affiliation(s)
- Gemma Massonis
- Computational Biology Lab, MBG-CSIC (Spanish National Research Council), Pontevedra, Galicia, Spain
| | - Alejandro F. Villaverde
- CITMAga, Santiago de Compostela, Galicia, Spain
- Universidade de Vigo, Department of Systems and Control Engineering, Vigo, Galicia, Spain
| | - Julio R. Banga
- Computational Biology Lab, MBG-CSIC (Spanish National Research Council), Pontevedra, Galicia, Spain
|
16
|
Dautle M, Zhang S, Chen Y. scTIGER: A Deep-Learning Method for Inferring Gene Regulatory Networks from Case versus Control scRNA-seq Datasets. Int J Mol Sci 2023; 24:13339. [PMID: 37686146 PMCID: PMC10488287 DOI: 10.3390/ijms241713339] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2023] [Revised: 08/06/2023] [Accepted: 08/23/2023] [Indexed: 09/10/2023] Open
Abstract
Inferring gene regulatory networks (GRNs) from single-cell RNA-seq (scRNA-seq) data is an important computational question to find regulatory mechanisms involved in fundamental cellular processes. Although many computational methods have been designed to predict GRNs from scRNA-seq data, they usually have high false positive rates and none infer GRNs by directly using the paired datasets of case-versus-control experiments. Here we present a novel deep-learning-based method, named scTIGER, for GRN detection by using the co-differential relationships of gene expression profiles in paired scRNA-seq datasets. scTIGER employs cell-type-based pseudotiming, an attention-based convolutional neural network method and permutation-based significance testing for inferring GRNs among gene modules. As state-of-the-art applications, we first applied scTIGER to scRNA-seq datasets of prostate cancer cells, and successfully identified the dynamic regulatory networks of AR, ERG, PTEN and ATF3 for same-cell type between prostatic cancerous and normal conditions, and two-cell types within the prostatic cancerous environment. We then applied scTIGER to scRNA-seq data from neurons with and without fear memory and detected specific regulatory networks for BDNF, CREB1 and MAPK4. Additionally, scTIGER demonstrates robustness against high levels of dropout noise in scRNA-seq data.
Affiliation(s)
- Madison Dautle
- Department of Biological and Biomedical Sciences, Rowan University, Glassboro, NJ 08028, USA
| | - Shaoqiang Zhang
- College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
| | - Yong Chen
- Department of Biological and Biomedical Sciences, Rowan University, Glassboro, NJ 08028, USA
|
17
|
Li R, Rozum JC, Quail MM, Qasim MN, Sindi SS, Nobile CJ, Albert R, Hernday AD. Inferring gene regulatory networks using transcriptional profiles as dynamical attractors. PLoS Comput Biol 2023; 19:e1010991. [PMID: 37607190 PMCID: PMC10473541 DOI: 10.1371/journal.pcbi.1010991] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2023] [Revised: 09/01/2023] [Accepted: 07/19/2023] [Indexed: 08/24/2023] Open
Abstract
Genetic regulatory networks (GRNs) regulate the flow of genetic information from the genome to expressed messenger RNAs (mRNAs) and thus are critical to controlling the phenotypic characteristics of cells. Numerous methods exist for profiling mRNA transcript levels and identifying protein-DNA binding interactions at the genome-wide scale. These enable researchers to determine the structure and output of transcriptional regulatory networks, but uncovering the complete structure and regulatory logic of GRNs remains a challenge. The field of GRN inference aims to meet this challenge using computational modeling to derive the structure and logic of GRNs from experimental data and to encode this knowledge in Boolean networks, Bayesian networks, ordinary differential equation (ODE) models, or other modeling frameworks. However, most existing models do not incorporate dynamic transcriptional data since it has historically been less widely available in comparison to "static" transcriptional data. We report the development of an evolutionary algorithm-based ODE modeling approach (named EA) that integrates kinetic transcription data and the theory of attractor matching to infer GRN architecture and regulatory logic. Our method outperformed six leading GRN inference methods, none of which incorporate kinetic transcriptional data, in predicting regulatory connections among TFs when applied to a small-scale engineered synthetic GRN in Saccharomyces cerevisiae. Moreover, we demonstrate the potential of our method to predict unknown transcriptional profiles that would be produced upon genetic perturbation of the GRN governing a two-state cellular phenotypic switch in Candida albicans. We established an iterative refinement strategy to facilitate candidate selection for experimentation; the experimental results in turn provide validation or improvement for the model. In this way, our GRN inference approach can expedite the development of a sophisticated mathematical model that can accurately describe the structure and dynamics of the in vivo GRN.
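To make the notion of an attractor in the abstract above concrete, here is a minimal sketch that enumerates the attractors of a two-gene toggle switch under synchronous Boolean updates. It does not reproduce the authors' evolutionary-algorithm ODE method; the network, its rules, and all names are hypothetical and chosen only to show how stable expression profiles arise as attractors of a regulatory model.

```python
from itertools import product

# Hypothetical toggle switch: each gene represses the other.
RULES = {
    "A": lambda state: not state["B"],
    "B": lambda state: not state["A"],
}

def step(state):
    """Apply every update rule simultaneously (synchronous update)."""
    return {gene: bool(rule(state)) for gene, rule in RULES.items()}

def as_tuple(state):
    """Fixed gene order so states can be compared and stored."""
    return tuple(state[gene] for gene in RULES)

def attractor_from(state):
    """Follow the trajectory from `state` until it revisits a state; return the cycle."""
    seen = []
    while as_tuple(state) not in seen:
        seen.append(as_tuple(state))
        state = step(state)
    cycle = seen[seen.index(as_tuple(state)):]
    start = cycle.index(min(cycle))  # rotate so equal cycles compare equal
    return tuple(cycle[start:] + cycle[:start])

def all_attractors():
    """Enumerate attractors by starting from every possible state."""
    genes = list(RULES)
    return {
        attractor_from(dict(zip(genes, bits)))
        for bits in product([False, True], repeat=len(genes))
    }

attractors = all_attractors()
```

For this toy network the two fixed points (A on, B off; A off, B on) play the role of the two stable phenotypes of a switch, while the remaining two-state cycle is an artifact of synchronous updating.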
Affiliation(s)
- Ruihao Li
- Quantitative and Systems Biology Graduate Program, University of California, Merced, Merced, California, United States of America
| | - Jordan C. Rozum
- Department of Systems Science and Industrial Engineering, Binghamton University (State University of New York), Binghamton, New York, United States of America
| | - Morgan M. Quail
- Quantitative and Systems Biology Graduate Program, University of California, Merced, Merced, California, United States of America
| | - Mohammad N. Qasim
- Quantitative and Systems Biology Graduate Program, University of California, Merced, Merced, California, United States of America
| | - Suzanne S. Sindi
- Department of Applied Mathematics, University of California, Merced, Merced, California, United States of America
| | - Clarissa J. Nobile
- Department of Molecular Cell Biology, University of California, Merced, Merced, California, United States of America
- Health Sciences Research Institute, University of California, Merced, Merced, California, United States of America
| | - Réka Albert
- Department of Physics, Pennsylvania State University, University Park, University Park, Pennsylvania, United States of America
- Department of Biology, Pennsylvania State University, University Park, University Park, Pennsylvania, United States of America
| | - Aaron D. Hernday
- Department of Molecular Cell Biology, University of California, Merced, Merced, California, United States of America
- Health Sciences Research Institute, University of California, Merced, Merced, California, United States of America
|
18
|
Marku M, Pancaldi V. From time-series transcriptomics to gene regulatory networks: A review on inference methods. PLoS Comput Biol 2023; 19:e1011254. [PMID: 37561790 PMCID: PMC10414591 DOI: 10.1371/journal.pcbi.1011254] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 08/12/2023] Open
Abstract
Inference of gene regulatory networks has been an active area of research for around 20 years, leading to the development of sophisticated inference algorithms based on a variety of assumptions and approaches. With the ever increasing demand for more accurate and powerful models, the inference problem remains of broad scientific interest. The abstract representation of biological systems through gene regulatory networks represents a powerful method to study such systems, encoding different amounts and types of information. In this review, we summarize the different types of inference algorithms specifically based on time-series transcriptomics, giving an overview of the main applications of gene regulatory networks in computational biology. This review is intended to give an updated reference of regulatory networks inference tools to biologists and researchers new to the topic and guide them in selecting the appropriate inference method that best fits their questions, aims, and experimental data.
Affiliation(s)
- Malvina Marku
- CRCT, Université de Toulouse, Inserm, CNRS, Université Toulouse III-Paul Sabatier, Centre de Recherches en Cancérologie de Toulouse, Toulouse, France
| | - Vera Pancaldi
- CRCT, Université de Toulouse, Inserm, CNRS, Université Toulouse III-Paul Sabatier, Centre de Recherches en Cancérologie de Toulouse, Toulouse, France
- Barcelona Supercomputing Center, Barcelona, Spain
|
19
|
Fang Z, Ford AJ, Hu T, Zhang N, Mantalaris A, Coskun AF. Subcellular spatially resolved gene neighborhood networks in single cells. Cell Rep Methods 2023; 3:100476. [PMID: 37323566 PMCID: PMC10261906 DOI: 10.1016/j.crmeth.2023.100476] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/30/2022] [Revised: 02/18/2023] [Accepted: 04/18/2023] [Indexed: 06/17/2023]
Abstract
Image-based spatial omics methods such as fluorescence in situ hybridization (FISH) generate molecular profiles of single cells at single-molecule resolution. Current spatial transcriptomics methods focus on the distribution of single genes. However, the spatial proximity of RNA transcripts can play an important role in cellular function. We demonstrate a spatially resolved gene neighborhood network (spaGNN) pipeline for the analysis of subcellular gene proximity relationships. In spaGNN, machine-learning-based clustering of subcellular spatial transcriptomics data yields subcellular density classes of multiplexed transcript features. The nearest-neighbor analysis produces heterogeneous gene proximity maps in distinct subcellular regions. We illustrate the cell-type-distinguishing capability of spaGNN using multiplexed error-robust FISH data of fibroblast and U2-OS cells and sequential FISH data of mesenchymal stem cells (MSCs), revealing tissue-source-specific MSC transcriptomics and spatial distribution characteristics. Overall, the spaGNN approach expands the spatial features that can be used for cell-type classification tasks.
Affiliation(s)
- Zhou Fang
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
- Machine Learning Graduate Program, Georgia Institute of Technology, Atlanta, GA, USA
| | - Adam J. Ford
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
| | - Thomas Hu
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
- School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA
| | - Nicholas Zhang
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
- Interdisciplinary Bioengineering Graduate Program, Georgia Institute of Technology, Atlanta, GA, USA
| | - Athanasios Mantalaris
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
| | - Ahmet F. Coskun
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
- Interdisciplinary Bioengineering Graduate Program, Georgia Institute of Technology, Atlanta, GA, USA
- Parker H. Petit Institute for Bioengineering and Bioscience, Georgia Institute of Technology, Atlanta, GA 30332, USA
|
20
|
Griffin AT, Vlahos LJ, Chiuzan C, Califano A. NaRnEA: An Information Theoretic Framework for Gene Set Analysis. Entropy (Basel) 2023; 25:e25030542. [PMID: 36981431 PMCID: PMC10048242 DOI: 10.3390/e25030542] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/08/2022] [Revised: 03/03/2023] [Accepted: 03/13/2023] [Indexed: 05/26/2023]
Abstract
Gene sets are being increasingly leveraged to make high-level biological inferences from transcriptomic data; however, existing gene set analysis methods rely on overly conservative, heuristic approaches for quantifying the statistical significance of gene set enrichment. We created Nonparametric analytical-Rank-based Enrichment Analysis (NaRnEA) to facilitate accurate and robust gene set analysis with an optimal null model derived using the information theoretic Principle of Maximum Entropy. By measuring the differential activity of ~2500 transcriptional regulatory proteins based on the differential expression of each protein's transcriptional targets between primary tumors and normal tissue samples in three cohorts from The Cancer Genome Atlas (TCGA), we demonstrate that NaRnEA critically improves on two widely used gene set analysis methods: Gene Set Enrichment Analysis (GSEA) and analytical-Rank-based Enrichment Analysis (aREA). We show that the NaRnEA-inferred differential protein activity is significantly correlated with differential protein abundance inferred from independent, phenotype-matched mass spectrometry data in the Clinical Proteomic Tumor Analysis Consortium (CPTAC), confirming the statistical and biological accuracy of our approach. Additionally, our analysis crucially demonstrates that the sample-shuffling empirical null models leveraged by GSEA and aREA for gene set analysis are overly conservative, a shortcoming that is avoided by the newly developed Maximum Entropy analytical null model employed by NaRnEA.
Affiliation(s)
- Aaron T. Griffin
- Medical Scientist Training Program, Columbia University Irving Medical Center, New York, NY 10032, USA
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032, USA
| | - Lukas J. Vlahos
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032, USA
| | - Codruta Chiuzan
- Department of Biostatistics, Columbia University Irving Medical Center, New York, NY 10032, USA
| | - Andrea Califano
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032, USA
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA
- Department of Medicine, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY 10032, USA
- JP Sulzberger Columbia Genome Center, Columbia University Irving Medical Center, New York, NY 10032, USA
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
- Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, New York, NY 10032, USA
|
21
|
Smits JG, Arts JA, Frölich S, Snabel RR, Heuts BM, Martens JH, van Heeringen SJ, Zhou H. scANANSE gene regulatory network and motif analysis of single-cell clusters. F1000Res 2023; 12:243. [PMID: 38116584 PMCID: PMC10728588 DOI: 10.12688/f1000research.130530.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/03/2023] [Indexed: 12/21/2023] Open
Abstract
The recent development of single-cell techniques is essential to unravel complex biological systems. By measuring the transcriptome and the accessible genome on a single-cell level, cellular heterogeneity in a biological environment can be deciphered. Transcription factors act as key regulators activating and repressing downstream target genes, and together they constitute gene regulatory networks that govern cell morphology and identity. Dissecting these gene regulatory networks is crucial for understanding molecular mechanisms and disease, especially within highly complex biological systems. The gene regulatory network analysis software ANANSE and the motif enrichment software GimmeMotifs were both developed to analyse bulk datasets. We developed scANANSE, a software pipeline for gene regulatory network analysis and motif enrichment using single-cell RNA and ATAC datasets. The scANANSE pipeline can be run from either R or Python. First, it exports data from standard single-cell objects. Next, it automatically runs multiple comparisons of cell cluster data. Finally, it imports the results back to the single-cell object, where the result can be further visualised, integrated, and interpreted. Here, we demonstrate our scANANSE pipeline on a publicly available PBMC multi-omics dataset. It identifies well-known cell type-specific hematopoietic factors. Importantly, we also demonstrated that scANANSE combined with GimmeMotifs is able to predict transcription factors with both activating and repressing roles in gene regulation.
Affiliation(s)
- Jos G.A. Smits
- Molecular Developmental Biology, Radboud University, Nijmegen, Gelderland, The Netherlands
| | - Julian A. Arts
- Molecular Developmental Biology, Radboud University, Nijmegen, Gelderland, The Netherlands
| | - Siebren Frölich
- Molecular Developmental Biology, Radboud University, Nijmegen, Gelderland, The Netherlands
| | - Rebecca R. Snabel
- Molecular Developmental Biology, Radboud University, Nijmegen, Gelderland, The Netherlands
| | - Branco M.H. Heuts
- Molecular Biology, Radboud University, Nijmegen, Gelderland, The Netherlands
| | - Joost H.A. Martens
- Molecular Biology, Radboud University, Nijmegen, Gelderland, The Netherlands
| | - Simon J. van Heeringen
- Molecular Developmental Biology, Radboud University, Nijmegen, Gelderland, The Netherlands
| | - Huiqing Zhou
- Molecular Developmental Biology, Radboud University, Nijmegen, Gelderland, The Netherlands
- Human Genetics, Radboud University Medical Centre, Nijmegen, Gelderland, The Netherlands
|
22
|
Computational approaches to understand transcription regulation in development. Biochem Soc Trans 2023; 51:1-12. [PMID: 36695505 PMCID: PMC9988001 DOI: 10.1042/bst20210145] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2022] [Revised: 01/07/2023] [Accepted: 01/13/2023] [Indexed: 01/26/2023]
Abstract
Gene regulatory networks (GRNs) serve as useful abstractions to understand transcriptional dynamics in developmental systems. Computational prediction of GRNs has been successfully applied to genome-wide gene expression measurements with the advent of microarrays and RNA-sequencing. However, these inferred networks are inaccurate and mostly based on correlative rather than causative interactions. In this review, we highlight three approaches that significantly impact GRN inference: (1) moving from one genome-wide functional modality, gene expression, to multi-omics, (2) single cell sequencing, to measure cell type-specific signals and predict context-specific GRNs, and (3) neural networks as flexible models. Together, these experimental and computational developments have the potential to significantly impact the quality of inferred GRNs. Ultimately, accurately modeling the regulatory interactions between transcription factors and their target genes will be essential to understand the role of transcription factors in driving developmental gene expression programs and to derive testable hypotheses for validation.
|
23
|
Costa MDOCE, do Nascimento APB, Martins YC, dos Santos MT, Figueiredo AMDS, Perez-Rueda E, Nicolás MF. The gene regulatory network of Staphylococcus aureus ST239-SCCmecIII strain Bmb9393 and assessment of genes associated with the biofilm in diverse backgrounds. Front Microbiol 2023; 13:1049819. [PMID: 36704545 PMCID: PMC9871828 DOI: 10.3389/fmicb.2022.1049819] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2022] [Accepted: 12/19/2022] [Indexed: 01/12/2023] Open
Abstract
Introduction Staphylococcus aureus is one of the most prevalent and relevant pathogens responsible for a wide spectrum of hospital-associated or community-acquired infections. In addition, methicillin-resistant Staphylococcus aureus may display multidrug resistance profiles that complicate treatment and increase the mortality rate. The ability to produce biofilm, particularly in device-associated infections, promotes chronic and potentially more severe infections originating from the primary site. Understanding the complex mechanisms involved in planktonic and biofilm growth is critical to identifying regulatory connections and ways to overcome the global health problem of multidrug-resistant bacteria. Methods In this work, we apply literature-based and comparative genomics approaches to reconstruct the gene regulatory network of the high biofilm-producing strain Bmb9393, belonging to one of the highly disseminating successful clones, the Brazilian epidemic clone. To the best of our knowledge, we describe for the first time the topological properties and network motifs for the Staphylococcus aureus pathogen. We performed this analysis using the ST239-SCCmecIII Bmb9393 strain. In addition, we analyzed transcriptomes available in the literature to construct a set of genes differentially expressed in the biofilm, covering different stages of the biofilms and genetic backgrounds of the strains. Results and discussion The Bmb9393 gene regulatory network comprises 1,803 regulatory interactions between 64 transcription factors and the non-redundant set of 1,151 target genes with the inclusion of 19 new regulons compared to the N315 transcriptional regulatory network published in 2011. In the Bmb9393 network, we found 54 feed-forward loop motifs, where the most prevalent were coherent type 2 and incoherent type 2. 
The non-redundant set of differentially expressed genes in the biofilm consisted of 1,794 genes with functional categories relevant for adaptation to the variable microenvironments established throughout the biofilm formation process. Finally, we mapped the set of genes with altered expression in the biofilm in the Bmb9393 gene regulatory network to depict how different growth modes can alter the regulatory systems. The data revealed 45 transcription factors and 876 shared target genes. Thus, the gene regulatory network model provided represents the most up-to-date model for Staphylococcus aureus, and the set of genes altered in the biofilm provides a global view of their influence on biofilm formation from distinct experimental perspectives and different strain backgrounds.
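The feed-forward loop (FFL) counts reported in this abstract rest on a simple structural definition: a regulator X controls Y, and both X and Y control a target Z. A minimal Python sketch of how such motifs might be enumerated and split into coherent and incoherent classes follows. The toy edge list and the function name `classify_ffls` are illustrative assumptions, not the authors' pipeline, and the finer subdivision into types 1 to 4 mentioned in the abstract is not reproduced.

```python
from itertools import permutations


def classify_ffls(edges):
    """Enumerate feed-forward loops in a signed, directed network.

    edges maps (regulator, target) -> sign, where +1 is activation
    and -1 is repression. An FFL is any ordered triple (x, y, z)
    with edges x->y, y->z and x->z.
    """
    nodes = {node for edge in edges for node in edge}
    found = []
    for x, y, z in permutations(nodes, 3):
        if (x, y) in edges and (y, z) in edges and (x, z) in edges:
            direct = edges[(x, z)]
            indirect = edges[(x, y)] * edges[(y, z)]
            # Coherent: the direct and indirect paths have the same net sign.
            kind = "coherent" if direct == indirect else "incoherent"
            found.append((x, y, z, kind))
    return sorted(found)


# Hypothetical toy network; gene names and signs are made up.
toy_edges = {
    ("A", "B"): +1,
    ("B", "C"): +1,
    ("A", "C"): +1,
    ("A", "D"): -1,
    ("D", "C"): +1,
    ("B", "D"): +1,
}

ffls = classify_ffls(toy_edges)
for x, y, z, kind in ffls:
    print(f"{x} -> {y} -> {z} (with {x} -> {z}): {kind}")
```

Brute-force enumeration over all ordered triples is cubic in the number of nodes, which is acceptable for a toy example; a network with over a thousand genes would more plausibly be scanned by iterating over each regulator's targets instead.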
Affiliation(s)
- Ana Paula Barbosa do Nascimento
- Departamento de Análises Clínicas e Toxicológicas, Faculdade de Ciências Farmacêuticas, Universidade de São Paulo, São Paulo, Brazil
- Agnes Marie de Sá Figueiredo
- Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Unidad Académica Yucatán, Merida, Mexico
- Ernesto Perez-Rueda
- Laboratório de Biologia Molecular de Bactérias, Instituto de Microbiologia Paulo de Góes, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil. Correspondence: Ernesto Perez-Rueda
- Marisa Fabiana Nicolás
- Laboratório Nacional de Computação Científica (LNCC), Petrópolis, Brazil. Correspondence: Marisa Fabiana Nicolás
24
Single-Cell Sequencing Identifies Master Regulators Affected by Panobinostat in Neuroblastoma Cells. Genes (Basel) 2022; 13:2240. [PMID: 36553506 PMCID: PMC9778475 DOI: 10.3390/genes13122240] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2022] [Revised: 11/17/2022] [Accepted: 11/28/2022] [Indexed: 12/03/2022] Open
Abstract
The molecular mechanisms and gene regulatory networks sustaining cell proliferation in neuroblastoma (NBL) cells are still not fully understood. In this tumor context, it has been proposed that anti-proliferative drugs, such as the pan-HDAC inhibitor panobinostat, could be tested to mitigate tumor progression. Here, we set out to investigate the effects of panobinostat treatment at the unprecedented resolution offered by single-cell sequencing. We identified a global senescence signature paired with reduction in proliferation in treated Kelly cells and more isolated transcriptional responses compatible with early neuronal differentiation. Using master regulator analysis, we identified BAZ1A, HCFC1, MAZ, and ZNF146 as the transcriptional regulators most significantly repressed by panobinostat. Experimental silencing of these transcription factors (TFs) confirmed their role in sustaining NBL cell proliferation in vitro.
25
Mercatelli D, Cabrelle C, Veltri P, Giorgi FM, Guzzi PH. Detection of pan-cancer surface protein biomarkers via a network-based approach on transcriptomics data. Brief Bioinform 2022; 23:6695270. [DOI: 10.1093/bib/bbac400] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2022] [Revised: 07/28/2022] [Accepted: 08/17/2022] [Indexed: 11/13/2022] Open
Abstract
Cell surface proteins have been used as diagnostic and prognostic markers in cancer research and as targets for the development of anticancer agents. Many of these proteins lie at the top of signaling cascades regulating cell responses and gene expression, therefore acting as ‘signaling hubs’. It has been previously demonstrated that the integrated network analysis on transcriptomic data is able to infer cell surface protein activity in breast cancer. Such an approach has been implemented in a publicly available method called ‘SURFACER’. SURFACER implements a network-based analysis of transcriptomic data focusing on the overall activity of curated surface proteins, with the final aim to identify those proteins driving major phenotypic changes at a network level, named surface signaling hubs. Here, we show the ability of SURFACER to discover relevant knowledge within and across cancer datasets. We also show how different cancers can be stratified in surface-activity-specific groups. Our strategy may identify cancer-wide markers to design targeted therapies and biomarker-based diagnostic approaches.
Affiliation(s)
- Daniele Mercatelli
- Department of Pharmacy and Biotechnology, University of Bologna, 40138 Bologna, Italy
- Chiara Cabrelle
- Department of Pharmacy and Biotechnology, University of Bologna, 40138 Bologna, Italy
- Pierangelo Veltri
- Department of Surgical and Medical Sciences, Magna Graecia University, 88100 Catanzaro, Italy
- Federico M Giorgi
- Department of Pharmacy and Biotechnology, University of Bologna, 40138 Bologna, Italy
- Pietro H Guzzi
- Department of Surgical and Medical Sciences, Magna Graecia University, 88100 Catanzaro, Italy
26
Chaudhuri S, Srivastava A. Network approach to understand biological systems: From single to multilayer networks. J Biosci 2022. [DOI: 10.1007/s12038-022-00285-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
27
Galán-Vásquez E, Gómez-García MDC, Pérez-Rueda E. A landscape of gene regulation in the parasitic amoebozoa Entamoeba spp. PLoS One 2022; 17:e0271640. [PMID: 35913975 PMCID: PMC9342746 DOI: 10.1371/journal.pone.0271640] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2022] [Accepted: 07/05/2022] [Indexed: 11/27/2022] Open
Abstract
Entamoeba are amoeboid extracellular parasites that represent an important group of organisms for which the regulatory networks must be examined to better understand how genes and functional processes are interrelated. In this work, we inferred the gene regulatory networks (GRNs) in four Entamoeba species, E. histolytica, E. dispar, E. nuttalli, and E. invadens, and the GRN topological properties and the corresponding biological functions were evaluated. From these analyses, we determined that transcription factors (TFs) of E. histolytica, E. dispar, and E. nuttalli are associated mainly with the LIM family, while the TFs in E. invadens are associated with the RRM_1 family. In addition, we identified that EHI_044890 regulates 121 genes in E. histolytica, EDI_297980 regulates 284 genes in E. dispar, ENU1_120230 regulates 195 genes in E. nuttalli, and EIN_249270 regulates 257 genes in E. invadens. Finally, we identified that three types of processes, Macromolecule metabolic process, Cellular macromolecule metabolic process, and Cellular nitrogen compound metabolic process, are the main biological processes for each network. The results described in this work can be used as a basis for the study of gene regulation in these organisms.
Affiliation(s)
- Edgardo Galán-Vásquez
- Departamento de Ingeniería de Sistemas Computacionales y Automatización, Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Ciudad Universitaria, Ciudad de México, México
- * E-mail: (EG-V); (EP-R)
- María del Consuelo Gómez-García
- Laboratorio de Biomedicina Molecular, Escuela Nacional de Medicina y Homeopatía, Instituto Politécnico Nacional, Ciudad de México, México
- Ernesto Pérez-Rueda
- Unidad Académica Yucatán, Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Mérida, Yucatán, México
- * E-mail: (EG-V); (EP-R)
28
Ellis D, Wu D, Datta S. SAREV: A review on statistical analytics of single-cell RNA sequencing data. Wiley Interdiscip Rev Comput Stat 2022; 14:e1558. [PMID: 36034329 PMCID: PMC9400796 DOI: 10.1002/wics.1558] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2020] [Accepted: 04/09/2021] [Indexed: 06/15/2023]
Abstract
Due to the development of next-generation RNA sequencing (NGS) technologies, there has been tremendous progress in research involving determining the role of genomics, transcriptomics and epigenomics in complex biological systems. However, scientists have realized that information obtained using earlier technology, frequently called 'bulk RNA-seq' data, provides information averaged across all the cells present in a tissue. Relatively newly developed single cell (scRNA-seq) technology allows us to provide transcriptomic information at a single-cell resolution. Nevertheless, these high-resolution data have their own complex natures and demand novel statistical data analysis methods to provide effective and highly accurate results on complex biological systems. In this review, we cover many such recently developed statistical methods for researchers wanting to pursue scRNA-seq statistical and computational research as well as scientific research about these existing methods and free software tools available for their generated data. This review is certainly not exhaustive due to page limitations. We have tried to cover the popular methods starting from quality control to the downstream analysis of finding differentially expressed genes and concluding with a brief description of network analysis.
Affiliation(s)
- Dorothy Ellis
- Department of Biostatistics, University of Florida, School of Public Health and Health Professions, Gainesville, FL
- Dongyuan Wu
- Department of Biostatistics, University of Florida, School of Public Health and Health Professions, Gainesville, FL
- Susmita Datta
- Department of Biostatistics, University of Florida, School of Public Health and Health Professions, Gainesville, FL
29
Okubo K, Kaneko K. Heterosis of fitness and phenotypic variance in the evolution of a diploid gene regulatory network. PNAS Nexus 2022; 1:pgac097. [PMID: 36741431 PMCID: PMC9896930 DOI: 10.1093/pnasnexus/pgac097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2021] [Accepted: 06/24/2022] [Indexed: 02/07/2023]
Abstract
Heterosis describes the phenomenon, whereby a hybrid population has higher fitness than an inbred population, which has previously been explained by either Mendelian dominance or overdominance under the general assumption of a simple genotype-phenotype relationship. However, recent studies have demonstrated that genes interact through a complex gene regulatory network (GRN). Furthermore, phenotypic variance is reportedly lower for heterozygotes, and the origin of such variance-related heterosis remains elusive. Therefore, a theoretical analysis linking heterosis to GRN evolution and stochastic gene expression dynamics is required. Here, we investigated heterosis related to fitness and phenotypic variance in a system with interacting genes by numerically evolving diploid GRNs. According to the results, the heterozygote population exhibited higher fitness than the homozygote population, indicating fitness-related heterosis resulting from evolution. In addition, the heterozygote population exhibited lower noise-related phenotypic variance in expression levels than the homozygous population, implying that the heterozygote population is more robust to noise. Furthermore, the distribution of the ratio of heterozygote phenotypic variance to homozygote phenotypic variance exhibited quantitative similarity with previous experimental results. By applying dominance and differential gene expression rather than only a single gene expression model, we confirmed the correlation between heterosis and differential gene expression. We explain our results by proposing that the convex high-fitness region is evolutionarily shaped in the genetic space to gain noise robustness under genetic mixing through sexual reproduction. These results provide new insights into the effects of GRNs on variance-related heterosis and differential gene expression.
Affiliation(s)
- Kenji Okubo
- Research Center for Integrative Evolutionary Science, the Graduate University for Advanced Studies, SOKENDAI, Hayama, Kanagawa, 240-0193, Japan
30
Quantifying biochemical reaction rates from static population variability within incompletely observed complex networks. PLoS Comput Biol 2022; 18:e1010183. [PMID: 35731728 PMCID: PMC9216546 DOI: 10.1371/journal.pcbi.1010183] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/21/2021] [Accepted: 05/07/2022] [Indexed: 11/19/2022] Open
Abstract
Quantifying biochemical reaction rates within complex cellular processes remains a key challenge of systems biology even as high-throughput single-cell data have become available to characterize snapshots of population variability. That is because complex systems with stochastic and non-linear interactions are difficult to analyze when not all components can be observed simultaneously and systems cannot be followed over time. Instead of using descriptive statistical models, we show that incompletely specified mechanistic models can be used to translate qualitative knowledge of interactions into reaction rate functions from covariability data between pairs of components. This promises to turn a globally intractable problem into a sequence of solvable inference problems to quantify complex interaction networks from incomplete snapshots of their stochastic fluctuations.
31
Cano R, Lenz AR, Galan-Vasquez E, Ramirez-Prado JH, Perez-Rueda E. Gene Regulatory Network Inference and Gene Module Regulating Virulence in Fusarium oxysporum. Front Microbiol 2022; 13:861528. [PMID: 35722316 PMCID: PMC9201490 DOI: 10.3389/fmicb.2022.861528] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2022] [Accepted: 05/09/2022] [Indexed: 11/20/2022] Open
Abstract
In this work, we inferred the gene regulatory network (GRN) of the fungus Fusarium oxysporum by using the regulatory networks of Aspergillus nidulans FGSC A4, Neurospora crassa OR74A, Saccharomyces cerevisiae S288c, and Fusarium graminearum PH-1 as templates for sequence comparisons. Topological properties to infer the role of transcription factors (TFs) and to identify functional modules were calculated in the GRN. From these analyses, five TFs were identified as hubs, including FOXG_04688 and FOXG_05432, which regulate 2,404 and 1,864 target genes, respectively. In addition, 16 communities were identified in the GRN, where the largest contains 1,923 genes and the smallest contains 227 genes. Finally, the genes associated with virulence were extracted from the GRN and exhaustively analyzed, and we identified a giant module with ten TFs and 273 target genes, where the most highly connected node corresponds to the transcription factor FOXG_05265, homologous to the putative bZip transcription factor CPTF1 of Claviceps purpurea, which is involved in ergotism, a disease that affects cereal crops and grasses. The results described in this work can be used for the study of gene regulation in this organism and open the possibility of exploring putative genes associated with virulence against its host.
Affiliation(s)
- Regnier Cano
- Centro de Investigaciones Científicas de Yucatán, Mérida, Mexico
- Alexandre Rafael Lenz
- Departamento de Ciências Exatas e da Terra, Universidade do Estado da Bahia, Salvador, Brazil
- Edgardo Galan-Vasquez
- Departamento de Ingeniería de Sistemas Computacionales y Automatización, Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Ciudad Universitaria, Mexico, Mexico
- Ernesto Perez-Rueda
- Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Unidad Académica Yucatán Universidad Nacional Autónoma de México, Mérida, Mexico
32
Jiang YH, Long J, Zhao ZB, Li L, Lian ZX, Liang Z, Wu JR. Gene co-expression network based on part mutual information for gene-to-gene relationship and gene-cancer correlation analysis. BMC Bioinformatics 2022; 23:194. [PMID: 35610556 PMCID: PMC9128248 DOI: 10.1186/s12859-022-04732-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2021] [Accepted: 05/09/2022] [Indexed: 11/10/2022] Open
Abstract
Background: Finding correlation patterns is an important goal of analyzing biological data. Currently available methods for correlation analysis mainly use non-direct associations, such as the Pearson correlation coefficient, and focus on the interpretation of networks at the level of modules. For biological objects such as genes, their collective function depends on pairwise gene-to-gene interactions. However, the large number of redundant results from module-level methods often necessitates further detailed analysis of gene interactions. New approaches to measuring direct associations among variables, such as the part mutual information (PMI), may help us better interpret the correlation pattern of biological data at the level of variable pairs. Results: We use PMI to calculate gene co-expression networks of cancer mRNA transcriptome data. Our results show that the PMI-based networks with fewer edges could represent the correlation pattern and are robust across biological conditions. The PMI-based networks recall significantly more important parts of omics-defined gene-pair relationships than the Pearson Correlation Coefficient (PCC)-based networks. Based on the scores derived from PMI-recalled copy number variation or DNA methylation gene-pairs, the patients with cancer can be divided into groups with significant differences in disease-specific survival. Conclusions: PMI, measuring direct associations between variables, extracts more important biological relationships at the level of gene pairs than conventional indirect association measures do. It can be used to refine module-level results from other correlation methods. In particular, PMI is beneficial for analyzing biological data from complicated systems, for example, cancer transcriptome data.
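The contrast between direct and indirect association drawn in this abstract can be illustrated with a much simpler stand-in for PMI. The sketch below does not implement part mutual information; it uses first-order partial correlation on synthetic data (all names and values are assumptions) to show how two genes driven by a shared regulator can look strongly correlated under PCC yet show almost no direct association once the regulator is accounted for.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Synthetic expression profiles: Z regulates both X and Y,
# but X and Y have no direct link to each other.
rng = random.Random(0)
z = [rng.gauss(0, 1) for _ in range(2000)]
x = [zi + 0.5 * rng.gauss(0, 1) for zi in z]
y = [zi + 0.5 * rng.gauss(0, 1) for zi in z]

pearson_xy = pearson(x, y)          # high: association induced by Z
partial_xy = partial_corr(x, y, z)  # near zero: no direct X-Y link

print(f"PCC(X, Y)         = {pearson_xy:.3f}")
print(f"partial(X, Y | Z) = {partial_xy:.3f}")
```

Partial correlation only captures linear dependence, which is one reason an information-theoretic measure such as PMI may be preferred for expression data; the example is meant only to make the direct versus indirect distinction concrete.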
Affiliation(s)
- Yi-Hua Jiang
- Hefei National Laboratory for Physical Sciences at Microscale and School of Life Science, University of Science and Technology of China, Hefei, China
- Jie Long
- Chronic Disease Laboratory, School of Medicine, South China University of Technology, Guangzhou, China
- Zhi-Bin Zhao
- Chronic Disease Laboratory, School of Medicine, South China University of Technology, Guangzhou, China
- Liang Li
- Chronic Disease Laboratory, School of Medicine, South China University of Technology, Guangzhou, China
- Zhe-Xiong Lian
- Chronic Disease Laboratory, School of Medicine, South China University of Technology, Guangzhou, China
- Zhi Liang
- Hefei National Laboratory for Physical Sciences at Microscale and School of Life Science, University of Science and Technology of China, Hefei, China
- Jia-Rui Wu
- Hefei National Laboratory for Physical Sciences at Microscale and School of Life Science, University of Science and Technology of China, Hefei, China; Key Laboratory of Systems Health Science of Zhejiang Province, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou, 310024, China
33
Liu W, Sun X, Yang L, Li K, Yang Y, Fu X. NSCGRN: a network structure control method for gene regulatory network inference. Brief Bioinform 2022; 23:6585392. [PMID: 35554485 DOI: 10.1093/bib/bbac156] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 01/31/2022] [Revised: 03/27/2022] [Accepted: 04/06/2022] [Indexed: 01/18/2023] Open
Abstract
Accurate inference of gene regulatory networks (GRNs) is an essential premise for understanding pathogenesis and curing diseases. Various computational methods have been developed for GRN inference, but the identification of redundant regulation remains a challenge faced by researchers. Although combining global and local topology can identify and reduce redundant regulations, the topologies' specific forms and cooperation modes are unclear and real regulations may be sacrificed. Here, we propose a network structure control method [network-structure-controlling-based GRN inference method (NSCGRN)] that stipulates the global and local topology's specific forms and cooperation mode. The method is carried out in a cooperative mode of 'global topology dominates and local topology refines'. Global topology requires layering and sparseness of the network, and local topology requires consistency of the subgraph association pattern with the network motifs (fan-in, fan-out, cascade and feedforward loop). Specifically, an ordered gene list is obtained by network topology centrality sorting. A Bernaola-Galvan mutation detection algorithm applied to the list gives the hierarchy of GRNs to control the upstream and downstream regulations within the global scope. Finally, four network motifs are integrated into the hierarchy to optimize local complex regulations and form a cooperative mode where global and local topologies play the dominant and refined roles, respectively. NSCGRN is compared with state-of-the-art methods on three different datasets (six networks in total), and it achieves the highest F1 and Matthews correlation coefficient. Experimental results show its unique advantages in GRN inference.
Affiliation(s)
- Wei Liu
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan, 411105, China; School of Computer Science, Xiangtan University, Xiangtan, 411105, China
- Xingen Sun
- School of Computer Science, Xiangtan University, Xiangtan, 411105, China; Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan, 411105, China
- Li Yang
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan, 411105, China
- Kaiwen Li
- Artificial Intelligence Research Institute, China University of Mining and Technology, Xuzhou, 221116, China
- Yu Yang
- School of Computer Science, Xiangtan University, Xiangtan, 411105, China; Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan, 411105, China
- Xiangzheng Fu
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410000, China
34
Esfandiary A, Finkelstein DI, Voelcker NH, Rudd D. Clinical Sphingolipids Pathway in Parkinson’s Disease: From GCase to Integrated-Biomarker Discovery. Cells 2022; 11:1353. [PMID: 35456032 PMCID: PMC9028315 DOI: 10.3390/cells11081353] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/10/2022] [Revised: 04/11/2022] [Accepted: 04/13/2022] [Indexed: 02/01/2023] Open
Abstract
Alterations in the sphingolipid metabolism of Parkinson’s Disease (PD) could be a potential diagnostic feature. Only around 10–15% of PD cases can be diagnosed through genetic alterations, while the remaining population, idiopathic PD (iPD), manifests without validated and specific biomarkers either before or after motor symptoms appear. Therefore, clinical diagnosis is reliant on the skills of the clinician, which can lead to misdiagnosis. iPD cases present with a spectrum of non-specific symptoms (e.g., constipation and loss of the sense of smell) that can occur up to 20 years before motor function loss (prodromal stage) and formal clinical diagnosis. Prodromal alterations in metabolites and proteins from the pathways underlying these symptoms could act as biomarkers if they could be differentiated from the broad values seen in a healthy age-matched control population. Additionally, these shifts in metabolites could be integrated with other emerging biomarkers/diagnostic tests to give a PD-specific signature. Here we provide an up-to-date review of the diagnostic value of the alterations in the sphingolipid pathway in PD by focusing on the changes in definitive PD (postmortem confirmed brain data) and their representation in “probable PD” cerebrospinal fluid (CSF) and blood. We conclude that the trend of holistic changes in the sphingolipid pathway in the PD brain seems partly consistent in CSF and blood, and could be one of the most promising pathways in differentiating PD cases from healthy controls, with the potential to improve early-stage iPD diagnosis and distinguish iPD from other forms of parkinsonism when combined with other pathological markers.
Affiliation(s)
- Ali Esfandiary
- Drug Delivery, Disposition and Dynamics, Monash University, Parkville, VIC 3052, Australia
- Melbourne Centre for Nanofabrication, Victorian Node of the Australian National Fabrication Facility, Clayton, VIC 3168, Australia
- Nicolas Hans Voelcker
- Drug Delivery, Disposition and Dynamics, Monash University, Parkville, VIC 3052, Australia
- Melbourne Centre for Nanofabrication, Victorian Node of the Australian National Fabrication Facility, Clayton, VIC 3168, Australia
- Commonwealth Scientific and Industrial Research Organization (CSIRO), Clayton, VIC 3168, Australia
- Materials Science and Engineering, Monash University, Clayton, VIC 3168, Australia
- David Rudd
- Drug Delivery, Disposition and Dynamics, Monash University, Parkville, VIC 3052, Australia
- Melbourne Centre for Nanofabrication, Victorian Node of the Australian National Fabrication Facility, Clayton, VIC 3168, Australia
- Correspondence: ; Tel.: +61-3-9903-9581
35
Li Y, Wang F, Zheng Z. Adaptive Synchronization-Based Approach for Finite-Time Parameters Identification of Genetic Regulatory Networks. Neural Process Lett 2022. [DOI: 10.1007/s11063-022-10754-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/09/2023]
36
Prediction of Metabolic Profiles from Transcriptomics Data in Human Cancer Cell Lines. Int J Mol Sci 2022; 23:3867. [PMID: 35409231 PMCID: PMC8998886 DOI: 10.3390/ijms23073867] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/28/2022] [Revised: 03/24/2022] [Accepted: 03/29/2022] [Indexed: 02/01/2023] Open
Abstract
The metabolome and transcriptome communicate with each other within cancer cells, and this interplay gives rise to quantifiable correlation structures between gene expression and metabolite abundance levels. Studying these correlations could provide a novel avenue for understanding cancer and for discovering novel biomarkers and pharmacological strategies, and could lay the foundation for predicting metabolite quantities by leveraging information from the more widespread transcriptomics data. In the current paper, we investigate the correlation between gene expression and metabolite levels in the Cancer Cell Line Encyclopedia dataset, building a direct correlation network between the two molecular ensembles. We show that a metabolite/transcript correlation network can be used to predict metabolite levels in different samples and datasets, such as the NCI-60 cancer cell line dataset, both on a sample-by-sample basis and in differential contrasts. We also show that metabolite levels can be predicted in principle on any sample and dataset for which transcriptomics data are available, such as The Cancer Genome Atlas (TCGA).
37
Prasad P, Khatoon U, Verma RK, Aalam S, Kumar A, Mohapatra D, Bhattacharya P, Bag SK, Sawant SV. Transcriptional Landscape of Cotton Fiber Development and Its Alliance With Fiber-Associated Traits. Front Plant Sci 2022; 13:811655. [PMID: 35283936 PMCID: PMC8908376 DOI: 10.3389/fpls.2022.811655] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/09/2021] [Accepted: 01/10/2022] [Indexed: 06/14/2023]
Abstract
How cotton fibers commit to and progress through development remains an open question. At different fiber developmental stages, many genes change their expression pattern and play a pivotal role in fiber quality and yield. Numerous recent studies have examined the transcriptional regulation of fiber, and their raw data have been deposited in public repositories, enabling comprehensive integrative analysis. Here, we remapped >380 cotton RNA-seq datasets with a uniform mapping strategy, spanning ∼400-fold coverage of the genome. We identified stage-specific features related to fiber cell commitment, initiation, elongation, and Secondary Cell Wall (SCW) synthesis, together with their putative cis-regulatory elements for specific regulation in fiber development. We also mined Exclusively Expressed Transcripts (EETs) that were positively selected during cotton fiber evolution and domestication. Furthermore, the expression of EETs was validated in 100 cotton genotypes through the nCounter assay and correlated with different fiber-related traits. Thus, our data mining study reveals several important features related to cotton fiber development and improvement, which were consolidated in the "CottonExpress-omics" database.
Affiliation(s)
- Priti Prasad
- Division of Molecular Biology and Biotechnology, CSIR-National Botanical Research Institute, Lucknow, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
- Uzma Khatoon
- Division of Molecular Biology and Biotechnology, CSIR-National Botanical Research Institute, Lucknow, India
- Department of Botany, University of Lucknow, Lucknow, India
- Rishi Kumar Verma
- Division of Molecular Biology and Biotechnology, CSIR-National Botanical Research Institute, Lucknow, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
- Shahre Aalam
- Division of Molecular Biology and Biotechnology, CSIR-National Botanical Research Institute, Lucknow, India
- Ajay Kumar
- Division of Molecular Biology and Biotechnology, CSIR-National Botanical Research Institute, Lucknow, India
- Sumit K. Bag
- Division of Molecular Biology and Biotechnology, CSIR-National Botanical Research Institute, Lucknow, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
- Samir V. Sawant
- Division of Molecular Biology and Biotechnology, CSIR-National Botanical Research Institute, Lucknow, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
| |
|
38
|
Rogers JD, Aguado BA, Watts KM, Anseth KS, Richardson WJ. Network modeling predicts personalized gene expression and drug responses in valve myofibroblasts cultured with patient sera. Proc Natl Acad Sci U S A 2022; 119:e2117323119. [PMID: 35181609 PMCID: PMC8872767 DOI: 10.1073/pnas.2117323119] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 09/20/2021] [Accepted: 01/12/2022] [Indexed: 02/08/2023] Open
Abstract
Aortic valve stenosis (AVS) patients experience pathogenic valve leaflet stiffening due to excessive extracellular matrix (ECM) remodeling. Numerous microenvironmental cues influence pathogenic expression of ECM remodeling genes in tissue-resident valvular myofibroblasts, and the regulation of complex myofibroblast signaling networks depends on patient-specific extracellular factors. Here, we combined a manually curated myofibroblast signaling network with a data-driven transcription factor network to predict patient-specific myofibroblast gene expression signatures and drug responses. Using transcriptomic data from myofibroblasts cultured with AVS patient sera, we produced a large-scale, logic-gated differential equation model in which 11 biochemical and biomechanical signals were transduced via a network of 334 signaling and transcription reactions to accurately predict the expression of 27 fibrosis-related genes. Correlations were found between personalized model-predicted gene expression and AVS patient echocardiography data, suggesting links between fibrosis-related signaling and patient-specific AVS severity. Further, global network perturbation analyses revealed signaling molecules with the most influence over network-wide activity, including endothelin 1 (ET1), interleukin 6 (IL6), and transforming growth factor β (TGFβ), along with downstream mediators c-Jun N-terminal kinase (JNK), signal transducer and activator of transcription (STAT), and reactive oxygen species (ROS). Lastly, we performed virtual drug screening to identify patient-specific drug responses, which were experimentally validated via fibrotic gene expression measurements in valvular interstitial cells cultured with AVS patient sera and treated with or without bosentan, a clinically approved ET1 receptor inhibitor. In sum, our work advances the ability of computational approaches to provide a mechanistic basis for clinical decisions including patient stratification and personalized drug screening.
Affiliation(s)
- Jesse D Rogers
  - Bioengineering Department, Clemson University, Clemson, SC 29634
  - Oak Ridge Institute for Science and Education, Oak Ridge, TN 37830
- Brian A Aguado
  - Chemical and Biological Engineering Department, BioFrontiers Institute, University of Colorado, Boulder, CO 80309
  - Bioengineering Department, University of California San Diego, La Jolla, CA 92093
  - Stem Cell Program, Sanford Consortium for Regenerative Medicine, La Jolla, CA 92037
- Kelsey M Watts
  - Bioengineering Department, Clemson University, Clemson, SC 29634
- Kristi S Anseth
  - Chemical and Biological Engineering Department, BioFrontiers Institute, University of Colorado, Boulder, CO 80309
|
39
|
The use of machine learning to discover regulatory networks controlling biological systems. Mol Cell 2022; 82:260-273. [PMID: 35016036 PMCID: PMC8905511 DOI: 10.1016/j.molcel.2021.12.011] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/21/2021] [Revised: 12/06/2021] [Accepted: 12/13/2021] [Indexed: 01/22/2023]
Abstract
Biological systems are composed of a vast web of multiscale molecular interactors and interactions. High-throughput technologies, both bulk and single cell, now allow for investigation of the properties and quantities of these interactors. Computational algorithms and machine learning methods then provide the tools to derive meaningful insights from the resulting data sets. One such approach is graphical network modeling, which provides a computational framework to explicitly model the molecular interactions within and between the cells comprising biological systems. These graphical networks aim to describe a putative chain of cause and effect between interacting molecules. This feature allows for determination of key molecules in a biological process, accelerated generation of mechanistic hypotheses, and simulation of experimental outcomes. We review the computational concepts and applications of graphical network models across molecular scales for both intracellular and intercellular regulatory biology, examples of successful applications, and the future directions needed to overcome current limitations.
|
40
|
Emerging Machine Learning Techniques for Modelling Cellular Complex Systems in Alzheimer's Disease. Adv Exp Med Biol 2022; 1338:199-208. [PMID: 34973026 DOI: 10.1007/978-3-030-78775-2_24] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 10/19/2022]
Abstract
We live in the big data era in the biomedical field, where machine learning makes an important contribution to the interpretation of complex biological processes and diseases, since it has the potential to create predictive models from multidimensional data sets. Part of the application of machine learning in biomedical science is to study and model complex cellular systems such as biological networks. In this context, the study of complex diseases, such as Alzheimer's disease (AD), benefits from established methodologies of network science and machine learning, as they offer algorithmic tools and techniques that can address the limitations and challenges of modeling and studying cellular AD-related networks. In this paper we analyze the opportunities and challenges at the intersection of machine learning and network biology and whether this can affect the biological interpretation and clarification of diseases. Specifically, we focus on GRN techniques which, through omics data and the use of machine learning techniques, can construct a network that captures all the information at the molecular level for the disease under study. We record the emerging machine learning techniques that focus on ensemble tree-based techniques in the area of classification and regression. Their potential for unraveling the complexity of model cellular systems in complex diseases, such as AD, offers the opportunity for novel machine learning methodologies to decipher the mechanisms of the various AD processes.
|
41
|
Santana-Garcia W, Castro-Mondragon JA, Padilla-Gálvez M, Nguyen NT, Elizondo-Salas A, Ksouri N, Gerbes F, Thieffry D, Vincens P, Contreras-Moreira B, van Helden J, Thomas-Chollier M, Medina-Rivera A. RSAT 2022: regulatory sequence analysis tools. Nucleic Acids Res 2022; 50:W670-W676. [PMID: 35544234 PMCID: PMC9252783 DOI: 10.1093/nar/gkac312] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Received: 03/03/2022] [Revised: 04/12/2022] [Accepted: 04/20/2022] [Indexed: 11/12/2022] Open
Abstract
RSAT (Regulatory Sequence Analysis Tools) enables the detection and the analysis of cis-regulatory elements in genomic sequences. This software suite performs (i) de novo motif discovery (including from genome-wide datasets like ChIP-seq/ATAC-seq), (ii) genomic sequence scanning with known motifs, (iii) motif analysis (quality assessment, comparisons and clustering), (iv) analysis of regulatory variations and (v) comparative genomics. RSAT comprises 50 tools. Six public Web servers (including a teaching server) are offered to meet the needs of different biological communities. RSAT philosophy and originality are: (i) multi-modal access depending on the user needs, through web forms, command-line for local installation and programmatic web services, (ii) support for virtually any genome (animals, bacteria, plants, totaling over 10 000 genomes directly accessible). Since the 2018 NAR Web Software Issue, we have developed a large REST API, extended the support for additional genomes and external motif collections, enhanced some tools and Web forms, and developed a novel tool that builds or refines gene regulatory networks using motif scanning (network-interactions). The RSAT website provides extensive documentation, tutorials and published protocols. RSAT code is under open-source license and now hosted on GitHub. RSAT is available at http://www.rsat.eu/.
Affiliation(s)
- Mónica Padilla-Gálvez
  - Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Campus Juriquilla, Blvd Juriquilla 3001, 76230 Santiago de Querétaro, México
- Nga Thi Thuy Nguyen
  - Institut de biologie de l’Ecole normale supérieure (IBENS), Ecole normale supérieure, CNRS, INSERM, PSL Université Paris, 75005 Paris, France
- Ana Elizondo-Salas
  - Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Campus Juriquilla, Blvd Juriquilla 3001, 76230 Santiago de Querétaro, México
- Najla Ksouri
  - Estación Experimental de Aula Dei-CSIC, 50059 Zaragoza, Spain
- François Gerbes
  - CNRS, Institut Français de Bioinformatique, IFB-core, UMS 3601, Evry, France
- Denis Thieffry
  - Institut de biologie de l’Ecole normale supérieure (IBENS), Ecole normale supérieure, CNRS, INSERM, PSL Université Paris, 75005 Paris, France
- Pierre Vincens
  - Institut de biologie de l’Ecole normale supérieure (IBENS), Ecole normale supérieure, CNRS, INSERM, PSL Université Paris, 75005 Paris, France
|
42
|
Mercatelli D, Formaggio F, Caprini M, Holding A, Giorgi F. Detection of subtype-specific breast cancer surface protein biomarkers via a novel transcriptomics approach. Biosci Rep 2021; 41:BSR20212218. [PMID: 34750607 PMCID: PMC8655506 DOI: 10.1042/bsr20212218] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/17/2021] [Revised: 10/29/2021] [Accepted: 11/08/2021] [Indexed: 11/17/2022] Open
Abstract
BACKGROUND Cell-surface proteins have been widely used as diagnostic and prognostic markers in cancer research and as targets for the development of anticancer agents. So far, very few attempts have been made to characterize the surfaceome of patients with breast cancer, particularly in relation to the current molecular breast cancer (BRCA) classification. With this in view, we developed a new computational method to infer cell-surface protein activities from transcriptomics data, termed 'SURFACER'. METHODS Gene expression data from GTEx were used to build a normal breast network model as input to infer differential cell-surface protein activity in BRCA tissue samples retrieved from TCGA versus normal samples. Data were stratified according to the PAM50 transcriptional subtypes (Luminal A, Luminal B, HER2 and Basal), while unsupervised clustering techniques were applied to define BRCA subtypes according to cell-surface protein activity. RESULTS Our approach led to the identification of 213 PAM50 subtype-specific deregulated surface genes and the definition of five BRCA subtypes, whose prognostic value was assessed by survival analysis, identifying a cell-surface activity configuration at increased risk. The value of the SURFACER method in BRCA genotyping was tested by evaluating the performance of 11 different machine learning classification algorithms. CONCLUSIONS BRCA patients can be stratified into five surface activity-specific groups, with the potential to identify subtype-specific actionable targets for designing tailored targeted therapies or for diagnostic purposes. SURFACER-defined subtypes also show prognostic value, identifying surface-activity profiles at higher risk.
Affiliation(s)
- Daniele Mercatelli
  - Department of Pharmacy and Biotechnology, University of Bologna, 40126 Bologna, Italy
- Francesco Formaggio
  - Department of Pharmacy and Biotechnology, University of Bologna, 40126 Bologna, Italy
- Marco Caprini
  - Department of Pharmacy and Biotechnology, University of Bologna, 40126 Bologna, Italy
- Andrew Holding
  - York Biomedical Research Institute, University of York, Heslington, York, YO10 5DD, U.K.
- Federico M. Giorgi
  - Department of Pharmacy and Biotechnology, University of Bologna, 40126 Bologna, Italy
|
43
|
Patakova P, Branska B, Vasylkivska M, Jureckova K, Musilova J, Provaznik I, Sedlar K. Transcriptomic studies of solventogenic clostridia, Clostridium acetobutylicum and Clostridium beijerinckii. Biotechnol Adv 2021; 58:107889. [PMID: 34929313 DOI: 10.1016/j.biotechadv.2021.107889] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/27/2021] [Revised: 12/10/2021] [Accepted: 12/14/2021] [Indexed: 12/13/2022]
Abstract
Solventogenic clostridia are not a strictly defined group within the genus Clostridium but its representatives share some common features, i.e. they are anaerobic, non-pathogenic, non-toxinogenic and endospore forming bacteria. Their main metabolite is typically 1-butanol but depending on species and culture conditions, they can form other metabolites such as acetone, isopropanol, ethanol, butyric, lactic and acetic acids, and hydrogen. Although these organisms were previously used for the industrial production of solvents, they later fell into disuse, being replaced by more efficient chemical production. A return to a more biological production of solvents therefore requires a thorough understanding of clostridial metabolism. Transcriptome analysis, which reflects the involvement of individual genes in all cellular processes within a population, at any given (sampling) moment, is a valuable tool for gaining a deeper insight into clostridial life. In this review, we describe techniques to study transcription, summarize the evolution of these techniques and compare methods for data processing and visualization of solventogenic clostridia, particularly the species Clostridium acetobutylicum and Clostridium beijerinckii. Individual approaches for evaluating transcriptomic data are compared and their contributions to advancements in the field are assessed. Moreover, utilization of transcriptomic data for reconstruction of computational clostridial metabolic models is considered and particular models are described. Transcriptional changes in glucose transport, central carbon metabolism, the sporulation cycle, butanol and butyrate stress responses, the influence of lignocellulose-derived inhibitors on growth and solvent production, and other respective topics, are addressed and common trends are highlighted.
Affiliation(s)
- Petra Patakova
  - University of Chemistry and Technology Prague, Technicka 5, 16628 Prague 6, Czech Republic
- Barbora Branska
  - University of Chemistry and Technology Prague, Technicka 5, 16628 Prague 6, Czech Republic
- Maryna Vasylkivska
  - University of Chemistry and Technology Prague, Technicka 5, 16628 Prague 6, Czech Republic
- Jana Musilova
  - Brno University of Technology, Technicka 10, 61600 Brno, Czech Republic
- Ivo Provaznik
  - Brno University of Technology, Technicka 10, 61600 Brno, Czech Republic
- Karel Sedlar
  - Brno University of Technology, Technicka 10, 61600 Brno, Czech Republic
|
44
|
Bodein A, Scott-Boyer MP, Perin O, Lê Cao KA, Droit A. Interpretation of network-based integration from multi-omics longitudinal data. Nucleic Acids Res 2021; 50:e27. [PMID: 34883510 PMCID: PMC8934642 DOI: 10.1093/nar/gkab1200] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 08/13/2021] [Revised: 10/19/2021] [Accepted: 11/22/2021] [Indexed: 12/26/2022] Open
Abstract
Multi-omics integration is key to fully understanding complex biological processes in a holistic manner. Furthermore, multi-omics combined with new longitudinal experimental designs can reveal dynamic relationships between omics layers and identify key players or interactions in system development or complex phenotypes. However, integration methods have to address various experimental designs and do not guarantee interpretable biological results. The new challenge of multi-omics integration is to solve interpretation and unlock the hidden knowledge within the multi-omics data. In this paper, we go beyond integration and propose a generic approach to face the interpretation problem. From multi-omics longitudinal data, this approach builds and explores hybrid multi-omics networks composed of both inferred and known relationships within and between omics layers. With smart node labelling and propagation analysis, this approach predicts regulation mechanisms and multi-omics functional modules. We applied the method to 3 case studies with various multi-omics designs and identified new multi-layer interactions involved in key biological functions that could not be revealed with single omics analysis. Moreover, we highlighted interplay in the kinetics that could help identify novel biological mechanisms. This method is available as an R package, netOmics, to readily suit any application.
Affiliation(s)
- Antoine Bodein
  - Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Marie-Pier Scott-Boyer
  - Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Olivier Perin
  - Digital Sciences Department, L'Oréal Advanced Research, Aulnay-sous-bois, France
- Kim-Anh Lê Cao
  - Melbourne Integrative Genomics, School of Mathematics and Statistics, University of Melbourne, Melbourne, VIC, Australia
- Arnaud Droit
  - Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
|
45
|
Chen Y, Sun Y, Xu Y, Lin WW, Luo Z, Han Z, Liu S, Qi B, Sun C, Go K, Kang XR, Chen J. Single-Cell Integration Analysis of Heterotopic Ossification and Fibrocartilage Developmental Lineage: Endoplasmic Reticulum Stress Effector Xbp1 Transcriptionally Regulates the Notch Signaling Pathway to Mediate Fibrocartilage Differentiation. Oxid Med Cell Longev 2021; 2021:7663366. [PMID: 34737845 PMCID: PMC8563124 DOI: 10.1155/2021/7663366] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 07/12/2021] [Revised: 09/21/2021] [Accepted: 10/01/2021] [Indexed: 02/06/2023]
Abstract
INTRODUCTION Regeneration of fibrochondrocytes is essential for the healing of the tendon-bone interface (TBI), which is similar to the formation of neurogenic heterotopic ossification (HO). Through single-cell integrative analysis, this study explored the homogeneity of HO cells and fibrochondrocytes. METHODS This study integrated six datasets, namely, GSE94683, GSE144306, GSE168153, GSE138515, GSE102929, and GSE110993. The differentiation trajectory and key transcription factors (TFs) for HO occurrence were systematically analyzed by integrating single-cell RNA (scRNA) sequencing, bulk RNA sequencing, and assay of transposase accessible chromatin seq. The differential expression and enrichment pathways of TFs in heterotopically ossified tissues were identified. RESULTS HO that mimicked pathological cells was classified into HO1 and HO2 cell subsets. Results of the pseudo-temporal sequence analysis suggested that HO2 is a differentiated precursor cell of HO1. The analysis of integrated scRNA data revealed that ectopically ossified cells have similar transcriptional characteristics to cells in the fibrocartilaginous zone of tendons. The modified SCENIC method was used to identify specific transcriptional regulators associated with ectopic ossification. Xbp1 was defined as a common key transcriptional regulator of ectopically ossified tissues and the fibrocartilaginous zone of tendons. Subsequently, the CellPhoneDB database was completed for the cellular ligand-receptor analysis. With further pathway screening, this study is the first to propose that Xbp1 may upregulate the Notch signaling pathway through Jag1 transcription. Twenty-four microRNAs were screened and were found to be potentially associated with upregulation of XBP1 expression after acute ischemic stroke. 
CONCLUSION A systematic analysis of the differentiation landscape and cellular homogeneity facilitated a molecular understanding of the phenotypic similarities between cells in the fibrocartilaginous region of tendon and HO cells. Furthermore, by identifying Xbp1 as a hub regulator and by conducting a ligand-receptor analysis, we propose a potential Xbp1/Jag1/Notch signaling pathway.
Affiliation(s)
- Yisheng Chen
  - Department of Orthopedics, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai Jiao Tong University, Shanghai 200080, China
  - Department of Sports Medicine, Huashan Hospital, Fudan University, Shanghai, China
- Yaying Sun
  - Department of Sports Medicine, Huashan Hospital, Fudan University, Shanghai, China
- Yuzhen Xu
  - Department of Rehabilitation, The Second Affiliated Hospital of Shandong First Medical University, Taian, Shandong Province 271000, China
- Wei-Wei Lin
  - Department of Neurosurgery, Second Affiliated Hospital of Zhejiang University School of Medicine, Zhejiang University, 88 Jiefang Road, Hangzhou, 310009 Zhejiang, China
- Zhiwen Luo
  - Department of Sports Medicine, Huashan Hospital, Fudan University, Shanghai, China
- Zhihua Han
  - Department of Orthopedics, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai Jiao Tong University, Shanghai 200080, China
- Shaohua Liu
  - Department of Sports Medicine, Huashan Hospital, Fudan University, Shanghai, China
- Beijie Qi
  - Department of Sports Medicine, Huashan Hospital, Fudan University, Shanghai, China
- Chenyu Sun
  - Internal Medicine, AMITA Health Saint Joseph Hospital Chicago, 2900 N. Lake Shore Drive, Chicago, 60657 Illinois, USA
- Ken Go
  - Department of Clinical Training Centre, St. Marianna Hospital, Tokyo, Japan
- X.-R. Kang
  - Shanghai Jiao Tong University, Shanghai 200080, China
- Jiwu Chen
  - Department of Orthopedics, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai Jiao Tong University, Shanghai 200080, China
|
46
|
Simon F, Konstantinides N. Single-cell transcriptomics in the Drosophila visual system: Advances and perspectives on cell identity regulation, connectivity, and neuronal diversity evolution. Dev Biol 2021; 479:107-122. [PMID: 34375653 DOI: 10.1016/j.ydbio.2021.08.001] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 03/11/2021] [Revised: 07/10/2021] [Accepted: 08/03/2021] [Indexed: 11/17/2022]
Abstract
The Drosophila visual system supports complex behaviors and shares many of its anatomical and molecular features with the vertebrate brain. Yet, it contains a much more manageable number of neurons and neuronal types. In addition to the extensive Drosophila genetic toolbox, this relative simplicity has allowed decades of work to yield a detailed account of its neuronal type diversity, morphology, connectivity and specification mechanisms. In the past three years, numerous studies have applied large scale single-cell transcriptomic approaches to the Drosophila visual system and have provided access to the complete gene expression profile of most neuronal types throughout development. This makes the fly visual system particularly well suited to perform detailed studies of the genetic mechanisms underlying the evolution and development of neuronal systems. Here, we highlight how these transcriptomic resources allow exploring long-standing biological questions under a new light. We first present the efforts made to characterize neuronal diversity in the Drosophila visual system and suggest ways to further improve this description. We then discuss current advances allowed by the single-cell datasets, and envisage how these datasets can be further leveraged to address fundamental questions regarding the regulation of neuronal identity, neuronal circuit development and the evolution of neuronal diversity.
Affiliation(s)
- Félix Simon
  - Department of Biology, New York University, New York, NY, 10003, USA
- Nikolaos Konstantinides
  - Department of Biology, New York University, New York, NY, 10003, USA; Institut Jacques Monod, Centre National de la Recherche Scientifique-UMR 7592, Université Paris Diderot, Paris, France
|
47
|
Discovering unknown human and mouse transcription factor binding sites and their characteristics from ChIP-seq data. Proc Natl Acad Sci U S A 2021; 118:2026754118. [PMID: 33975951 DOI: 10.1073/pnas.2026754118] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/14/2022] Open
Abstract
Transcription factor binding sites (TFBSs) are essential for gene regulation, but the number of known TFBSs remains limited. We aimed to discover and characterize unknown TFBSs by developing a computational pipeline for analyzing ChIP-seq (chromatin immunoprecipitation followed by sequencing) data. Applying it to the latest ENCODE ChIP-seq data for human and mouse, we found that using the irreproducible discovery rate as a quality-control criterion resulted in many experiments being unnecessarily discarded. By contrast, the number of motif occurrences in ChIP-seq peak regions provides a highly effective criterion, which is reliable even if supported by only one experimental replicate. In total, we obtained 2,058 motifs from 1,089 experiments for 354 human TFs and 163 motifs from 101 experiments for 34 mouse TFs. Among these motifs, 487 have not previously been reported. Mapping the canonical motifs to the human genome reveals a high TFBS density ±2 kb around transcription start sites (TSSs) with a peak at -50 bp. On average, a promoter contains 5.7 TFBSs. However, 70% of TFBSs are in introns (41%) and intergenic regions (29%), whereas only 12% are in promoters (-1 kb to +100 bp from TSSs). Notably, some TFs (e.g., CTCF, JUN, JUNB, and NFE2) have motifs enriched in intergenic regions, including enhancers. We inferred 142 cobinding TF pairs and 186 (including 115 completely) tethered binding TF pairs, indicating frequent interactions between TFs and a higher frequency of tethered binding than cobinding. This study provides a large number of previously undocumented motifs and insights into the biological and genomic features of TFBSs.
|
48
|
Mercatelli D, Balboni N, Giorgio FD, Aleo E, Garone C, Giorgi FM. The Transcriptome of SH-SY5Y at Single-Cell Resolution: A CITE-Seq Data Analysis Workflow. Methods Protoc 2021; 4:mps4020028. [PMID: 34066513 PMCID: PMC8163004 DOI: 10.3390/mps4020028] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/09/2021] [Revised: 05/03/2021] [Accepted: 05/04/2021] [Indexed: 12/15/2022] Open
Abstract
Cellular Indexing of Transcriptomes and Epitopes by Sequencing (CITE-seq) is a recently established multimodal single-cell analysis technique combining the immunophenotyping capabilities of antibody labeling and cell sorting with the resolution of single-cell RNA sequencing (scRNA-seq). By simply adding a 12-bp nucleotide barcode to antibodies (cell hashing), CITE-seq can be used to sequence antibody-bound tags alongside the cellular mRNA, thus reducing the cost of scRNA-seq by performing it on multiple barcoded samples in a single run. Here, we illustrate an ideal CITE-seq data analysis workflow by characterizing the transcriptome of the SH-SY5Y neuroblastoma cell line, a widely used model to study neuronal function and differentiation. We obtained transcriptomes from a total of 2879 single cells, measuring an average of 1600 genes/cell. Along with standard scRNA-seq data handling procedures, such as quality checks and cell filtering, we performed exploratory analyses to identify the most stable genes, which could serve as reference housekeeping genes in qPCR experiments. We also illustrate how to use some popular R packages to investigate cell heterogeneity in scRNA-seq data, namely Seurat, Monocle, and slalom. Both the CITE-seq dataset and the code used to analyze it are freely shared and fully reusable for future research.
Affiliation(s)
- Daniele Mercatelli
  - Department of Pharmacy and Biotechnology, University of Bologna, 40126 Bologna, Italy
  - Correspondence: (D.M.); (F.M.G.); Tel.: +39-05-12094521 (F.M.G.)
- Nicola Balboni
  - Department of Pharmacy and Biotechnology, University of Bologna, 40126 Bologna, Italy
- Francesca De Giorgio
  - Department of Medical and Surgical Sciences, University of Bologna, 40138 Bologna, Italy
  - Center for Applied Biomedical Research (CRBA), University of Bologna, 40138 Bologna, Italy
- Caterina Garone
  - Department of Medical and Surgical Sciences, University of Bologna, 40138 Bologna, Italy
  - Center for Applied Biomedical Research (CRBA), University of Bologna, 40138 Bologna, Italy
- Federico Manuel Giorgi
  - Department of Pharmacy and Biotechnology, University of Bologna, 40126 Bologna, Italy
  - Correspondence: (D.M.); (F.M.G.); Tel.: +39-05-12094521 (F.M.G.)
|
49
|
Zhao M, He W, Tang J, Zou Q, Guo F. A comprehensive overview and critical evaluation of gene regulatory network inference technologies. Brief Bioinform 2021; 22:6128842. [PMID: 33539514 DOI: 10.1093/bib/bbab009] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 09/28/2020] [Revised: 12/11/2020] [Accepted: 01/06/2021] [Indexed: 12/12/2022] Open
Abstract
A gene regulatory network (GRN) is an important mechanism for maintaining life processes, controlling biochemical reactions and regulating compound levels, and it plays an important role in various organisms and systems. Reconstructing GRNs can help us to understand the molecular mechanisms of organisms and to reveal the essential rules of a large number of biological processes and reactions in organisms. To deal with the uncertainty of processing, various outstanding network reconstruction algorithms use specific assumptions that affect prediction accuracy. To study why a certain method is more suitable for a specific research problem or experimental data, we survey methods classified as model-based, information-based and machine learning-based. There are obviously different types of computational tools that can be generated to distinguish GRNs. Furthermore, we discuss several classical, representative and recent methods in each category to analyze their core ideas, general steps, characteristics, etc. We compare the performance of state-of-the-art GRN reconstruction technologies on simulated networks and real networks under different scaling conditions. Through standardized performance metrics and common benchmarks, we quantitatively evaluate the stability of various methods and the sensitivity of the same algorithm when applied to networks of different scales. The aim of this study is to explore the most appropriate method for a specific GRN, which helps biologists and medical scientists in discovering potential drug targets and identifying cancer biomarkers.
Affiliation(s)
- Mengyuan Zhao
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Wenying He
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Jijun Tang
  - University of South Carolina, Tianjin, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Fei Guo
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
|
50
|
Single-Cell Gene Network Analysis and Transcriptional Landscape of MYCN-Amplified Neuroblastoma Cell Lines. Biomolecules 2021; 11:biom11020177. [PMID: 33525507 PMCID: PMC7912277 DOI: 10.3390/biom11020177] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/25/2020] [Revised: 01/21/2021] [Accepted: 01/23/2021] [Indexed: 12/13/2022] Open
Abstract
Neuroblastoma (NBL) is a pediatric cancer responsible for more than 15% of cancer deaths in children, with 800 new cases each year in the United States alone. Genomic amplification of the MYC oncogene family member MYCN characterizes a subset of high-risk pediatric neuroblastomas. Several cellular models have been implemented to study this disease over the years. Two of these, SK-N-BE-2-C (BE2C) and Kelly, are amongst the most used worldwide as models of MYCN-Amplified human NBL. Here, we provide a transcriptome-wide quantitative measurement of gene expression and transcriptional network activity in BE2C and Kelly cell lines at an unprecedented single-cell resolution. We obtained 1105 Kelly and 962 BE2C unsynchronized cells, with an average number of mapped reads/cell of roughly 38,000. The single-cell data recapitulate gene expression signatures previously generated from bulk RNA-Seq. We highlight low variance for commonly used housekeeping genes between different cells (ACTB, B2M and GAPDH), while showing higher than expected variance for metallothionein transcripts in Kelly cells. The high number of samples, despite the relatively low read coverage of single cells, allowed for robust pathway enrichment analysis and master regulator analysis (MRA), both of which highlight the more mesenchymal nature of BE2C cells as compared to Kelly cells, and the upregulation of TWIST1 and DNAJC1 transcriptional networks. We further defined master regulators at the single cell level and showed that MYCN is not constantly active or expressed within Kelly and BE2C cells, independently of cell cycle phase. The dataset, alongside a detailed and commented programming protocol to analyze it, is fully shared and reusable.
|