51
|
Chen L, Wan Y, Yang T, Zhang Q, Zeng Y, Zheng S, Ling Z, Xiao Y, Wan Q, Liu R, Yang C, Huang G, Zeng Q. Bibliometric and visual analysis of single-cell sequencing from 2010 to 2022. Front Genet 2024; 14:1285599. [PMID: 38274109 PMCID: PMC10808606 DOI: 10.3389/fgene.2023.1285599] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2023] [Accepted: 12/31/2023] [Indexed: 01/27/2024]
Abstract
Background: Single-cell sequencing (SCS) is a technique used to analyze the genome, transcriptome, epigenome, and other genetic data at the level of a single cell. The procedure is commonly utilized in multiple fields, including neurobiology, immunology, and microbiology, and has emerged as a key focus of life science research. However, a thorough and impartial analysis of the existing state and trends of SCS-related research is lacking. The current study aimed to map the development trends of studies on SCS during the years 2010-2022 through bibliometric software. Methods: Pertinent papers on SCS from 2010 to 2022 were obtained using the Web of Science Core Collection. Research categories, nations/institutions, authors/co-cited authors, journals/co-cited journals, co-cited references, and keywords were analyzed using VOSviewer, the R package "bibliometrix", and CiteSpace. Results: The bibliometric analysis included 9,929 papers published between 2010 and 2022, and showed a consistent increase in the quantity of papers each year. The United States was the source of the highest quantity of articles and citations in this field. The majority of articles were published in the journal Nature Communications. Butler A was the most frequently cited author on this topic, and his article "Integrating single-cell transcriptome data across diverse conditions, technologies, and species" has received numerous citations to date. The literature and keyword analysis showed that studies involving single-cell RNA sequencing (scRNA-seq) were prominent in this discipline during the study period. Conclusion: This study utilized bibliometric techniques to visualize research in SCS-related domains, which facilitated the identification of emerging patterns and future directions in the field. Current hot topics in SCS research include COVID-19, tumor microenvironment, scRNA-seq, and neuroscience. Our results are significant for scholars seeking to identify key issues and generate new research ideas.
Affiliation(s)
- Ling Chen
- Department of Rehabilitation Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China
| | - Yantong Wan
- Guangdong Provincial Key Laboratory of Proteomics, Department of Pathophysiology, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
| | - Tingting Yang
- School of Rehabilitation Medicine, Southern Medical University, Guangzhou, China
| | - Qi Zhang
- Department of Rehabilitation Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- School of Rehabilitation Medicine, Southern Medical University, Guangzhou, China
| | - Yuting Zeng
- Department of Rehabilitation Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China
| | - Shuqi Zheng
- Department of Rehabilitation Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- School of Rehabilitation Medicine, Southern Medical University, Guangzhou, China
| | - Zhishan Ling
- Department of Rehabilitation Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- School of Rehabilitation Medicine, Southern Medical University, Guangzhou, China
| | - Yupeng Xiao
- Department of Rehabilitation Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- School of Rehabilitation Medicine, Southern Medical University, Guangzhou, China
| | - Qingyi Wan
- School of Rehabilitation Medicine, Southern Medical University, Guangzhou, China
| | - Ruili Liu
- School of Rehabilitation Medicine, Southern Medical University, Guangzhou, China
| | - Chun Yang
- Dongguan Key Laboratory of Stem Cell and Regenerative Tissue Engineering, Guangdong Medical University, Dongguan, China
| | - Guozhi Huang
- Department of Rehabilitation Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- School of Rehabilitation Medicine, Southern Medical University, Guangzhou, China
| | - Qing Zeng
- Department of Rehabilitation Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- School of Rehabilitation Medicine, Southern Medical University, Guangzhou, China
| |
|
52
|
Lodi MK, Chernikov A, Ghosh P. COFFEE: Consensus Single Cell-Type Specific Inference for Gene Regulatory Networks. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.01.05.574445. [PMID: 38260386 PMCID: PMC10802453 DOI: 10.1101/2024.01.05.574445] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
The inference of gene regulatory networks (GRNs) is crucial to understanding the regulatory mechanisms that govern biological processes. GRNs may be represented as edges in a graph, and hence have been inferred computationally for scRNA-seq data. A wisdom of crowds approach to integrate edges from several GRNs to create one composite GRN has demonstrated improved performance when compared to individual algorithm implementations on bulk RNA-seq and microarray data. In an effort to extend this approach to scRNA-seq data, we present COFFEE (COnsensus single cell-type speciFic inFerence for gEnE regulatory networks), a Borda voting based consensus algorithm that integrates information from 10 established GRN inference methods. We conclude that COFFEE has improved performance across synthetic, curated and experimental datasets when compared to baseline methods. Additionally, we show that a modified version of COFFEE can be leveraged to improve performance on newer cell-type specific GRN inference methods. Overall, our results demonstrate that consensus based methods with pertinent modifications continue to be valuable for GRN inference at the single cell level.
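The Borda voting idea mentioned in this abstract can be illustrated with a minimal sketch. It assumes each inference method returns a list of edges ordered from most to least confident; the function name `borda_consensus`, the scoring convention (top edge earns as many points as there are distinct edges, unreported edges earn zero), and the toy gene names are illustrative assumptions, not the COFFEE implementation.

```python
from collections import defaultdict


def borda_consensus(rankings):
    """Merge ranked edge lists into one consensus ranking using a Borda count.

    rankings: list of lists; each inner list holds (regulator, target) edges
    ordered from most to least confident by one inference method.
    """
    universe = {edge for ranking in rankings for edge in ranking}
    n = len(universe)
    scores = defaultdict(int)
    for ranking in rankings:
        for position, edge in enumerate(ranking):
            # The top-ranked edge earns n points, the next n - 1, and so on.
            scores[edge] += n - position
        # Edges a method did not report earn nothing from that method.
    # Sort by descending score; ties are broken alphabetically for determinism.
    return sorted(universe, key=lambda e: (-scores[e], e))


# Hypothetical rankings from three methods.
method_a = [("TF1", "G1"), ("TF1", "G2"), ("TF2", "G1")]
method_b = [("TF1", "G2"), ("TF1", "G1"), ("TF2", "G3")]
method_c = [("TF1", "G1"), ("TF2", "G3"), ("TF1", "G2")]
consensus = borda_consensus([method_a, method_b, method_c])
```

In this toy case the edge ("TF1", "G1") collects 11 points and leads the consensus, while ("TF2", "G1"), reported by only one method, ranks last.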
Affiliation(s)
- Musaddiq K Lodi
- Integrative Life Sciences, Virginia Commonwealth University, Richmond, VA 23284
| | - Anna Chernikov
- Center for Biological Data Science, Virginia Commonwealth University, Richmond, VA 23284
| | - Preetam Ghosh
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284
| |
|
53
|
Jiang H, Liu J, Song Y, Lei J. Quantitative Modeling of Stemness in Single-Cell RNA Sequencing Data: A Nonlinear One-Class Support Vector Machine Method. J Comput Biol 2024; 31:41-57. [PMID: 38010500 DOI: 10.1089/cmb.2022.0484] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
Intratumoral heterogeneity and the presence of cancer stem cells are challenging issues in cancer therapy. An appropriate quantification of the stemness of individual cells for assessing the potential for self-renewal and differentiation from the cell of origin can define a measurement for quantifying different cell states, which is important in understanding the dynamics of cancer evolution, and might further provide possible targeted therapies aimed at tumor stem cells. Nevertheless, it is usually difficult to quantify the stemness of a cell based on molecular information associated with the cell. In this study, we propose a stemness definition method with one-class Hadamard kernel support vector machine (OCHSVM) based on single-cell RNA sequencing (scRNA-seq) data. Applications of the proposed OCHSVM stemness are assessed on various data sets, including preimplantation embryo cells, induced pluripotent stem cells, and tumor cells. We further compared the OCHSVM model with state-of-the-art methods, including CytoTRACE, one-class logistic regression, and one-class SVMs with different kernels. The computational results demonstrate that the OCHSVM method is more suitable for stemness identification using scRNA-seq data.
Affiliation(s)
- Hao Jiang
- School of Mathematics, Renmin University of China, Beijing, China
| | - Jingxin Liu
- School of Software, Beihang University, Beijing, China
| | - You Song
- School of Software, Beihang University, Beijing, China
| | - Jinzhi Lei
- School of Mathematical Sciences, Center for Applied Mathematics, Tiangong University, Tianjin, China
| |
|
54
|
Wang P, Wen X, Li H, Lang P, Li S, Lei Y, Shu H, Gao L, Zhao D, Zeng J. Deciphering driver regulators of cell fate decisions from single-cell transcriptomics data with CEFCON. Nat Commun 2023; 14:8459. [PMID: 38123534 PMCID: PMC10733330 DOI: 10.1038/s41467-023-44103-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Accepted: 11/30/2023] [Indexed: 12/23/2023]
Abstract
Single-cell technologies enable the dynamic analyses of cell fate mapping. However, capturing the gene regulatory relationships and identifying the driver factors that control cell fate decisions are still challenging. We present CEFCON, a network-based framework that first uses a graph neural network with attention mechanism to infer a cell-lineage-specific gene regulatory network (GRN) from single-cell RNA-sequencing data, and then models cell fate dynamics through network control theory to identify driver regulators and the associated gene modules, revealing their critical biological processes related to cell states. Extensive benchmarking tests consistently demonstrated the superiority of CEFCON in GRN construction, driver regulator identification, and gene module identification over baseline methods. When applied to the mouse hematopoietic stem cell differentiation data, CEFCON successfully identified driver regulators for three developmental lineages, which offered useful insights into their differentiation from a network control perspective. Overall, CEFCON provides a valuable tool for studying the underlying mechanisms of cell fate decisions from single-cell RNA-seq data.
Affiliation(s)
- Peizhuo Wang
- Institute for Interdisciplinary Information Sciences, Tsinghua University, 100084, Beijing, China
- School of Engineering, Westlake University, 310030, Hangzhou, Zhejiang Province, China
| | - Xiao Wen
- CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, 100101, Beijing, China
| | - Han Li
- Institute for Interdisciplinary Information Sciences, Tsinghua University, 100084, Beijing, China
| | - Peng Lang
- Institute for Interdisciplinary Information Sciences, Tsinghua University, 100084, Beijing, China
| | - Shuya Li
- Institute for Interdisciplinary Information Sciences, Tsinghua University, 100084, Beijing, China
- School of Engineering, Westlake University, 310030, Hangzhou, Zhejiang Province, China
| | - Yipin Lei
- Institute for Interdisciplinary Information Sciences, Tsinghua University, 100084, Beijing, China
| | - Hantao Shu
- Institute for Interdisciplinary Information Sciences, Tsinghua University, 100084, Beijing, China
| | - Lin Gao
- School of Computer Science and Technology, Xidian University, 710071, Xi'an, Shaanxi Province, China
| | - Dan Zhao
- Institute for Interdisciplinary Information Sciences, Tsinghua University, 100084, Beijing, China
| | - Jianyang Zeng
- Institute for Interdisciplinary Information Sciences, Tsinghua University, 100084, Beijing, China
- School of Engineering, Westlake University, 310030, Hangzhou, Zhejiang Province, China
| |
|
55
|
Zheng R, Xu Z, Zeng Y, Wang E, Li M. SPIDE: A single cell potency inference method based on the local cell-specific network entropy. Methods 2023; 220:90-97. [PMID: 37952704 DOI: 10.1016/j.ymeth.2023.11.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2023] [Revised: 10/25/2023] [Accepted: 11/06/2023] [Indexed: 11/14/2023]
Abstract
For a given single-cell RNA-seq dataset, it is critical to pinpoint key cellular stages and quantify cells' differentiation potency along a differentiation pathway in a time course manner. Currently, several methods based on the entropy of gene functions or PPI network have been proposed to solve the problem. Nevertheless, these methods still suffer from inaccurate interactions and noise originating from scRNA-seq profiles. In this study, we propose a cell potency inference method based on cell-specific network entropy, called SPIDE. SPIDE introduces the local weighted cell-specific network for each cell to maintain cell heterogeneity and calculates the entropy by incorporating gene expression with network structure. We compared three cell entropy estimation models on eight scRNA-seq datasets. The results show that SPIDE yields conclusions consistent with the real cell differentiation potency on most datasets. Moreover, SPIDE accurately recovers the continuous changes of potency during cell differentiation and significantly correlates with the stemness of tumor cells in colorectal cancer. To conclude, our study provides a universal and accurate framework for cell entropy estimation, which deepens our understanding of cell differentiation, the development of diseases and other related biological research.
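To make the notion of combining gene expression with network structure concrete, the sketch below computes a generic local network entropy for one gene in one cell. It is an assumption-laden illustration rather than SPIDE's actual formula: each neighbor is weighted by its expression in that cell, the weights are turned into a probability distribution, and the Shannon entropy is normalized by the logarithm of the gene's degree. The function name and the toy network are hypothetical.

```python
import math


def local_network_entropy(gene, expression, neighbors):
    """Normalized Shannon entropy of a gene's expression-weighted neighborhood.

    expression: dict mapping gene -> expression value in one cell.
    neighbors:  dict mapping gene -> list of neighboring genes in the network.
    Returns a value in [0, 1]; 0.0 when fewer than two neighbors are expressed.
    """
    weights = [expression[n] for n in neighbors[gene] if expression[n] > 0]
    total = sum(weights)
    if len(weights) < 2 or total == 0:
        return 0.0
    probs = [w / total for w in weights]
    entropy = -sum(p * math.log(p) for p in probs)
    # Dividing by log(degree) maps the maximum possible entropy to 1.
    return entropy / math.log(len(neighbors[gene]))


# Hypothetical three-gene network: A interacts with B and C.
network = {"A": ["B", "C"]}
uniform_cell = {"A": 2.0, "B": 1.0, "C": 1.0}
skewed_cell = {"A": 2.0, "B": 3.0, "C": 1.0}
h_uniform = local_network_entropy("A", uniform_cell, network)
h_skewed = local_network_entropy("A", skewed_cell, network)
```

Evenly expressed neighbors give the maximum normalized entropy, while a neighborhood dominated by one gene gives a lower value, which is the intuition behind using entropy as a potency proxy.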
Affiliation(s)
- Ruiqing Zheng
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Ziwei Xu
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Yanping Zeng
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Edwin Wang
- Department of Biochemistry and Molecular Biology, Cumming School of Medicine, University of Calgary, Calgary T2N 4N1, Alberta, Canada
| | - Min Li
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
| |
|
56
|
Zheng W, Min W, Wang S. TsImpute: an accurate two-step imputation method for single-cell RNA-seq data. Bioinformatics 2023; 39:btad731. [PMID: 38039139 PMCID: PMC10724850 DOI: 10.1093/bioinformatics/btad731] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2023] [Revised: 11/22/2023] [Accepted: 11/30/2023] [Indexed: 12/03/2023]
Abstract
Motivation: Single-cell RNA sequencing (scRNA-seq) technology has enabled discovering gene expression patterns at single cell resolution. However, due to technical limitations, there are usually excessive zeros, called "dropouts," in scRNA-seq data, which may mislead the downstream analysis. Therefore, it is crucial to impute these dropouts to recover the biological information. Results: We propose a two-step imputation method called tsImpute to impute scRNA-seq data. At the first step, tsImpute adopts zero-inflated negative binomial distribution to discriminate dropouts from true zeros and performs initial imputation by calculating the expected expression level. At the second step, it conducts clustering with this modified expression matrix, based on which the final distance weighted imputation is performed. Numerical results based on both simulated and real data show that tsImpute achieves favorable performance in terms of gene expression recovery, cell clustering, and differential expression analysis. Availability and implementation: The R package of tsImpute is available at https://github.com/ZhengWeihuaYNU/tsImpute.
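The first step described in this abstract, separating dropouts from true zeros under a zero-inflated negative binomial (ZINB) model, can be sketched as follows. The sketch assumes the ZINB parameters (dropout rate `pi`, mean `mu`, dispersion `theta`) have already been estimated; the function names, the 0.5 decision threshold, and the choice of posterior-weighted mean as the imputed value are illustrative assumptions and are not taken from the tsImpute package.

```python
def nb_zero_prob(mu, theta):
    """Probability that a negative binomial with mean mu and dispersion theta yields 0."""
    return (theta / (theta + mu)) ** theta


def dropout_posterior(pi, mu, theta):
    """Posterior probability that an observed zero is a dropout under a ZINB model.

    pi is the zero-inflation (dropout) probability. By Bayes' rule,
    P(dropout | y = 0) = pi / (pi + (1 - pi) * NB(0; mu, theta)).
    """
    return pi / (pi + (1.0 - pi) * nb_zero_prob(mu, theta))


def impute_zero(pi, mu, theta, threshold=0.5):
    """Replace a zero with its expected expression when it is likely a dropout.

    Assumption: a dropout hides a value with mean mu, a true zero stays 0, so the
    expected expression given an observed zero is posterior * mu.
    """
    posterior = dropout_posterior(pi, mu, theta)
    return posterior * mu if posterior > threshold else 0.0


likely_dropout = dropout_posterior(pi=0.5, mu=1.0, theta=1.0)   # 2/3
imputed_value = impute_zero(pi=0.5, mu=1.0, theta=1.0)           # 2/3 * 1.0
kept_as_zero = impute_zero(pi=0.2, mu=0.0, theta=1.0)            # posterior 0.2, stays 0
```

A second, distance-weighted pass over cluster neighbors, as the abstract describes, would then refine these initial values; that step is not shown here.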
Affiliation(s)
- Weihua Zheng
- Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming 650504, China
| | - Wenwen Min
- Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming 650504, China
- Yunnan Key Laboratory of Intelligent Systems and Computing, Yunnan University, Kunming 650504, China
| | - Shunfang Wang
- Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming 650504, China
- Yunnan Key Laboratory of Intelligent Systems and Computing, Yunnan University, Kunming 650504, China
| |
|
57
|
Zheng L, Allen GI. Graphical Model Inference with Erosely Measured Data. J Am Stat Assoc 2023; 119:2282-2293. [PMID: 39328784 PMCID: PMC11424035 DOI: 10.1080/01621459.2023.2256503] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2022] [Revised: 06/15/2023] [Accepted: 08/13/2023] [Indexed: 09/28/2024]
Abstract
In this paper, we investigate the Gaussian graphical model inference problem in a novel setting that we call erose measurements, referring to irregularly measured or observed data. For graphs, this results in different node pairs having vastly different sample sizes, a situation that frequently arises in data integration, genomics, neuroscience, and sensor networks. Existing works characterize the graph selection performance using the minimum pairwise sample size, which provides little insight for erosely measured data, and no existing inference method is applicable. We aim to fill in this gap by proposing the first inference method that characterizes the different uncertainty levels over the graph caused by the erose measurements, named GI-JOE (Graph Inference when Joint Observations are Erose). Specifically, we develop an edge-wise inference method and an affiliated FDR control procedure, where the variance of each edge depends on the sample sizes associated with corresponding neighbors. We prove statistical validity under erose measurements, thanks to careful localized edge-wise analysis and disentangling the dependencies across the graph. Finally, through simulation studies and a real neuroscience data example, we demonstrate the advantages of our inference methods for graph selection from erosely measured data.
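The pairwise sample sizes that characterize erose measurements can be computed directly from an observation mask. The sketch below is only an illustration of that quantity, not part of GI-JOE itself; the function name and the toy mask are hypothetical.

```python
def pairwise_sample_sizes(observed):
    """Count, for every node pair (j, k), the samples in which both were observed.

    observed: list of rows, one per sample; each row is a list of booleans,
    one per node, marking whether that node was measured in that sample.
    Returns a symmetric p x p matrix; the diagonal holds per-node sample sizes.
    """
    p = len(observed[0])
    counts = [[0] * p for _ in range(p)]
    for row in observed:
        seen = [j for j, flag in enumerate(row) if flag]
        for j in seen:
            for k in seen:
                counts[j][k] += 1
    return counts


# Hypothetical mask: 4 samples, 3 nodes.
mask = [
    [True, True, False],
    [True, False, True],
    [True, True, True],
    [False, True, True],
]
sizes = pairwise_sample_sizes(mask)
min_pairwise = min(sizes[j][k] for j in range(3) for k in range(3) if j != k)
```

Here every node is observed three times but each pair is jointly observed only twice, which shows why a single minimum pairwise sample size can understate or overstate the information available for individual edges.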
Affiliation(s)
- Lili Zheng
- Department of Electrical and Computer Engineering, Rice University
| | - Genevera I Allen
- Department of Electrical and Computer Engineering, Rice University
- Department of Computer Science, Rice University
- Department of Statistics, Rice University
- Department of Pediatrics-Neurology, Baylor College of Medicine
- Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital
| |
|
58
|
Li B, Jin X, Chan HM. Effects of low doses of methylmercury (MeHg) exposure on definitive endoderm cell differentiation in human embryonic stem cells. Arch Toxicol 2023; 97:2625-2641. [PMID: 37612375 PMCID: PMC10475006 DOI: 10.1007/s00204-023-03580-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Accepted: 08/03/2023] [Indexed: 08/25/2023]
Abstract
Fetal development is one of the most sensitive windows to methylmercury (MeHg) toxicity. Laboratory and epidemiological studies have shown a dose-response relationship between fetal MeHg exposure and neuro performance in different life stages from infants to adults. In addition, MeHg exposure has been reported to be associated with disorders in endoderm-derived organs, such as morphological changes in liver cells and pancreatic cell dysfunctions. However, the mechanisms of the effects of MeHg on non-neuronal organs or systems, especially during the early development of endoderm-derived organs, remain unclear. Here we determined the effects of low concentrations of MeHg exposure during the differentiation of definitive endoderm (DE) cells from human embryonic stem cells (hESCs). During DE cell induction, hESCs were exposed to MeHg (0, 10, 100, and 200 nM), a range that covers the Hg concentrations typically found in human maternal blood. Transcriptomic analysis showed that sub-lethal doses of MeHg exposure could alter global gene expression patterns during hESC to DE cell differentiation, leading to increased expression of endodermal genes/proteins and the over-promotion of endodermal fate, mainly through disrupting calcium homeostasis and generating ROS. Bioinformatic analysis results suggested that MeHg exerts its developmental toxicity mainly by disrupting ribosome biogenesis during early cell lineage differentiation. This disruption could lead to aberrant growth or dysfunctions of the developing endoderm-derived organs, and it may be the underlying mechanism for the observed congenital diseases later in life. Based on the results, we proposed an adverse outcome pathway for the effects of MeHg exposure during the differentiation of human embryonic stem cells into definitive endoderm.
Affiliation(s)
- Bai Li
- Department of Biology, University of Ottawa, 30 Marie Curie, Ottawa, ON, K1N 6N5, Canada
| | - Xiaolei Jin
- Regulatory Toxicology Research Division, Bureau of Chemical Safety, Food Directorate, HPFB, Health Canada, 251 Sir Frederick Banting Driveway, Ottawa, ON, K1A 0K9, Canada
| | - Hing Man Chan
- Department of Biology, University of Ottawa, 30 Marie Curie, Ottawa, ON, K1N 6N5, Canada
| |
|
59
|
Karagiannis K, Gannavaram S, Verma C, Pacheco-Fernandez T, Bhattacharya P, Nakhasi HL, Satoskar AR. Dual-scRNA-seq analysis reveals rare and uncommon parasitized cell populations in chronic L. donovani infection. Cell Rep 2023; 42:113097. [PMID: 37682713 DOI: 10.1016/j.celrep.2023.113097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2022] [Revised: 06/21/2023] [Accepted: 08/22/2023] [Indexed: 09/10/2023]
Abstract
Although phagocytic cells are documented targets of Leishmania parasites, it is unclear whether other cell types can be infected. Here, we use unbiased single-cell RNA sequencing (scRNA-seq) to simultaneously analyze host cell and Leishmania donovani transcriptomes to identify and annotate parasitized cells in spleen and bone marrow in chronically infected mice. Our dual-scRNA-seq methodology allows the detection of heterogeneous parasitized populations. In the spleen, monocytes and macrophages are the dominant parasitized cells, while megakaryocytes, basophils, and natural killer (NK) cells are found to be unexpectedly infected. In the bone marrow, the hematopoietic stem cells (HSCs) expressing phagocytic receptors FcγR and CD93 are the main parasitized cells. Additionally, we also detect parasitized cycling basal cells, eosinophils, and macrophages in chronically infected mice. Flow cytometric analysis confirms the presence of parasitized HSCs. Our unbiased dual-scRNA-seq method identifies rare, parasitized cells, potentially implicated in pathogenesis, persistence, and protective immunity, using a non-targeted approach.
Affiliation(s)
| | - Sreenivas Gannavaram
- Division of Emerging and Transfusion Transmitted Diseases, CBER, FDA, Silver Spring, MD, USA
| | - Chaitenya Verma
- Department of Pathology, Wexner Medical Center, The Ohio State University, Columbus, OH 43210, USA
| | | | - Parna Bhattacharya
- Division of Emerging and Transfusion Transmitted Diseases, CBER, FDA, Silver Spring, MD, USA
| | - Hira L Nakhasi
- Division of Emerging and Transfusion Transmitted Diseases, CBER, FDA, Silver Spring, MD, USA
| | - Abhay R Satoskar
- Department of Microbiology, The Ohio State University, Columbus, OH 43210, USA; Department of Pathology, Wexner Medical Center, The Ohio State University, Columbus, OH 43210, USA
| |
|
60
|
Wang T, Zhao H, Xu Y, Wang Y, Shang X, Peng J, Xiao B. scMultiGAN: cell-specific imputation for single-cell transcriptomes with multiple deep generative adversarial networks. Brief Bioinform 2023; 24:bbad384. [PMID: 37903416 PMCID: PMC11020228 DOI: 10.1093/bib/bbad384] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/21/2023] [Revised: 09/13/2023] [Accepted: 10/03/2023] [Indexed: 11/01/2023]
Abstract
The emergence of single-cell RNA sequencing (scRNA-seq) technology has revolutionized the identification of cell types and the study of cellular states at a single-cell level. Despite its significant potential, scRNA-seq data analysis is plagued by the issue of missing values. Many existing imputation methods rely on simplistic data distribution assumptions while ignoring the intrinsic gene expression distribution specific to cells. This work presents a novel deep-learning model, named scMultiGAN, for scRNA-seq imputation, which utilizes multiple collaborative generative adversarial networks (GAN). Unlike traditional GAN-based imputation methods that generate missing values based on random noises, scMultiGAN employs a two-stage training process and utilizes multiple GANs to achieve cell-specific imputation. Experimental results show the efficacy of scMultiGAN in imputation accuracy, cell clustering, differential gene expression analysis and trajectory analysis, significantly outperforming existing state-of-the-art techniques. Additionally, scMultiGAN is scalable to large scRNA-seq datasets and consistently performs well across sequencing platforms. The scMultiGAN code is freely available at https://github.com/Galaxy8172/scMultiGAN.
Affiliation(s)
- Tao Wang
- School of Computer Science, Northwestern Polytechnical University, 1 Dongxiang Rd., 710072 Xi’an, China
- Key Laboratory of Big Data Storage and Management, Ministry of Industry and Information Technology, Northwestern Polytechnical University, 1 Dongxiang Rd., 710072 Xi’an, China
| | - Hui Zhao
- School of Automation, Northwestern Polytechnical University, 1 Dongxiang Rd., 710072 Xi’an, China
| | - Yungang Xu
- Department of Cell Biology and Genetics, School of Basic Medical Sciences, Xi’an Jiaotong University Health Science Center, No.28, West Xianning Road, 710061 Xi’an, China
| | - Yongtian Wang
- School of Computer Science, Northwestern Polytechnical University, 1 Dongxiang Rd., 710072 Xi’an, China
- Key Laboratory of Big Data Storage and Management, Ministry of Industry and Information Technology, Northwestern Polytechnical University, 1 Dongxiang Rd., 710072 Xi’an, China
| | - Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, 1 Dongxiang Rd., 710072 Xi’an, China
- Key Laboratory of Big Data Storage and Management, Ministry of Industry and Information Technology, Northwestern Polytechnical University, 1 Dongxiang Rd., 710072 Xi’an, China
| | - Jiajie Peng
- School of Computer Science, Northwestern Polytechnical University, 1 Dongxiang Rd., 710072 Xi’an, China
- Key Laboratory of Big Data Storage and Management, Ministry of Industry and Information Technology, Northwestern Polytechnical University, 1 Dongxiang Rd., 710072 Xi’an, China
| | - Bing Xiao
- School of Automation, Northwestern Polytechnical University, 1 Dongxiang Rd., 710072 Xi’an, China
| |
|
61
|
Shojaee A, Huang SSC. Robust discovery of gene regulatory networks from single-cell gene expression data by Causal Inference Using Composition of Transactions. Brief Bioinform 2023; 24:bbad370. [PMID: 37897702 PMCID: PMC10612495 DOI: 10.1093/bib/bbad370] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/18/2023] [Revised: 09/06/2023] [Accepted: 09/29/2023] [Indexed: 10/30/2023]
Abstract
Gene regulatory networks (GRNs) drive organism structure and functions, so the discovery and characterization of GRNs is a major goal in biological research. However, accurate identification of causal regulatory connections and inference of GRNs using gene expression datasets, more recently from single-cell RNA-seq (scRNA-seq), has been challenging. Here we employ the innovative method of Causal Inference Using Composition of Transactions (CICT) to uncover GRNs from scRNA-seq data. The basis of CICT is that if all gene expressions were random, a non-random regulatory gene should induce its targets at levels different from the background random process, resulting in distinct patterns in the whole relevance network of gene-gene associations. CICT proposes novel network features derived from a relevance network, which enable any machine learning algorithm to predict causal regulatory edges and infer GRNs. We evaluated CICT using simulated and experimental scRNA-seq data in a well-established benchmarking pipeline and showed that CICT outperformed existing network inference methods representing diverse approaches with many-fold higher accuracy. Furthermore, we demonstrated that GRN inference with CICT was robust to different levels of sparsity in scRNA-seq data, the characteristics of data and ground truth, the choice of association measure and the complexity of the supervised machine learning algorithm. Our results suggest aiming at directly predicting causality to recover regulatory relationships in complex biological networks substantially improves accuracy in GRN inference.
Affiliation(s)
- Abbas Shojaee
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY 10003, USA
| | - Shao-shan Carol Huang
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY 10003, USA
| |
|
62
|
Mao G, Pang Z, Zuo K, Wang Q, Pei X, Chen X, Liu J. Predicting gene regulatory links from single-cell RNA-seq data using graph neural networks. Brief Bioinform 2023; 24:bbad414. [PMID: 37985457 PMCID: PMC10661972 DOI: 10.1093/bib/bbad414] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2023] [Revised: 10/25/2023] [Accepted: 10/26/2023] [Indexed: 11/22/2023]
Abstract
Single-cell RNA-sequencing (scRNA-seq) has emerged as a powerful technique for studying gene expression patterns at the single-cell level. Inferring gene regulatory networks (GRNs) from scRNA-seq data provides insight into cellular phenotypes from the genomic level. However, the high sparsity, noise and dropout events inherent in scRNA-seq data present challenges for GRN inference. In recent years, the dramatic increase in data on experimentally validated transcription factors binding to DNA has made it possible to infer GRNs by supervised methods. In this study, we frame GRN inference as a graph link prediction task and propose a novel framework called GNNLink, which leverages known GRNs to deduce the potential regulatory interdependencies between genes. First, we preprocess the raw scRNA-seq data. Then, we introduce a graph convolutional network-based interaction graph encoder to effectively refine gene features by capturing interdependencies between nodes in the network. Finally, the inference of GRN is obtained by performing a matrix completion operation on node features. The features obtained from model training can be applied to downstream tasks such as measuring similarity and inferring causality between gene pairs. To evaluate the performance of GNNLink, we compare it with six existing GRN reconstruction methods using seven scRNA-seq datasets. These datasets encompass diverse ground truth networks, including functional interaction networks, Loss of Function/Gain of Function data, non-specific ChIP-seq data and cell-type-specific ChIP-seq data. Our experimental results demonstrate that GNNLink achieves comparable or superior performance across these datasets, showcasing its robustness and accuracy. Furthermore, we observe consistent performance across datasets of varying scales. For reproducibility, we provide the data and source code of GNNLink on our GitHub repository: https://github.com/sdesignates/GNNLink.
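The encoder-then-completion pipeline outlined in this abstract can be illustrated with a small, dependency-free sketch. It is not the GNNLink code: it assumes an undirected prior network, a single graph convolutional layer of the standard form ReLU(Â X W) with symmetric normalization, and a plain dot-product decoder standing in for the matrix completion step. All function names and the toy inputs are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalized_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 with added self-loops."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(adj_norm, features, weights):
    """One graph convolution: aggregate neighbor features, project, apply ReLU."""
    hidden = matmul(matmul(adj_norm, features), weights)
    return [[max(0.0, v) for v in row] for row in hidden]


def link_scores(embeddings):
    """Score every gene pair by the dot product of the two gene embeddings."""
    transposed = [list(col) for col in zip(*embeddings)]
    return matmul(embeddings, transposed)


# Toy example: two genes joined by one known regulatory link.
prior = [[0, 1], [1, 0]]
gene_features = [[1.0, 0.0], [0.0, 1.0]]   # stand-in for preprocessed expression
identity_weights = [[1.0, 0.0], [0.0, 1.0]]  # untrained placeholder weights
adj_norm = normalized_adjacency(prior)
embeddings = gcn_layer(adj_norm, gene_features, identity_weights)
scores = link_scores(embeddings)
```

In a trained model the weights would be learned from known regulatory links, and high scores for unobserved pairs would be read as predicted edges.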
Affiliation(s)
- Guo Mao
  - Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, deya, 410073 Changsha, China
- Zhengbin Pang
  - Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, deya, 410073 Changsha, China
- Ke Zuo
  - Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, deya, 410073 Changsha, China
- Qinglin Wang
  - Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, deya, 410073 Changsha, China
- Xiangdong Pei
  - Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, deya, 410073 Changsha, China
- Xinhai Chen
  - Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, deya, 410073 Changsha, China
- Jie Liu
  - Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, deya, 410073 Changsha, China
  - Laboratory of Software Engineering for Complex System, National University of Defense Technology, deya, 410073 Changsha, China
63
Zhang C, Duan ZW, Xu YP, Liu J, Li HD. FEED: a feature selection method based on gene expression decomposition for single cell clustering. Brief Bioinform 2023; 24:bbad389. [PMID: 37935617 DOI: 10.1093/bib/bbad389] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2023] [Revised: 08/31/2023] [Accepted: 09/22/2023] [Indexed: 11/09/2023]
Abstract
Single-cell clustering is a critical step in downstream biological analysis. Clustering performance can be effectively improved by extracting cell-type-specific genes. State-of-the-art feature selection methods usually calculate the importance of a single gene without considering the information contained in the gene expression distribution. Moreover, these methods ignore the intrinsic expression patterns of genes and the heterogeneity within groups of different mean expression levels. In this work, we present a Feature sElection method based on gene Expression Decomposition (FEED) of scRNA-seq data, which selects informative genes to enhance clustering performance. First, the expression levels of genes are decomposed into multiple Gaussian components. Then, a novel gene correlation calculation method is proposed to measure the relationship between genes from the perspective of distribution. Finally, a permutation-based approach is proposed to determine the threshold of gene importance to obtain marker gene subsets. Compared with state-of-the-art feature selection methods, applying FEED to various scRNA-seq datasets, including large ones, followed by different common clustering algorithms, results in significant improvements in the accuracy of cell-type identification. The source code for FEED is freely available at https://github.com/genemine/FEED.
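The permutation-based thresholding step can be illustrated with a minimal sketch. This is not FEED's implementation: the importance score used here (a gene's largest absolute Pearson correlation with any other gene) is a hypothetical stand-in for FEED's distribution-based correlation, and the function names, the 0.95 quantile and the toy data are all assumptions made for illustration.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / (sxx * syy) ** 0.5


def importance(expr):
    """Hypothetical importance: max |correlation| of each gene with any other gene."""
    genes = list(expr)
    return {
        g: max(abs(pearson(expr[g], expr[h])) for h in genes if h != g)
        for g in genes
    }


def permutation_threshold(expr, n_perm=100, quantile=0.95, seed=0):
    """Null distribution of importance scores after shuffling each gene across cells."""
    rng = random.Random(seed)
    null_scores = []
    for _ in range(n_perm):
        # Shuffling each gene independently destroys gene-gene structure.
        shuffled = {g: rng.sample(v, len(v)) for g, v in expr.items()}
        null_scores.extend(importance(shuffled).values())
    null_scores.sort()
    idx = min(len(null_scores) - 1, int(quantile * len(null_scores)))
    return null_scores[idx]


def select_genes(expr, **kwargs):
    """Keep genes whose observed importance exceeds the permutation threshold."""
    threshold = permutation_threshold(expr, **kwargs)
    scores = importance(expr)
    return sorted(g for g, s in scores.items() if s > threshold)


# Toy data (assumed): genes A and B are perfectly correlated, C and D are noise.
rng = random.Random(1)
a = [rng.gauss(0, 1) for _ in range(30)]
expr = {
    "A": a,
    "B": [2 * v + 1 for v in a],
    "C": [rng.gauss(0, 1) for _ in range(30)],
    "D": [rng.gauss(0, 1) for _ in range(30)],
}
selected = select_genes(expr, n_perm=100, quantile=0.95, seed=0)
```

The design point is that the threshold is derived from the data's own null distribution rather than fixed in advance, which is what a permutation-based cutoff offers over a hand-picked one.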
Affiliation(s)
- Chao Zhang
  - School of Computer Science and Engineering, Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China
- Zhi-Wei Duan
  - School of Computer Science and Engineering, Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China
- Yun-Pei Xu
  - School of Computer Science and Engineering, Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China
- Jin Liu
  - School of Computer Science and Engineering, Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China
- Hong-Dong Li
  - School of Computer Science and Engineering, Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China
64
Zhao J, Wong CW, Ching WK, Cheng X. NG-SEM: an effective non-Gaussian structural equation modeling framework for gene regulatory network inference from single-cell RNA-seq data. Brief Bioinform 2023; 24:bbad369. [PMID: 37864293 DOI: 10.1093/bib/bbad369] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/28/2023] [Revised: 09/25/2023] [Accepted: 09/29/2023] [Indexed: 10/22/2023]
Abstract
Inference of gene regulatory networks (GRNs) from gene expression profiles has been a central problem in systems biology and bioinformatics in the past decades. The tremendous emergence of single-cell RNA sequencing (scRNA-seq) data brings new opportunities and challenges for GRN inference: the extensive dropouts and complicated noise structure may degrade the performance of contemporary gene regulatory models. Thus, there is an urgent need to develop more accurate methods for GRN inference from single-cell data that also account for the noise structure. In this paper, we extend the traditional structural equation modeling (SEM) framework with a flexible noise modeling strategy: we use Gaussian mixtures to approximate the complex stochastic nature of a biological system, since the Gaussian mixture framework can arguably serve as a universal approximator for any continuous distribution. The proposed non-Gaussian SEM framework, called NG-SEM, can be optimized by iteratively performing the Expectation-Maximization algorithm and the weighted least-squares method. Moreover, the Akaike Information Criterion is adopted to select the number of components of the Gaussian mixture. To probe the accuracy and stability of our proposed method, we design a comprehensive variety of control experiments to systematically investigate the performance of NG-SEM under various conditions, including simulations and real biological data sets. Results on synthetic data demonstrate that this strategy can improve the performance of the traditional Gaussian SEM model, and results on real biological data sets verify that NG-SEM outperforms five other state-of-the-art methods.
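For reference, the Akaike Information Criterion used for choosing the number of mixture components has the standard form below. The parameter count shown is for a plain one-dimensional $K$-component Gaussian mixture and is an assumption made for illustration; whether NG-SEM also counts the SEM regression coefficients in $k$ is not stated in the abstract.

```latex
\[
\mathrm{AIC}(K) = 2\,k(K) - 2\ln \hat{L}_K,
\qquad
k(K) = \underbrace{K}_{\text{means}} + \underbrace{K}_{\text{variances}} + \underbrace{(K-1)}_{\text{free weights}} = 3K - 1,
\qquad
K^{*} = \arg\min_{K}\ \mathrm{AIC}(K),
\]
```

where $\hat{L}_K$ is the maximized likelihood of the $K$-component model. Larger $K$ always raises $\ln \hat{L}_K$, so the $2k$ penalty is what stops the selection from drifting toward ever more components.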
Affiliation(s)
- Jiaying Zhao
  - Department of Mathematics, The University of Hong Kong, Pokfulam Road, Hong Kong
- Chi-Wing Wong
  - Department of Mathematics, The University of Hong Kong, Pokfulam Road, Hong Kong
- Wai-Ki Ching
  - Department of Mathematics, The University of Hong Kong, Pokfulam Road, Hong Kong
- Xiaoqing Cheng
  - School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an, Shaanxi, China
65
Li L, Xia R, Chen W, Zhao Q, Tao P, Chen L. Single-cell causal network inferred by cross-mapping entropy. Brief Bioinform 2023; 24:bbad281. [PMID: 37544659 DOI: 10.1093/bib/bbad281] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/16/2023] [Revised: 07/03/2023] [Accepted: 07/19/2023] [Indexed: 08/08/2023]
Abstract
Gene regulatory networks (GRNs) reveal the complex molecular interactions that govern cell state. However, it is challenging to identify causal relations among genes because of noisy data and molecular nonlinearity. Here, we propose a novel causal criterion, neighbor cross-mapping entropy (NME), for inferring GRNs from both steady-state data and time-series data. NME is designed to quantify 'continuous causality', or functional dependency from one variable to another, based on their function continuity with varying neighbor sizes. NME shows superior performance on benchmark datasets compared with existing methods. When applied to scRNA-seq datasets, NME not only reliably inferred GRNs for cell types but also identified cell states. Based on the inferred GRNs and their activity matrices, NME showed better performance in single-cell clustering and downstream analyses. In summary, based on continuous causality, NME provides a powerful tool for inferring causal regulations between genes in GRNs from scRNA-seq data, which can be further exploited to identify novel cell types/states and predict cell type-specific network modules.
Affiliation(s)
- Lin Li
  - Key Laboratory of Systems Biology, Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai 200031, China
- Rui Xia
  - Key Laboratory of Systems Biology, Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai 200031, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Wei Chen
  - Key Laboratory of Systems Biology, Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai 200031, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Qi Zhao
  - Key Laboratory of Systems Biology, Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai 200031, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Peng Tao
  - Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou 310024, China
- Luonan Chen
  - Key Laboratory of Systems Biology, Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai 200031, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
  - Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou 310024, China
66
Wang J, Chen Y, Zou Q. Inferring gene regulatory network from single-cell transcriptomes with graph autoencoder model. PLoS Genet 2023; 19:e1010942. [PMID: 37703293 PMCID: PMC10519590 DOI: 10.1371/journal.pgen.1010942] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 04/11/2023] [Revised: 09/25/2023] [Accepted: 08/29/2023] [Indexed: 09/15/2023]
Abstract
The gene regulatory structure of cells involves not only the regulatory relationship between two genes, but also the cooperative associations of multiple genes. However, most gene regulatory network inference methods for single-cell data infer only the regulatory relationships between pairs of genes, ignoring the global regulatory structure, which is crucial for identifying regulation in complex biological systems. Here, we propose a graph-based Deep learning model for Regulatory networks Inference among Genes (DeepRIG) from single-cell RNA-seq data. To learn the global regulatory structure, DeepRIG builds a prior regulatory graph by transforming the gene expression data into a co-expression mode. It then uses a graph autoencoder model to embed the global regulatory information contained in the graph into gene latent embeddings and to reconstruct the gene regulatory network. Extensive benchmarking results demonstrate that DeepRIG can accurately reconstruct gene regulatory networks and outperforms existing methods on multiple simulated networks and real-cell regulatory networks. Additionally, we applied DeepRIG to samples of human peripheral blood mononuclear cells and triple-negative breast cancer, and showed that DeepRIG can provide accurate cell-type-specific gene regulatory network inference and identify novel regulators of progression and inhibition.
Affiliation(s)
- Jiacheng Wang
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
- Yaojia Chen
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
67
Lee S, Deng L, Wang Y, Wang K, Sartor MA, Wang XS. IndepthPathway: an integrated tool for in-depth pathway enrichment analysis based on single-cell sequencing data. Bioinformatics 2023; 39:btad325. [PMID: 37243667 PMCID: PMC10275909 DOI: 10.1093/bioinformatics/btad325] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2022] [Revised: 04/29/2023] [Accepted: 05/26/2023] [Indexed: 05/29/2023]
Abstract
MOTIVATION Single-cell sequencing enables exploring the pathways and processes of cells and cell populations. However, there is a paucity of pathway enrichment methods designed to tolerate the high noise and low gene coverage of this technology. When gene expression data are noisy and signals are sparse, testing pathway enrichment based on gene expression may not yield statistically significant results, which is particularly problematic when detecting the pathways enriched in less abundant cells that are vulnerable to disturbances. RESULTS In this project, we developed Weighted Concept Signature Enrichment Analysis, a method specialized for pathway enrichment analysis from single-cell transcriptomics (scRNA-seq). Weighted Concept Signature Enrichment Analysis takes a broader approach to assessing the functional relations of pathway gene sets to differentially expressed genes, and leverages the cumulative signature of molecular concepts characteristic of the highly differentially expressed genes, which we term the universal concept signature, to tolerate the high noise and low coverage of this technology. We then incorporated Weighted Concept Signature Enrichment Analysis into an R package called "IndepthPathway" so that biologists can broadly apply this method to pathway analysis based on bulk and single-cell sequencing data. By simulating the technical variability and dropouts in gene expression characteristic of scRNA-seq, and by benchmarking on a real dataset of matched single-cell and bulk RNA-seq data, we demonstrate that IndepthPathway presents outstanding stability and depth in pathway enrichment results under stochasticity of the data, and thus will substantially improve the scientific rigor of pathway analysis for single-cell sequencing data. AVAILABILITY AND IMPLEMENTATION The IndepthPathway R package is available through: https://github.com/wangxlab/IndepthPathway.
Affiliation(s)
- Sanghoon Lee
  - UPMC Hillman Cancer Center, University of Pittsburgh, Pittsburgh, PA 15232, United States
  - Department of Pathology, University of Pittsburgh, Pittsburgh, PA 15232, United States
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15206, United States
- Letian Deng
  - UPMC Hillman Cancer Center, University of Pittsburgh, Pittsburgh, PA 15232, United States
  - Department of Pathology, University of Pittsburgh, Pittsburgh, PA 15232, United States
- Yue Wang
  - UPMC Hillman Cancer Center, University of Pittsburgh, Pittsburgh, PA 15232, United States
  - Department of Pathology, University of Pittsburgh, Pittsburgh, PA 15232, United States
- Kai Wang
  - Department of Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, MI 48109, United States
- Maureen A Sartor
  - Department of Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, MI 48109, United States
- Xiao-Song Wang
  - UPMC Hillman Cancer Center, University of Pittsburgh, Pittsburgh, PA 15232, United States
  - Department of Pathology, University of Pittsburgh, Pittsburgh, PA 15232, United States
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15206, United States
68
Lin Y, Wu TY, Chen X, Wan S, Chao B, Xin J, Yang JY, Wong WH, Wang YXR. scTIE: data integration and inference of gene regulation using single-cell temporal multimodal data. bioRxiv 2023:2023.05.18.541381. [PMID: 37292801 PMCID: PMC10245711 DOI: 10.1101/2023.05.18.541381] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/10/2023]
Abstract
Single-cell technologies offer unprecedented opportunities to dissect gene regulatory mechanisms in context-specific ways. Although there are computational methods for extracting gene regulatory relationships from scRNA-seq and scATAC-seq data, the data integration problem, essential for accurate cell type identification, has been mostly treated as a standalone challenge. Here we present scTIE, a unified method that integrates temporal multimodal data and infers regulatory relationships predictive of cellular state changes. scTIE uses an autoencoder to embed cells from all time points into a common space using iterative optimal transport, followed by extracting interpretable information to predict cell trajectories. Using a variety of synthetic and real temporal multimodal datasets, we demonstrate scTIE achieves effective data integration while preserving more biological signals than existing methods, particularly in the presence of batch effects and noise. Furthermore, on the exemplar multiome dataset we generated from differentiating mouse embryonic stem cells over time, we demonstrate scTIE captures regulatory elements highly predictive of cell transition probabilities, providing new potentials to understand the regulatory landscape driving developmental processes.
Affiliation(s)
- Yingxin Lin
  - School of Mathematics and Statistics, The University of Sydney, NSW, Australia
  - Charles Perkins Centre, The University of Sydney, NSW, Australia
  - Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China
- Tung-Yu Wu
  - Department of Statistics, Stanford University, CA, USA
- Xi Chen
  - Department of Statistics, Stanford University, CA, USA
- Sheng Wan
  - Institute of Electronics, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
- Brian Chao
  - Department of Electrical Engineering, Stanford University, CA, USA
- Jingxue Xin
  - Department of Statistics, Stanford University, CA, USA
- Jean Y.H. Yang
  - School of Mathematics and Statistics, The University of Sydney, NSW, Australia
  - Charles Perkins Centre, The University of Sydney, NSW, Australia
  - Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China
- Wing H. Wong
  - Department of Statistics, Stanford University, CA, USA
  - Department of Biomedical Data Science, Stanford University, CA, USA
  - Bio-X Program, Stanford University, CA, USA
- Y. X. Rachel Wang
  - School of Mathematics and Statistics, The University of Sydney, NSW, Australia
69
Liu H, Li H, Sharma A, Huang W, Pan D, Gu Y, Lin L, Sun X, Liu H. scAnno: a deconvolution strategy-based automatic cell type annotation tool for single-cell RNA-sequencing data sets. Brief Bioinform 2023; 24:bbad179. [PMID: 37183449 DOI: 10.1093/bib/bbad179] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 02/21/2023] [Revised: 03/29/2023] [Accepted: 04/19/2023] [Indexed: 05/16/2023]
Abstract
Undoubtedly, single-cell RNA sequencing (scRNA-seq) has changed the research landscape by providing insights into heterogeneous, complex and rare cell populations. Given that more such data sets will become available in the near future, their accurate assessment with compatible and robust models for cell type annotation is a prerequisite. Considering this, herein, we developed scAnno (scRNA-seq data annotation), an automated annotation tool for scRNA-seq data sets primarily based on the single-cell cluster levels, using a joint deconvolution strategy and logistic regression. We explicitly constructed a reference profile for human (30 cell types and 50 human tissues) and a reference profile for mouse (26 cell types and 50 mouse tissues) to support this novel methodology (scAnno). scAnno offers a possibility to obtain genes with high expression and specificity in a given cell type as cell type-specific genes (marker genes) by combining co-expression genes with seed genes as a core. Of importance, scAnno can accurately identify cell type-specific genes based on cell type reference expression profiles without any prior information. Particularly, in the peripheral blood mononuclear cell data set, the marker genes identified by scAnno showed cell type-specific expression, and the majority of marker genes matched exactly with those included in the CellMarker database. Besides validating the flexibility and interpretability of scAnno in identifying marker genes, we also proved its superiority in cell type annotation over other cell type annotation tools (SingleR, scPred, CHETAH and scmap-cluster) through internal validation of data sets (average annotation accuracy: 99.05%) and cross-platform data sets (average annotation accuracy: 95.56%). Taken together, we established the first novel methodology that utilizes a deconvolution strategy for automated cell typing and is capable of being a significant application in broader scRNA-seq analysis. 
scAnno is available at https://github.com/liuhong-jia/scAnno.
Affiliation(s)
- Hongjia Liu
  - State Key Laboratory of Digital Medical Engineering, School of Biological Science & Medical Engineering, Southeast University, Nanjing, 210096, China
- Huamei Li
  - Department of General Surgery, Nanjing Drum Tower Hospital, the Affiliated Hospital of Nanjing University Medical School, Nanjing, 210008, PR China
- Amit Sharma
  - Department of Neurosurgery, University Hospital Bonn, Bonn, Germany
- Duo Pan
  - State Key Laboratory of Digital Medical Engineering, School of Biological Science & Medical Engineering, Southeast University, Nanjing, 210096, China
- Yu Gu
  - State Key Laboratory of Digital Medical Engineering, School of Biological Science & Medical Engineering, Southeast University, Nanjing, 210096, China
- Lu Lin
  - State Key Laboratory of Digital Medical Engineering, School of Biological Science & Medical Engineering, Southeast University, Nanjing, 210096, China
- Xiao Sun
  - State Key Laboratory of Digital Medical Engineering, School of Biological Science & Medical Engineering, Southeast University, Nanjing, 210096, China
- Hongde Liu
  - State Key Laboratory of Digital Medical Engineering, School of Biological Science & Medical Engineering, Southeast University, Nanjing, 210096, China
70
Wang Y, Xuan C, Wu H, Zhang B, Ding T, Gao J. P-CSN: single-cell RNA sequencing data analysis by partial cell-specific network. Brief Bioinform 2023; 24:bbad180. [PMID: 37170676 DOI: 10.1093/bib/bbad180] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2022] [Revised: 03/14/2023] [Accepted: 04/19/2023] [Indexed: 05/13/2023]
Abstract
Although many proposed single-cell computational methods use gene expression as input, recent studies show that replacing 'unstable' gene expression with 'stable' gene-gene associations can greatly improve the performance of downstream analysis. To obtain accurate gene-gene associations, the conditional cell-specific network method (c-CSN) filters out the indirect associations of the cell-specific network method (CSN) based on the conditional independence of statistics. However, when there are strong connections in networks, the c-CSN suffers from a false-negative problem in network construction. To overcome this problem, a new partial cell-specific network method (p-CSN) based on the partial independence of statistics is proposed in this paper, which eliminates the singularity of the c-CSN by implicitly including direct associations among estimated variables. Based on the p-CSN, single-cell network entropy (scNEntropy) is further proposed to quantify cell state. The advantages of our method are verified on several datasets. (i) Compared with traditional gene regulatory network construction methods, the p-CSN constructs partial cell-specific networks, namely, one network per cell. (ii) When there are strong connections in networks, the p-CSN reduces the false-negative probability of the c-CSN. (iii) The input of more accurate gene-gene associations further improves the performance of downstream analyses. (iv) The scNEntropy effectively quantifies cell state and reconstructs cell pseudo-time.
Affiliation(s)
- Yan Wang
  - School of Science, Jiangnan University, Wuxi 214122, China
- Chenxu Xuan
  - School of Science, Jiangnan University, Wuxi 214122, China
- Hanwen Wu
  - School of Science, Jiangnan University, Wuxi 214122, China
- Bai Zhang
  - School of Science, Jiangnan University, Wuxi 214122, China
- Tao Ding
  - School of Mathematics Statistics and Physics, Newcastle University, Newcastle upon Tyne NE1 7RU, UK
- Jie Gao
  - School of Science, Jiangnan University, Wuxi 214122, China
71
Qiu Y, Yan C, Zhao P, Zou Q. SSNMDI: a novel joint learning model of semi-supervised non-negative matrix factorization and data imputation for clustering of single-cell RNA-seq data. Brief Bioinform 2023; 24:7147025. [PMID: 37122068 DOI: 10.1093/bib/bbad149] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 12/29/2022] [Revised: 02/18/2023] [Accepted: 03/28/2023] [Indexed: 05/02/2023]
Abstract
MOTIVATION Single-cell RNA sequencing (scRNA-seq) technology attracts extensive attention in the biomedical field. It can be used to measure gene expression and analyze the transcriptome at the single-cell level, enabling the identification of cell types based on unsupervised clustering. Data imputation and dimension reduction are conducted before clustering because scRNA-seq data have a high 'dropout' rate, noise and linear inseparability. However, performing dimension reduction, imputation and clustering independently cannot fully characterize the pattern of the scRNA-seq data, resulting in poor clustering performance. Herein, we propose a novel and accurate algorithm, SSNMDI, that uses a joint learning approach to simultaneously perform imputation, dimensionality reduction and cell clustering in a non-negative matrix factorization (NMF) framework. In addition, we integrate the cell annotation as prior information, transforming the joint learning into a semi-supervised NMF model. Through experiments on 14 datasets, we demonstrate that SSNMDI has a faster convergence speed, better dimensionality reduction performance and more accurate cell clustering performance than previous methods, providing an accurate and robust strategy for analyzing scRNA-seq data. Biological analyses, including pseudotime analysis, gene ontology and survival analysis, are also conducted to validate the biological significance of our method. We believe that we are among the first to introduce imputation, partial label information, dimension reduction and clustering jointly to the single-cell field. AVAILABILITY AND IMPLEMENTATION The source code for SSNMDI is available at https://github.com/yushanqiu/SSNMDI.
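As background for the NMF framework mentioned above, the basic (unsupervised, imputation-free) problem and its classical multiplicative updates for the Frobenius objective are sketched below. SSNMDI's actual objective adds imputation and label terms that the abstract does not spell out, so the symbols here ($V$ for the non-negative expression matrix, $W$ and $H$ for the factors, $r$ for the rank) are assumptions for illustration only.

```latex
\[
\min_{W \ge 0,\; H \ge 0}\ \lVert V - WH \rVert_F^{2},
\qquad V \in \mathbb{R}_{\ge 0}^{m \times n},\ 
W \in \mathbb{R}_{\ge 0}^{m \times r},\ 
H \in \mathbb{R}_{\ge 0}^{r \times n},
\]
\[
H \leftarrow H \odot \frac{W^{\top} V}{W^{\top} W H},
\qquad
W \leftarrow W \odot \frac{V H^{\top}}{W H H^{\top}},
\]
```

where $\odot$ and the fraction bars denote element-wise multiplication and division. Because every factor in the updates is non-negative, $W$ and $H$ stay non-negative without an explicit projection step.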
Affiliation(s)
- Yushan Qiu
  - College of Mathematics and Statistics, Shenzhen University, 518000, Guangdong, China
- Chang Yan
  - College of Mathematics and Statistics, Shenzhen University, 518000, Guangdong, China
- Pu Zhao
  - College of Life and Health Sciences, Northeastern University, Shenyang, 110169, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610056, China
72
Zhang J, Shi G, Pang J, Zhu X, Feng Q, Na J, Ma W, Liu D, Songyang Z. Crotonylation of GAPDH regulates human embryonic stem cell endodermal lineage differentiation and metabolic switch. Stem Cell Res Ther 2023; 14:63. [PMID: 37013624 PMCID: PMC10071711 DOI: 10.1186/s13287-023-03290-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2022] [Accepted: 03/16/2023] [Indexed: 04/05/2023]
Abstract
BACKGROUND Post-translational modifications of proteins are crucial to the regulation of their activity and function. As a newly discovered acylation modification, crotonylation of non-histone proteins remains largely unexplored, particularly in human embryonic stem cells (hESCs). METHODS We investigated the role of crotonylation in hESC differentiation by introducing crotonate into the culture medium of GFP-tagged LTR7 primed H9 cell and extended pluripotent stem cell lines. RNA-seq was used to determine the hESC transcriptional features. Through morphological changes, qPCR of pluripotent and germ layer-specific gene markers and flow cytometry analysis, we determined that the induced crotonylation resulted in hESCs differentiating into the endodermal lineage. We performed targeted metabolomic analysis and Seahorse metabolic measurement to investigate the metabolic features after crotonate induction. High-resolution tandem mass spectrometry (LC-MS/MS) then revealed the target proteins in hESCs. In addition, the role of crotonylated glycolytic enzymes (GAPDH and ENOA) was evaluated by in vitro crotonylation and enzymatic activity assays. Finally, we used shRNA knockdown hESCs, wild-type GAPDH and GAPDH mutants to explore the potential role of GAPDH crotonylation in regulating human embryonic stem cell differentiation and metabolic switch. RESULTS We found that induced crotonylation in hESCs resulted in hESCs of different pluripotency states differentiating into the endodermal lineage. Increased protein crotonylation in hESCs was accompanied by transcriptomic shifts and decreased glycolysis. Large-scale crotonylation profiling of non-histone proteins revealed that metabolic enzymes were major targets of inducible crotonylation in hESCs. We further discovered GAPDH as a key glycolytic enzyme regulated by crotonylation during endodermal differentiation from hESCs.
CONCLUSIONS Crotonylation of GAPDH decreased its enzymatic activity thereby leading to reduced glycolysis during endodermal differentiation from hESCs.
Collapse
Affiliation(s)
- Jingran Zhang
- MOE Key Laboratory of Gene Function and Regulation, Guangzhou Key Laboratory of Healthy Aging Research and SYSU-BCM Joint Research Center, School of Life Sciences, Sun Yat-sen University, Guangzhou, 510275, China
| | - Guang Shi
- MOE Key Laboratory of Gene Function and Regulation, Guangzhou Key Laboratory of Healthy Aging Research and SYSU-BCM Joint Research Center, School of Life Sciences, Sun Yat-sen University, Guangzhou, 510275, China.
| | - Junjie Pang
- MOE Key Laboratory of Gene Function and Regulation, Guangzhou Key Laboratory of Healthy Aging Research and SYSU-BCM Joint Research Center, School of Life Sciences, Sun Yat-sen University, Guangzhou, 510275, China
| | - Xing Zhu
- MOE Key Laboratory of Gene Function and Regulation, Guangzhou Key Laboratory of Healthy Aging Research and SYSU-BCM Joint Research Center, School of Life Sciences, Sun Yat-sen University, Guangzhou, 510275, China
| | - Qingcai Feng
- MOE Key Laboratory of Gene Function and Regulation, Guangzhou Key Laboratory of Healthy Aging Research and SYSU-BCM Joint Research Center, School of Life Sciences, Sun Yat-sen University, Guangzhou, 510275, China
| | - Jie Na
- School of Medicine, Tsinghua University, Beijing, 100084, China
| | - Wenbin Ma
- MOE Key Laboratory of Gene Function and Regulation, Guangzhou Key Laboratory of Healthy Aging Research and SYSU-BCM Joint Research Center, School of Life Sciences, Sun Yat-sen University, Guangzhou, 510275, China
| | - Dan Liu
- Verna and Marrs Mclean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
| | - Zhou Songyang
- MOE Key Laboratory of Gene Function and Regulation, Guangzhou Key Laboratory of Healthy Aging Research and SYSU-BCM Joint Research Center, School of Life Sciences, Sun Yat-sen University, Guangzhou, 510275, China.
- Sun Yat-Sen Memorial Hospital, Sun Yat-sen University, Guangzhou, 510120, China.
- Bioland Laboratory, Guangzhou, 510320, China.
| |
|
73
|
Reagor CC, Velez-Angel N, Hudspeth AJ. Depicting pseudotime-lagged causality across single-cell trajectories for accurate gene-regulatory inference. PNAS NEXUS 2023; 2:pgad113. [PMID: 37113980 PMCID: PMC10129065 DOI: 10.1093/pnasnexus/pgad113] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2022] [Revised: 03/21/2023] [Accepted: 03/23/2023] [Indexed: 04/29/2023]
Abstract
Identifying the causal interactions in gene-regulatory networks requires an accurate understanding of the time-lagged relationships between transcription factors and their target genes. Here we describe DELAY (short for Depicting Lagged Causality), a convolutional neural network for the inference of gene-regulatory relationships across pseudotime-ordered single-cell trajectories. We show that combining supervised deep learning with joint probability matrices of pseudotime-lagged trajectories allows the network to overcome important limitations of ordinary Granger causality-based methods, for example, the inability to infer cyclic relationships such as feedback loops. Our network outperforms several common methods for inferring gene regulation and, when given partial ground-truth labels, predicts novel regulatory networks from single-cell RNA sequencing (scRNA-seq) and single-cell ATAC sequencing (scATAC-seq) data sets. To validate this approach, we used DELAY to identify important genes and modules in the regulatory network of auditory hair cells, as well as likely DNA-binding partners for two hair cell cofactors (Hist1h1c and Ccnd1) and a novel binding sequence for the hair cell-specific transcription factor Fiz1. We provide an easy-to-use implementation of DELAY under an open-source license at https://github.com/calebclayreagor/DELAY.
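The input representation named in this abstract, a joint probability matrix of a pseudotime-lagged gene pair, can be illustrated with a small sketch. This is not the DELAY implementation: the function name `lagged_joint_matrix`, the equal-width binning, and the default `lag` and `bins` values are assumptions made for illustration, and the cells are assumed to be already sorted by pseudotime.

```python
def lagged_joint_matrix(tf_expr, target_expr, lag=1, bins=4):
    """Joint probability matrix of TF expression at pseudotime step t versus
    target expression at step t + lag (hypothetical helper, not DELAY's API).

    Both inputs are lists of expression values for cells that are already
    ordered by pseudotime.
    """
    if len(tf_expr) != len(target_expr):
        raise ValueError("both genes must be measured in the same cells")
    if lag < 0 or lag >= len(tf_expr):
        raise ValueError("lag must be between 0 and the number of cells - 1")

    # Pair each TF value with the target value `lag` cells later.
    pairs = list(zip(tf_expr[:len(tf_expr) - lag], target_expr[lag:]))

    def bin_index(value, lo, hi):
        # Equal-width binning; the maximum value falls in the last bin.
        if hi == lo:
            return 0
        return min(int((value - lo) / (hi - lo) * bins), bins - 1)

    tf_lo, tf_hi = min(tf_expr), max(tf_expr)
    tg_lo, tg_hi = min(target_expr), max(target_expr)

    counts = [[0] * bins for _ in range(bins)]
    for x, y in pairs:
        counts[bin_index(x, tf_lo, tf_hi)][bin_index(y, tg_lo, tg_hi)] += 1

    # Normalize counts so the matrix sums to 1.
    return [[c / len(pairs) for c in row] for row in counts]
```

Stacking such matrices over several lags would give an image-like input of the kind a convolutional network can consume; how DELAY itself builds and stacks them is described in the paper and repository, not here.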
Affiliation(s)
| | - Nicolas Velez-Angel
- Howard Hughes Medical Institute and Laboratory of Sensory Neuroscience, The Rockefeller University, New York, NY 10065, USA
| | | |
|
74
|
Shen B, Coruzzi G, Shasha D. EnsInfer: a simple ensemble approach to network inference outperforms any single method. BMC Bioinformatics 2023; 24:114. [PMID: 36964499 PMCID: PMC10037858 DOI: 10.1186/s12859-023-05231-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2022] [Accepted: 03/15/2023] [Indexed: 03/26/2023] Open
Abstract
This study evaluates both a variety of existing base causal inference methods and a variety of ensemble methods. We show that: (i) base network inference methods vary in their performance across different datasets, so a method that works poorly on one dataset may work well on another; (ii) a non-homogeneous ensemble method in the form of a Naive Bayes classifier leads overall to as good or better results than using the best single base method or any other ensemble method; (iii) for the best results, the ensemble method should integrate all methods that satisfy a statistical test of normality on training data. The resulting ensemble model EnsInfer easily integrates all kinds of RNA-seq data as well as new and existing inference methods. The paper categorizes and reviews state-of-the-art underlying methods, describes the EnsInfer ensemble approach in detail, and presents experimental results. The source code and data used will be made available to the community upon publication.
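The ensemble idea in this abstract, a Naive Bayes classifier that combines the scores of several base inference methods, can be sketched as follows. This is a minimal illustration and not the EnsInfer code: Gaussian likelihoods are assumed (in line with the normality criterion the abstract mentions), and the function names and the variance floor of `1e-6` are hypothetical choices.

```python
import math


def fit_gaussian_nb(features, labels):
    """Fit per-class priors and per-feature means/variances.

    features: one row per candidate edge, one column per base method score.
    labels:   1 for a known regulatory edge, 0 for a known non-edge.
    """
    model = {}
    for cls in set(labels):
        rows = [f for f, y in zip(features, labels) if y == cls]
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Small floor keeps the variance strictly positive (assumed value).
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-6
                     for col, m in zip(columns, means)]
        model[cls] = (n / len(labels), means, variances)
    return model


def edge_probability(model, scores):
    """Posterior probability that an edge with these base scores is true."""
    log_post = {}
    for cls, (prior, means, variances) in model.items():
        lp = math.log(prior)
        for x, m, v in zip(scores, means, variances):
            lp += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        log_post[cls] = lp
    # Normalize in log space for numerical stability.
    top = max(log_post.values())
    norm = sum(math.exp(lp - top) for lp in log_post.values())
    return math.exp(log_post[1] - top) / norm
```

Each column of `features` would hold the confidence that one base method assigns to a candidate edge, so adding a new base method amounts to adding a column.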
Affiliation(s)
- Bingran Shen
- Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, 251 Mercer St, New York, 10012 USA
| | - Gloria Coruzzi
- Department of Biology, Center for Genomics and Systems Biology, New York University, 12 Waverly Pl, New York, 10003 USA
| | - Dennis Shasha
- Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, 251 Mercer St, New York, 10012 USA
| |
|
75
|
Huang Z, Wang J, Lu X, Mohd Zain A, Yu G. scGGAN: single-cell RNA-seq imputation by graph-based generative adversarial network. Brief Bioinform 2023; 24:7024714. [PMID: 36733262 DOI: 10.1093/bib/bbad040] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 11/08/2022] [Revised: 12/21/2022] [Accepted: 01/18/2023] [Indexed: 02/04/2023] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) data typically contain a large number of missing values, which often results in the loss of critical gene signaling information and seriously limits downstream analysis. Deep learning-based imputation methods can often handle scRNA-seq data better than shallow ones, but most of them do not consider the inherent relations between genes, even though the expression of a gene is often regulated by other genes. Therefore, it is essential to impute scRNA-seq data by considering the regional gene-to-gene relations. We propose a novel model (named scGGAN) to impute scRNA-seq data that learns the gene-to-gene relations by Graph Convolutional Networks (GCN) and global scRNA-seq data distribution by Generative Adversarial Networks (GAN). scGGAN first leverages single-cell and bulk genomics data to explore inherent relations between genes and builds a more compact gene relation network to jointly capture the homogeneous and heterogeneous information. Then, it constructs a GCN-based GAN model to integrate the scRNA-seq, gene sequencing data and gene relation network for generating scRNA-seq data, and trains the model through adversarial learning. Finally, it utilizes data generated by the trained GCN-based GAN model to impute scRNA-seq data. Experiments on simulated and real scRNA-seq datasets show that scGGAN can effectively identify dropout events, recover biologically meaningful expression, determine subcellular states and types, and improve differential expression and temporal dynamics analyses. Ablation experiments confirm that both the gene relation network and gene sequence data help the imputation of scRNA-seq data.
Affiliation(s)
- Zimo Huang
- MEng student at School of Software, Shandong University, China
| | - Jun Wang
- Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, China
| | - Xudong Lu
- Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, China
| | | | - Guoxian Yu
- School of Software, Shandong University, China
| |
|
76
|
Chen HH, Hsueh CW, Lee CH, Hao TY, Tu TY, Chang LY, Lee JC, Lin CY. SWEET: a single-sample network inference method for deciphering individual features in disease. Brief Bioinform 2023; 24:7017366. [PMID: 36719112 PMCID: PMC10025435 DOI: 10.1093/bib/bbad032] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/13/2022] [Revised: 01/05/2023] [Accepted: 01/14/2023] [Indexed: 02/01/2023] Open
Abstract
Recently, extracting inherent biological system information (e.g. cellular networks) from genome-wide expression profiles for developing personalized diagnostic and therapeutic strategies has become increasingly important. However, accurately constructing single-sample networks (SINs) to capture individual characteristics and heterogeneity in disease remains challenging. Here, we propose a sample-specific-weighted correlation network (SWEET) method to model SINs by integrating the genome-wide sample-to-sample correlation (i.e. sample weights) with the differential network between perturbed and aggregate networks. For a group of samples, the genome-wide sample weights can be assessed without prior knowledge of intrinsic subpopulations to address the network edge number bias caused by sample size differences. Compared with the state-of-the-art SIN inference methods, the SWEET SINs in 16 cancers more likely fit the scale-free property, display higher overlap with the human interactomes and perform better in identifying three types of cancer-related genes. Moreover, integrating SWEET SINs with a network proximity measure facilitates characterizing individual features and therapy in diseases, such as somatic mutation, mut-driver and essential genes. Biological experiments further validated two candidate repurposable drugs, albendazole for head and neck squamous cell carcinoma (HNSCC) and lung adenocarcinoma (LUAD) and encorafenib for HNSCC. By applying SWEET, we also identified two possible LUAD subtypes that exhibit distinct clinical features and molecular mechanisms. Overall, the SWEET method complements current SIN inference and analysis methods and presents a view of biological systems at the network level to offer numerous clues for further investigation and clinical translation in network medicine and precision medicine.
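The two ingredients named in this abstract, genome-wide sample-to-sample correlation used as sample weights and a differential between a perturbed and an aggregate network, can be illustrated roughly as below. This sketch is an assumption-laden reading of the abstract, not the published SWEET formula: the perturbation is modeled by duplicating one sample, and the way the weight scales the differential in `single_sample_edge` is a hypothetical choice.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def sample_weights(samples):
    """Mean correlation of each sample's expression profile with all others.

    samples: one list of gene expression values per sample.
    """
    weights = []
    for i, profile in enumerate(samples):
        others = [pearson(profile, other)
                  for j, other in enumerate(samples) if j != i]
        weights.append(sum(others) / len(others))
    return weights


def single_sample_edge(gene_a, gene_b, sample_idx, weight):
    """Edge score for one sample (assumed form, for illustration only).

    gene_a, gene_b: expression of two genes across all samples.
    The perturbed network duplicates the sample of interest; the weighted
    differential is then added back to the aggregate correlation.
    """
    aggregate = pearson(gene_a, gene_b)
    perturbed = pearson(gene_a + [gene_a[sample_idx]],
                        gene_b + [gene_b[sample_idx]])
    return aggregate + weight * len(gene_a) * (perturbed - aggregate)
```

Readers who need the exact scoring function should consult the paper; the sketch only shows how a per-sample weight and a perturbed-minus-aggregate difference could be combined.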
Affiliation(s)
- Hsin-Hua Chen
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
| | - Chun-Wei Hsueh
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
| | - Chia-Hwa Lee
- School of Medical Laboratory Science and Biotechnology, College of Medical Science and Technology, Taipei Medical University, Taipei 110, Taiwan
- TMU Research Center of Cancer Translational Medicine, Taipei Medical University, Taipei 110, Taiwan
- Ph.D. Program in Medical Biotechnology, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan
| | - Ting-Yi Hao
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
| | - Tzu-Ying Tu
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
| | - Lan-Yun Chang
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
| | - Jih-Chin Lee
- Department of Otolaryngology-Head and Neck Surgery, Tri-Service General Hospital, National Defense Medical Center, Taipei 110, Taiwan
| | - Chun-Yu Lin
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- Department of Biological Science and Technology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- Institute of Data Science and Engineering, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- Center for Intelligent Drug Systems and Smart Bio-devices, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- School of Dentistry, Kaohsiung Medical University, Kaohsiung 807, Taiwan
| |
|
77
|
Novakovsky G, Sasaki S, Fornes O, Omur ME, Huang H, Bayly CL, Zhang D, Lim N, Cherkasov A, Pavlidis P, Mostafavi S, Lynn FC, Wasserman WW. In silico discovery of small molecules for efficient stem cell differentiation into definitive endoderm. Stem Cell Reports 2023; 18:765-781. [PMID: 36801003 PMCID: PMC10031281 DOI: 10.1016/j.stemcr.2023.01.008] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 11/29/2021] [Revised: 01/18/2023] [Accepted: 01/19/2023] [Indexed: 02/18/2023] Open
Abstract
Improving methods for human embryonic stem cell differentiation represents a challenge in modern regenerative medicine research. Using drug repurposing approaches, we discover small molecules that regulate the formation of definitive endoderm. Among them are inhibitors of known processes involved in endoderm differentiation (mTOR, PI3K, and JNK pathways) and a new compound, with an unknown mechanism of action, capable of inducing endoderm formation in the absence of growth factors in the media. Optimization of the classical protocol by inclusion of this compound achieves the same differentiation efficiency with a 90% cost reduction. The presented in silico procedure for candidate molecule selection has broad potential for improving stem cell differentiation protocols.
Affiliation(s)
- Gherman Novakovsky
- BC Children's Hospital Research Institute, University of British Columbia, Vancouver, BC, Canada; Bioinformatics Graduate Program, University of British Columbia, Vancouver, BC, Canada; Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
| | - Shugo Sasaki
- BC Children's Hospital Research Institute, University of British Columbia, Vancouver, BC, Canada; Department of Surgery, University of British Columbia, Vancouver, BC, Canada; School of Biomedical Engineering, University of British Columbia, Vancouver, BC, Canada
| | - Oriol Fornes
- BC Children's Hospital Research Institute, University of British Columbia, Vancouver, BC, Canada; Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
| | - Meltem E Omur
- BC Children's Hospital Research Institute, University of British Columbia, Vancouver, BC, Canada; Bioinformatics Graduate Program, University of British Columbia, Vancouver, BC, Canada; Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
| | - Helen Huang
- BC Children's Hospital Research Institute, University of British Columbia, Vancouver, BC, Canada; Department of Surgery, University of British Columbia, Vancouver, BC, Canada; School of Biomedical Engineering, University of British Columbia, Vancouver, BC, Canada
| | - Carmen L Bayly
- BC Children's Hospital Research Institute, University of British Columbia, Vancouver, BC, Canada; Department of Surgery, University of British Columbia, Vancouver, BC, Canada
| | - Dahai Zhang
- BC Children's Hospital Research Institute, University of British Columbia, Vancouver, BC, Canada
| | - Nathaniel Lim
- Bioinformatics Graduate Program, University of British Columbia, Vancouver, BC, Canada; Department of Psychiatry, Michael Smith Laboratories, University of British Columbia, Vancouver, BC, Canada
| | - Artem Cherkasov
- Department of Urological Sciences, Vancouver Prostate Centre, University of British Columbia, Vancouver, BC, Canada
| | - Paul Pavlidis
- Department of Psychiatry, Michael Smith Laboratories, University of British Columbia, Vancouver, BC, Canada
| | - Sara Mostafavi
- BC Children's Hospital Research Institute, University of British Columbia, Vancouver, BC, Canada; Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada; Department of Statistics, University of British Columbia, Vancouver, BC, Canada; Department of Computer Science, University of Washington, Seattle, WA, USA
| | - Francis C Lynn
- BC Children's Hospital Research Institute, University of British Columbia, Vancouver, BC, Canada; Department of Surgery, University of British Columbia, Vancouver, BC, Canada; School of Biomedical Engineering, University of British Columbia, Vancouver, BC, Canada.
| | - Wyeth W Wasserman
- BC Children's Hospital Research Institute, University of British Columbia, Vancouver, BC, Canada; Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada.
| |
|
78
|
Models of Congenital Adrenal Hyperplasia for Gene Therapies Testing. Int J Mol Sci 2023; 24:ijms24065365. [PMID: 36982440 PMCID: PMC10049562 DOI: 10.3390/ijms24065365] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/03/2022] [Revised: 02/26/2023] [Accepted: 03/07/2023] [Indexed: 03/14/2023] Open
Abstract
The adrenal glands are important endocrine organs that play a major role in the stress response. Some adrenal gland abnormalities are treated with hormone replacement therapy, which does not address physiological requirements. Modern technologies make it possible to develop gene therapy drugs that can completely cure diseases caused by mutations in specific genes. Congenital adrenal hyperplasia (CAH) is an example of such a potentially treatable monogenic disease. CAH is an autosomal recessive inherited disease with an overall incidence of 1:9500–1:20,000 newborns. To date, there are several promising drugs for CAH gene therapy. At the same time, it remains unclear how new approaches can be tested, as there are no models for this disease. The present review focuses on modern models for inherited adrenal gland insufficiency and their detailed characterization. In addition, the advantages and disadvantages of various pathological models are discussed, and ways of further development are suggested.
|
79
|
Li B, Jin K, Ou-Yang L, Yan H, Zhang XF. scTSSR2: Imputing Dropout Events for Single-Cell RNA Sequencing Using Fast Two-Side Self-Representation. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:1445-1456. [PMID: 35476574 DOI: 10.1109/tcbb.2022.3170587] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
The single-cell RNA sequencing (scRNA-seq) technique has opened a new era by revealing gene expression patterns at single-cell resolution, enabling studies of the heterogeneity and transcriptome dynamics of complex tissues. However, the large proportion of dropout events may hinder downstream analyses. Imputation of dropout events is thus an important step in analyzing scRNA-seq data. We develop scTSSR2, a new imputation method that combines matrix decomposition with the previously developed two-side sparse self-representation, leading to a fast two-side sparse self-representation for imputing dropout events in scRNA-seq data. Comparisons among different imputation methods show that scTSSR2 has distinct advantages in terms of computational speed and memory usage. Comprehensive downstream experiments show that scTSSR2 outperforms the state-of-the-art imputation methods. A user-friendly R package, scTSSR2, is developed to denoise scRNA-seq data and improve data quality.
|
80
|
Wang Y, Liu C, Qiao X, Han X, Liu ZP. PKI: A bioinformatics method of quantifying the importance of nodes in gene regulatory network via a pseudo knockout index. BIOCHIMICA ET BIOPHYSICA ACTA. GENE REGULATORY MECHANISMS 2023; 1866:194911. [PMID: 36804477 DOI: 10.1016/j.bbagrm.2023.194911] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/12/2022] [Revised: 01/09/2023] [Accepted: 01/30/2023] [Indexed: 02/18/2023]
Abstract
BACKGROUND A gene regulatory network (GRN) is a model that characterizes the complex relationships between genes and thereby provides an informatics environment to measure the importance of nodes. The evaluation of important nodes in a GRN can effectively point to their functional implications, serving as key players in particular biological processes, such as master regulators and driver genes. Currently, this evaluation is mainly based on network topological parameters and focuses only on evaluating a single node individually. However, genes and their products perform their functions by interacting with each other. It is worth noting that the effects of gene combinations in a GRN are not simply additive. The discovery of key combinations is significant for revealing gene sets with important functions. Recently, with the development of single-cell RNA-sequencing (scRNA-seq) technology, we can quantify the gene expression profiles of individual cells, which provides the potential to identify crucial nodes in gene regulation under specific conditions, e.g., stem cell differentiation. RESULTS In this paper, we propose a bioinformatics method, called Pseudo Knockout Importance (PKI), to quantify the importance of nodes and node sets in a specific GRN structure using time-course scRNA-seq data. First, we construct ordinary differential equations (ODEs) to approximate the gene regulations during cell differentiation. Then we design gene pseudo knockout experiments and define PKI score evaluation criteria based on the coefficient of determination. The importance of nodes can be described as the influence on the ODE system of removing variables. For key gene combinations, PKI is derived as a combinatorial optimization problem of quantifying the in silico gene knockout effects. CONCLUSIONS Here, we focus our analyses on the specific GRN of embryonic stem cells with time series gene expression profiles. To verify the effectiveness and advantage of the PKI method, we compare its node importance rankings with twelve other centrality-based methods, such as degree and Latora closeness. For key node combinations, we compare the results with the method based on the minimum dominating set. Moreover, the well-known combinations of transcription factors in induced pluripotent stem cells are also employed to verify the vital gene combinations identified by PKI. These results demonstrate the reliability and superiority of the proposed method.
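The pseudo knockout idea in this abstract, scoring a node by how much the fit of the ODE system degrades when its variable is removed, can be sketched under simplifying assumptions. The code below is not the PKI implementation: it assumes a linear right-hand side with already fitted coefficients, uses hypothetical gene labels "A" and "B", and defines the score simply as the drop in the coefficient of determination.

```python
def r_squared(observed, predicted):
    """Coefficient of determination between observed and predicted values."""
    mean_obs = sum(observed) / len(observed)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot


def predicted_derivative(coeffs, regulators, knocked_out=()):
    """Right-hand side of an assumed linear ODE, dx/dt = sum_j a_j * x_j(t).

    coeffs:      regulator name -> fitted coefficient a_j.
    regulators:  regulator name -> expression values over time points.
    knocked_out: regulator names whose terms are removed from the sum.
    """
    n_time = len(next(iter(regulators.values())))
    return [sum(a * regulators[g][t]
                for g, a in coeffs.items() if g not in knocked_out)
            for t in range(n_time)]


def pseudo_knockout_score(coeffs, regulators, observed, knocked_out):
    """Drop in R^2 when the given regulators are removed from the model."""
    full = r_squared(observed, predicted_derivative(coeffs, regulators))
    reduced = r_squared(
        observed, predicted_derivative(coeffs, regulators, knocked_out))
    return full - reduced


# Toy example with hypothetical regulators "A" (strong) and "B" (weak).
regulators = {"A": [1, 2, 3, 4], "B": [1, 1, 2, 2]}
coeffs = {"A": 2.0, "B": 0.1}
observed = predicted_derivative(coeffs, regulators)
score_a = pseudo_knockout_score(coeffs, regulators, observed, ("A",))
score_b = pseudo_knockout_score(coeffs, regulators, observed, ("B",))
```

Passing several names in `knocked_out` scores a gene combination; because the contributions are evaluated jointly, the combined score need not equal the sum of the individual ones.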
Affiliation(s)
- Yijuan Wang
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
| | - Chao Liu
- Department of Orthodontics, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200011, China
| | - Xu Qiao
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
| | - Xianhua Han
- Faculty of Science, Yamaguchi University, Yamaguchi 753-8511, Japan
| | - Zhi-Ping Liu
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China.
| |
|
81
|
Yu Z, Su Y, Lu Y, Yang Y, Wang F, Zhang S, Chang Y, Wong KC, Li X. Topological identification and interpretation for single-cell gene regulation elucidation across multiple platforms using scMGCA. Nat Commun 2023; 14:400. [PMID: 36697410 PMCID: PMC9877026 DOI: 10.1038/s41467-023-36134-7] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 04/21/2022] [Accepted: 01/16/2023] [Indexed: 01/26/2023] Open
Abstract
Single-cell RNA sequencing provides high-throughput gene expression information to explore cellular heterogeneity at the individual cell level. A major challenge in characterizing high-throughput gene expression data arises from challenges related to dimensionality, and the prevalence of dropout events. To address these concerns, we develop a deep graph learning method, scMGCA, for single-cell data analysis. scMGCA is based on a graph-embedding autoencoder that simultaneously learns cell-cell topology representation and cluster assignments. We show that scMGCA is accurate and effective for cell segregation and batch effect correction, outperforming other state-of-the-art models across multiple platforms. In addition, we perform genomic interpretation on the key compressed transcriptomic space of the graph-embedding autoencoder to demonstrate the underlying gene regulation mechanism. We demonstrate that in a pancreatic ductal adenocarcinoma dataset, scMGCA successfully provides annotations on the specific cell types and reveals differential gene expression levels across multiple tumor-associated and cell signalling pathways.
Affiliation(s)
- Zhuohan Yu
- School of Artificial Intelligence, Jilin University, Jilin, China
| | - Yanchi Su
- School of Artificial Intelligence, Jilin University, Jilin, China
| | - Yifu Lu
- School of Artificial Intelligence, Jilin University, Jilin, China
| | - Yuning Yang
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
| | - Fuzhou Wang
- Department of Computer Science, City University of Hong Kong, Hong Kong SAR, China
| | - Shixiong Zhang
- Department of Computer Science, City University of Hong Kong, Hong Kong SAR, China
| | - Yi Chang
- School of Artificial Intelligence, Jilin University, Jilin, China
| | - Ka-Chun Wong
- Department of Computer Science, City University of Hong Kong, Hong Kong SAR, China.
| | - Xiangtao Li
- School of Artificial Intelligence, Jilin University, Jilin, China.
| |
|
82
|
Lin Z, Ou-Yang L. Inferring gene regulatory networks from single-cell gene expression data via deep multi-view contrastive learning. Brief Bioinform 2023; 24:6965907. [PMID: 36585783 DOI: 10.1093/bib/bbac586] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 10/10/2022] [Revised: 11/28/2022] [Accepted: 11/29/2022] [Indexed: 01/01/2023] Open
Abstract
The inference of gene regulatory networks (GRNs) is of great importance for understanding the complex regulatory mechanisms within cells. The emergence of single-cell RNA-sequencing (scRNA-seq) technologies enables the measure of gene expression levels for individual cells, which promotes the reconstruction of GRNs at single-cell resolution. However, existing network inference methods are mainly designed for data collected from a single data source, which ignores the information provided by multiple related data sources. In this paper, we propose a multi-view contrastive learning (DeepMCL) model to infer GRNs from scRNA-seq data collected from multiple data sources or time points. We first represent each gene pair as a set of histogram images, and then introduce a deep Siamese convolutional neural network with contrastive loss to learn the low-dimensional embedding for each gene pair. Moreover, an attention mechanism is introduced to integrate the embeddings extracted from different data sources and different neighbor gene pairs. Experimental results on synthetic and real-world datasets validate the effectiveness of our contrastive learning and attention mechanisms, demonstrating the effectiveness of our model in integrating multiple data sources for GRN inference.
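The contrastive loss used to train a Siamese network, as mentioned in this abstract, can be written in a few lines. The sketch assumes the standard pairwise margin-based form; the abstract does not state which variant DeepMCL uses, and the function names, the `margin` default, and the meaning given to `same_label` are illustrative assumptions.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(emb_a, emb_b, same_label, margin=1.0):
    """Pairwise contrastive loss on two embeddings (assumed standard form).

    same_label: True when the two inputs should map close together, for
    example two gene pairs that share a regulatory label (an assumption).
    Similar pairs are penalized by their squared distance; dissimilar pairs
    are penalized only when they lie closer than `margin`.
    """
    distance = euclidean(emb_a, emb_b)
    if same_label:
        return distance ** 2
    return max(0.0, margin - distance) ** 2
```

During training, this quantity would be averaged over sampled pairs of embeddings produced by the two branches of the Siamese network.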
Affiliation(s)
- Zerun Lin
- Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen Key Laboratory of Media Security, and Guangdong Laboratory of Artificial Intelligence and Digital Economy(SZ), College of Electronics and Information Engineering, Shenzhen University, Shenzhen, 518060, China
| | - Le Ou-Yang
- Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen Key Laboratory of Media Security, and Guangdong Laboratory of Artificial Intelligence and Digital Economy(SZ), College of Electronics and Information Engineering, Shenzhen University, Shenzhen, 518060, China
| |
|
83
|
Deng W, Li B, Wang J, Jiang W, Yan X, Li N, Vukmirovic M, Kaminski N, Wang J, Zhao H. A novel Bayesian framework for harmonizing information across tissues and studies to increase cell type deconvolution accuracy. Brief Bioinform 2023; 24:bbac616. [PMID: 36631398 PMCID: PMC9851324 DOI: 10.1093/bib/bbac616] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2022] [Revised: 11/28/2022] [Accepted: 12/14/2022] [Indexed: 01/13/2023] Open
Abstract
Computational cell type deconvolution on bulk transcriptomics data can reveal cell type proportion heterogeneity across samples. One critical factor for accurate deconvolution is the reference signature matrix for different cell types. Compared with inferring reference signature matrices from cell lines, rapidly accumulating single-cell RNA-sequencing (scRNA-seq) data provide a richer and less biased resource. However, deriving cell type signature from scRNA-seq data is challenging due to high biological and technical noises. In this article, we introduce a novel Bayesian framework, tranSig, to improve signature matrix inference from scRNA-seq by leveraging shared cell type-specific expression patterns across different tissues and studies. Our simulations show that tranSig is robust to the number of signature genes and tissues specified in the model. Applications of tranSig to bulk RNA sequencing data from peripheral blood, bronchoalveolar lavage and aorta demonstrate its accuracy and power to characterize biological heterogeneity across groups. In summary, tranSig offers an accurate and robust approach to defining gene expression signatures of different cell types, facilitating improved in silico cell type deconvolutions.
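The role of the reference signature matrix in deconvolution can be made concrete with a toy case. The sketch below is not tranSig, which concerns inferring the signatures themselves; it only shows the downstream step for two cell types, assuming a bulk profile is a convex mixture of two known signatures and solving for the proportion by least squares. The function name and the two-type restriction are assumptions for illustration.

```python
def two_type_proportion(bulk, sig_a, sig_b):
    """Least-squares proportion p of cell type A in a two-type mixture.

    Assumes bulk ~= p * sig_a + (1 - p) * sig_b, with all three arguments
    given as lists of expression values over the same signature genes.
    """
    diff = [a - b for a, b in zip(sig_a, sig_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("signatures are identical; proportion is undefined")
    # Project (bulk - sig_b) onto (sig_a - sig_b).
    p = sum((x - b) * d for x, b, d in zip(bulk, sig_b, diff)) / denom
    # Clamp to the valid range of a proportion.
    return min(1.0, max(0.0, p))
```

With noisier or less specific signatures the estimate degrades, which is why the quality of the signature matrix matters for the accuracy of the recovered proportions.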
Affiliation(s)
- Wenxuan Deng
  - Department of Biostatistics, Yale School of Public Health, 60 College Street, New Haven, CT, USA
- Bolun Li
  - Department of Biostatistics, Yale School of Public Health, 60 College Street, New Haven, CT, USA
  - State Key Laboratory of Medical Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, Department of Pathophysiology, Peking Union Medical College, Beijing, China
- Jiawei Wang
  - Department of Biostatistics, Yale School of Public Health, 60 College Street, New Haven, CT, USA
- Wei Jiang
  - Department of Biostatistics, Yale School of Public Health, 60 College Street, New Haven, CT, USA
- Xiting Yan
  - Section of Pulmonary, Critical Care and Sleep Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA
- Ningshan Li
  - Section of Pulmonary, Critical Care and Sleep Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA
- Milica Vukmirovic
  - Section of Pulmonary, Critical Care and Sleep Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA
  - Leslie Dan Faculty of Pharmacy, University of Toronto, 144 College St., ON, Canada
- Naftali Kaminski
  - Section of Pulmonary, Critical Care and Sleep Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA
- Jing Wang
  - State Key Laboratory of Medical Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, Department of Pathophysiology, Peking Union Medical College, Beijing, China
- Hongyu Zhao
  - Department of Biostatistics, Yale School of Public Health, 60 College Street, New Haven, CT, USA
84
Zhi Y, Li M, Lv G. Into the multi-omics era: Progress of T cells profiling in the context of solid organ transplantation. Front Immunol 2023; 14:1058296. [PMID: 36798139 PMCID: PMC9927650 DOI: 10.3389/fimmu.2023.1058296] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2022] [Accepted: 01/20/2023] [Indexed: 02/04/2023] Open
Abstract
T cells are the lymphocytes that most commonly mediate allograft rejection, which remains an impediment to long-term allograft survival. However, the heterogeneity of T cells in terms of differentiation and activation status, effector function, and highly diverse T cell receptors (TCRs), combined with the limitations of traditional detection approaches, has precluded tracking these T cells and understanding their fate in recipients. Recently, with the widespread development of single-cell techniques, T cells have been identified and characterized at single-cell resolution, which has deepened the understanding of T cell heterogeneity through measurements in a single cell of gene expression, DNA methylation, chromatin accessibility, surface proteins, and TCR. Although each of these approaches provides valuable insights into an individual cell on its own, a more comprehensive understanding can be obtained when they are analyzed jointly. Multi-omics techniques have been implemented in characterizing T cells in health and disease, including transplantation. This review focuses on the principles, challenges, and advances in these technologies and highlights their application to the study of alloreactive T cells to improve the understanding of T cell heterogeneity in solid organ transplantation.
Affiliation(s)
- Yao Zhi
  - Department of Hepatobiliary and Pancreatic Surgery, The First Hospital of Jilin University, Changchun, China
- Mingqian Li
  - Department of Hepatobiliary and Pancreatic Surgery, The First Hospital of Jilin University, Changchun, China
- Guoyue Lv
  - Department of Hepatobiliary and Pancreatic Surgery, The First Hospital of Jilin University, Changchun, China
85
Dynamic network biomarker factors orchestrate cell-fate determination at tipping points during hESC differentiation. Innovation (N Y) 2022; 4:100364. [PMID: 36632190 PMCID: PMC9827382 DOI: 10.1016/j.xinn.2022.100364] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/09/2022] [Accepted: 12/16/2022] [Indexed: 12/24/2022] Open
Abstract
The generation of ectoderm, mesoderm, and endoderm layers is the most critical biological process during the gastrulation of embryo development. Such a differentiation process in human embryonic stem cells (hESCs) is an inherently nonlinear multi-stage dynamical process which contains multiple tipping points playing crucial roles in the cell-fate decision. However, the tipping points of the process are largely unknown, let alone the molecular regulation of these critical events. Here, by designing a module-based dynamic network biomarker (M-DNB) model, we quantitatively pinpointed two tipping points of the differentiation of hESCs toward definitive endoderm, which led to the identification of M-DNB factors (FOS, HSF1, MYCN, TP53, and MYC) of this process. We demonstrate that before the tipping points, M-DNB factors are able to maintain the cell states and orchestrate cell-fate determination during hESC (ES)-to-ME and ME-to-DE differentiation processes, which not only leads to a better understanding of endodermal specification of hESCs but also reveals the power of the M-DNB model to identify critical transition points with their key factors in diverse biological processes, including cell differentiation and transdifferentiation dynamics.
86
Chen Y, Zhang H, Sun X. Improving the performance of single-cell RNA-seq data mining based on relative expression orderings. Brief Bioinform 2022; 24:6931720. [PMID: 36528803 PMCID: PMC9851298 DOI: 10.1093/bib/bbac556] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2022] [Revised: 11/10/2022] [Accepted: 11/16/2022] [Indexed: 12/23/2022] Open
Abstract
The advent of single-cell RNA-sequencing (scRNA-seq) provides an unprecedented opportunity to explore gene expression profiles at the single-cell level. However, gene expression values vary over time and under different conditions even within the same cell. There is an urgent need for more stable and reliable feature variables at the single-cell level to depict cell heterogeneity. Thus, we construct a new feature matrix called the delta rank matrix (DRM) from scRNA-seq data by integrating an a priori gene interaction network, which transforms the unreliable gene expression value into a stable gene interaction/edge value on a single-cell basis. This is the first time that a gene-level feature has been transformed into an interaction/edge-level for scRNA-seq data analysis based on relative expression orderings. Experiments on various scRNA-seq datasets have demonstrated that DRM performs better than the original gene expression matrix in cell clustering, cell identification and pseudo-trajectory reconstruction. More importantly, the DRM really achieves the fusion of gene expressions and gene interactions and provides a method of measuring gene interactions at the single-cell level. Thus, the DRM can be used to find changes in gene interactions among different cell types, which may open up a new way to analyze scRNA-seq data from an interaction perspective. In addition, DRM provides a new method to construct a cell-specific network for each single cell instead of a group of cells as in traditional network construction methods. DRM's exceptional performance is due to its extraction of rich gene-association information on biological systems and stable characterization of cells.
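As a rough illustration of turning gene-level values into edge-level values through relative expression orderings, the sketch below ranks genes within one cell and records the rank difference for each edge of a prior network. The abstract does not give the exact DRM definition, so the rank-difference formula, the function name `delta_rank_features`, and the toy gene names are assumptions.

```python
def delta_rank_features(expression, edges):
    """Return one rank-difference value per network edge for a single cell.

    expression: dict mapping gene name -> expression value in one cell.
    edges:      list of (gene_a, gene_b) pairs from a prior interaction
                network (hypothetical here).
    """
    # Rank genes within the cell (1 = lowest expression). Ties are broken
    # by gene name so the result is deterministic.
    ordered = sorted(expression, key=lambda gene: (expression[gene], gene))
    rank = {gene: position for position, gene in enumerate(ordered, start=1)}
    # Each edge value depends only on the ordering of the two genes,
    # not on the raw expression magnitudes.
    return {(a, b): rank[a] - rank[b] for a, b in edges}


cell = {"A": 5.0, "B": 1.0, "C": 3.0}
network = [("A", "B"), ("B", "C")]
features = delta_rank_features(cell, network)
```

Because only orderings enter the result, any monotone rescaling of the cell's expression values leaves the edge features unchanged, which is one reason rank-based features can be more stable than raw expression.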
Affiliation(s)
- Yuanyuan Chen
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, China
  - College of Science, Nanjing Agricultural University, Nanjing 210095, China
- Hao Zhang
  - College of Science, Nanjing Agricultural University, Nanjing 210095, China
- Xiao Sun (corresponding author)
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
87
Wu F, Liufu Z, Liu Y, Guo L, Wu J, Cao S, Qin Y, Guo N, Fu Y, Liu H, Li Q, Shu X, Pei D, Hutchins AP, Chen J, He J. Species-specific rewiring of definitive endoderm developmental gene activation via endogenous retroviruses through TET1-mediated demethylation. Cell Rep 2022; 41:111791. [PMID: 36516776 DOI: 10.1016/j.celrep.2022.111791] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 04/25/2022] [Revised: 10/03/2022] [Accepted: 11/15/2022] [Indexed: 12/15/2022] Open
Abstract
Transposable elements (TEs) are the major sources of lineage-specific genomic innovation and comprise nearly half of the human genome, but most of their functions remain unclear. Here, we identify a series of endogenous retroviruses (ERVs), a TE subclass, that regulate the transcriptome at the definitive endoderm stage in an in vitro differentiation model from human embryonic stem cells. Notably, these ERVs act as enhancers containing binding sites for critical transcription factors for endoderm lineage specification. Genome-wide methylation analysis shows that most of these ERVs are derepressed by TET1-mediated DNA demethylation. LTR6B, a representative definitive endoderm-activating ERV, contains binding sites for FOXA2 and GATA4 and governs the primate-specific expression of its neighboring developmental genes, such as ERBB4, in definitive endoderm. Together, our study provides evidence that recently evolved ERVs represent potent de novo developmental regulatory elements, which, in turn, fine-tune species-specific transcriptomes during endoderm and embryonic development.
Affiliation(s)
- Fang Wu
  - Center for Cell Lineage and Development, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
  - University of the Chinese Academy of Sciences, Beijing 100049, China
- Zhongqi Liufu
  - Key Laboratory of Biological Targeting Diagnosis, Therapy and Rehabilitation of Guangdong Higher Education Institutes, The Fifth Affiliated Hospital of Guangzhou Medical University, Guangzhou 510799, China
- Yujian Liu
  - Center for Cell Lineage and Development, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
- Lin Guo
  - Center for Cell Lineage and Development, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
- Jian Wu
  - Center for Cell Lineage and Development, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
- Shangtao Cao
  - Guangzhou Laboratory, Bio-island, Guangzhou 510320, China
- Yue Qin
  - School of Life Sciences, Westlake University, Hangzhou, China
- Ning Guo
  - Center for Cell Lineage and Development, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
- Yunyun Fu
  - Center for Cell Lineage and Development, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
- He Liu
  - Center for Cell Lineage and Atlas (CCLA), Bioland Laboratory (Guangzhou Regenerative Medicine and Health Guangdong Laboratory), Guangzhou 510320, China
- Qiuhong Li
  - Center for Cell Lineage and Development, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
- Xiaodong Shu
  - Center for Cell Lineage and Development, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
- Duanqing Pei
  - Center for Cell Lineage and Development, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
  - School of Life Sciences, Westlake University, Hangzhou, China
- Andrew P Hutchins
  - Shenzhen Key Laboratory of Gene Regulation and Systems Biology, Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
- Jiekai Chen
  - Center for Cell Lineage and Development, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
  - University of the Chinese Academy of Sciences, Beijing 100049, China
- Jiangping He
  - Guangzhou Laboratory, Bio-island, Guangzhou 510320, China
88
Zhong J, Han C, Wang Y, Chen P, Liu R. Identifying the critical state of complex biological systems by the directed-network rank score method. Bioinformatics 2022; 38:5398-5405. [PMID: 36282843 PMCID: PMC9750123 DOI: 10.1093/bioinformatics/btac707] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 06/27/2022] [Revised: 09/21/2022] [Accepted: 10/24/2022] [Indexed: 12/25/2022] Open
Abstract
MOTIVATION: Catastrophic transitions are ubiquitous in the dynamic progression of complex biological systems; that is, at a critical transition a complex system suddenly shifts from one stable state to another. Identifying such a critical point or tipping point is essential for revealing the underlying mechanism of complex biological systems. However, it is difficult to identify the tipping point, since traditional static measurements detect few significant differences in the critical state. RESULTS: In this study, by exploring the dynamic changes in gene cooperative effects between the before-transition and critical states, we present a model-free approach, the directed-network rank score (DNRS), to detect the early-warning signal of a critical transition in complex biological systems. The proposed method is applicable to both bulk and single-cell RNA-sequencing (scRNA-seq) data. This computational method was validated by the successful identification of the critical or pre-transition state for both simulated datasets and six real datasets, including three scRNA-seq datasets of embryonic development and three tumor datasets. In addition, the functional and pathway enrichment analyses suggested that the corresponding DNRS signaling biomarkers were involved in key biological processes. AVAILABILITY AND IMPLEMENTATION: The source code is freely available at https://github.com/zhongjiayuan/DNRS. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jiayuan Zhong
  - School of Mathematics and Big Data, Foshan University, Foshan 528000, China
  - School of Mathematics, South China University of Technology, Guangzhou 510640, China
- Chongyin Han
  - School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510640, China
- Yangkai Wang
  - School of Mathematics, South China University of Technology, Guangzhou 510640, China
- Pei Chen
  - School of Mathematics, South China University of Technology, Guangzhou 510640, China
- Rui Liu
  - School of Mathematics, South China University of Technology, Guangzhou 510640, China
  - Pazhou Lab, Guangzhou 510330, China
89
Hypergraph geometry reflects higher-order dynamics in protein interaction networks. Sci Rep 2022; 12:20879. [PMID: 36463292 PMCID: PMC9719542 DOI: 10.1038/s41598-022-24584-w] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 05/18/2022] [Accepted: 11/17/2022] [Indexed: 12/05/2022] Open
Abstract
Protein interactions form a complex dynamic molecular system that shapes cell phenotype and function; in this regard, network analysis is a powerful tool for studying the dynamics of cellular processes. Current models of protein interaction networks are limited in that the standard graph model can only represent pairwise relationships. Higher-order interactions are well-characterized in biology, including protein complex formation and feedback or feedforward loops. These higher-order relationships are better represented by a hypergraph as a generalized network model. Here, we present an approach to analyzing dynamic gene expression data using a hypergraph model and quantify network heterogeneity via Forman-Ricci curvature. We observe, on a global level, increased network curvature in pluripotent stem cells and cancer cells. Further, we use local curvature to conduct pathway analysis in a melanoma dataset, finding increased curvature in several oncogenic pathways and decreased curvature in tumor suppressor pathways. We compare this approach to a graph-based model and a differential gene expression approach.
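For orientation, the simplest graph-based form of Forman-Ricci curvature can be computed from node degrees alone. The sketch below uses the combinatorial formula for an unweighted graph without higher-dimensional faces, F(u, v) = 4 - deg(u) - deg(v); it is not the hypergraph formulation used in the paper, and the function name and toy graphs are assumptions.

```python
from collections import defaultdict


def forman_curvature(edges):
    """Forman-Ricci curvature of each edge in an unweighted, simple graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    Uses F(u, v) = 4 - deg(u) - deg(v), the combinatorial form that
    ignores edge weights and higher-dimensional faces.
    """
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return {(u, v): 4 - degree[u] - degree[v] for u, v in edges}


# Toy graphs (assumed): a three-node path and a hub with three leaves.
path = [("a", "b"), ("b", "c")]
star = [("hub", "x"), ("hub", "y"), ("hub", "z")]
path_curvature = forman_curvature(path)
star_curvature = forman_curvature(star)
```

Edges attached to high-degree nodes receive lower curvature under this formula, so the values summarize how hub-like the neighborhood of each interaction is.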
90
Tsukamoto M, Kimura K, Yoshida T, Sugiura K, Hatoya S. Canine induced pluripotent stem cells efficiently differentiate into definitive endoderm in 3D cell culture conditions using high-dose activin A. Regen Ther 2022; 21:502-510. [DOI: 10.1016/j.reth.2022.10.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2022] [Revised: 09/09/2022] [Accepted: 10/08/2022] [Indexed: 11/06/2022] Open
91
Yan H, Wu J, Li Y, Liu JS. Bayesian bi-clustering methods with applications in computational biology. Ann Appl Stat 2022. [DOI: 10.1214/22-aoas1622] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Han Yan
  - Department of Statistics, Harvard University
- Jun S. Liu
  - Department of Statistics, Harvard University
92
Gu H, Cheng H, Ma A, Li Y, Wang J, Xu D, Ma Q. scGNN 2.0: a graph neural network tool for imputation and clustering of single-cell RNA-Seq data. Bioinformatics 2022; 38:5322-5325. [PMID: 36250784 PMCID: PMC9710550 DOI: 10.1093/bioinformatics/btac684] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 06/13/2022] [Revised: 08/30/2022] [Accepted: 10/14/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION: Gene expression imputation has been an essential step of the single-cell RNA-Seq data analysis workflow. Among several deep-learning methods, the debut of scGNN gained substantial recognition in 2021 for its superior performance and the ability to produce a cell-cell graph. However, the implementation of scGNN was relatively time-consuming and its performance could still be optimized. RESULTS: The implementation of scGNN 2.0 is significantly faster than scGNN thanks to a simplified closed-loop architecture. Across all eight datasets, cell clustering performance increased by 85.02% on average in terms of the adjusted Rand index, and the imputation median L1 error was reduced by 67.94% on average. With the built-in visualizations, users can quickly assess the imputation and cell clustering results, compare against benchmarks and interpret the cell-cell interaction. The expanded input and output formats also pave the way for custom workflows that integrate scGNN 2.0 with other scRNA-Seq toolkits on both Python and R platforms. AVAILABILITY AND IMPLEMENTATION: scGNN 2.0 is implemented in Python (as of version 3.8) with the source code available at https://github.com/OSU-BMBL/scGNN2.0. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
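The adjusted Rand index used above to score cell clustering can be computed from the contingency table of two labelings. The following is a minimal standard-library sketch of the usual pair-counting formula, not code from scGNN 2.0; the function name and the toy labels are assumptions, and it expects at least two cells.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same cells.

    Returns 1.0 for identical partitions (up to label names) and values
    near 0 for agreement at chance level. Requires len(labels_true) >= 2.
    """
    n = len(labels_true)
    # Pairs of cells placed together in both labelings.
    together_both = sum(comb(c, 2)
                        for c in Counter(zip(labels_true, labels_pred)).values())
    # Pairs placed together within each labeling separately.
    together_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    together_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = together_true * together_pred / comb(n, 2)
    maximum = (together_true + together_pred) / 2
    if maximum == expected:  # degenerate case, e.g. a single cluster in both
        return 1.0
    return (together_both - expected) / (maximum - expected)


reference = [0, 0, 1, 1]
relabelled = [1, 1, 0, 0]   # same partition, different label names
crossed = [0, 1, 0, 1]      # splits every reference cluster
ari_same = adjusted_rand_index(reference, relabelled)
ari_crossed = adjusted_rand_index(reference, crossed)
```

Because the index is corrected for chance, it can be negative when two clusterings agree less than random assignment would.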
Affiliation(s)
- Haocheng Gu
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
- Hao Cheng
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
- Anjun Ma
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
  - Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH 43210, USA
- Yang Li
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
- Juexin Wang
  - Department of Electrical Engineering and Computer Science, and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
- Dong Xu
  - Department of Electrical Engineering and Computer Science, and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
- Qin Ma
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
  - Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH 43210, USA
93
Guan J, Wang Y, Wang Y, Zhuang Y, Ji G. SRGS: sparse partial least squares-based recursive gene selection for gene regulatory network inference. BMC Genomics 2022; 23:782. [PMID: 36451086 PMCID: PMC9710113 DOI: 10.1186/s12864-022-09020-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2022] [Accepted: 11/16/2022] [Indexed: 12/02/2022] Open
Abstract
BACKGROUND: The identification of gene regulatory networks (GRNs) facilitates the understanding of the underlying molecular mechanism of various biological processes and complex diseases. With the availability of single-cell RNA sequencing data, it is essential to infer GRNs from single-cell expression. Although some GRN methods originally developed for bulk expression data are applicable to single-cell data and several single-cell-specific GRN algorithms have been developed, recent benchmarking studies have emphasized the need to develop more accurate and robust GRN modeling methods that are compatible with single-cell expression data. RESULTS: We present SRGS, SPLS (sparse partial least squares)-based recursive gene selection, to infer GRNs from bulk or single-cell expression data. SRGS recursively selects and scores the genes that may regulate the considered target gene based on SPLS. When dealing with gene expression data with dropouts, we randomly scramble samples, set some values in the expression matrix to zeroes, and generate multiple copies of data through multiple iterations to make SRGS more robust. We test SRGS on different kinds of expression data, including simulated bulk data, simulated single-cell data without and with dropouts, and experimental single-cell data, and compare it with existing GRN methods, including those originally developed for bulk data, those developed specifically for single-cell data, and those recommended by recent benchmarking studies. CONCLUSIONS: SRGS is competitive with the existing GRN methods and effective in gene regulatory network inference from bulk or single-cell gene expression data. SRGS is available at: https://github.com/JGuan-lab/SRGS.
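The dropout-handling step described above (scramble samples, zero out some values, repeat to obtain several copies) might be sketched as follows. This is an illustrative reading of the abstract, not the SRGS implementation; the function name `perturbed_copies`, the uniform dropout rate, and the samples-as-rows layout are assumptions.

```python
import random


def perturbed_copies(matrix, n_copies=3, dropout_rate=0.1, seed=0):
    """Generate shuffled, dropout-perturbed copies of an expression matrix.

    matrix:       list of rows, one per sample (cell), one value per gene.
    dropout_rate: assumed probability that any single value is set to zero.
    The input matrix is left unchanged.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    copies = []
    for _ in range(n_copies):
        rows = [list(row) for row in matrix]  # copy so the input is untouched
        rng.shuffle(rows)                     # scramble the sample order
        for row in rows:
            for j in range(len(row)):
                if rng.random() < dropout_rate:
                    row[j] = 0.0              # simulated dropout event
        copies.append(rows)
    return copies


expression = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
copies = perturbed_copies(expression, n_copies=4, dropout_rate=0.2, seed=1)
```

A network inference method could then be run on each copy and the resulting scores aggregated, which is one common way to make an estimate less sensitive to any single pattern of zeros.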
Affiliation(s)
- Jinting Guan
  - Department of Automation, Xiamen University, Xiamen, Fujian, China
  - National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian, China
- Yang Wang
  - Department of Automation, Xiamen University, Xiamen, Fujian, China
- Yongjie Wang
  - Department of Automation, Xiamen University, Xiamen, Fujian, China
- Yan Zhuang
  - Department of Automation, Xiamen University, Xiamen, Fujian, China
- Guoli Ji
  - Department of Automation, Xiamen University, Xiamen, Fujian, China
  - National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian, China
94
Wang Y, Lee H, Fear JM, Berger I, Oliver B, Przytycka TM. NetREX-CF integrates incomplete transcription factor data with gene expression to reconstruct gene regulatory networks. Commun Biol 2022; 5:1282. [PMID: 36418514 PMCID: PMC9684490 DOI: 10.1038/s42003-022-04226-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2021] [Accepted: 11/04/2022] [Indexed: 11/25/2022] Open
Abstract
The inference of Gene Regulatory Networks (GRNs) is one of the key challenges in systems biology. Leading algorithms utilize, in addition to gene expression, prior knowledge such as Transcription Factor (TF) DNA binding motifs or results of TF binding experiments. However, such prior knowledge is typically incomplete; therefore, integrating it with gene expression to infer GRNs remains difficult. To address this challenge, we introduce NetREX-CF (Regulatory Network Reconstruction using EXpression and Collaborative Filtering), a GRN reconstruction approach that combines Collaborative Filtering, to address the incompleteness of the prior knowledge, with a biologically justified model of gene expression (a sparse Network Component Analysis-based model). We validated NetREX-CF using yeast data and then used it to construct the GRN for Drosophila Schneider 2 (S2) cells. To corroborate the GRN, we performed a large-scale RNA-Seq analysis following high-throughput RNAi treatment against all 465 expressed TFs in the cell line. Our knockdown results not only extensively validate the GRN we built, but also provide a benchmark that our community can use for evaluating GRNs. Finally, using previously published human data, we demonstrate that NetREX-CF can infer GRNs from single-cell RNA-Seq and outperforms other methods.
Affiliation(s)
- Yijie Wang
  - Computer Science Department, Indiana University, Bloomington, IN, 47408, USA
- Hangnoh Lee
  - Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, 50 South Drive, Bethesda, MD, 20892, USA
- Justin M Fear
  - Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, 50 South Drive, Bethesda, MD, 20892, USA
- Isabelle Berger
  - Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, 50 South Drive, Bethesda, MD, 20892, USA
- Brian Oliver
  - Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, 50 South Drive, Bethesda, MD, 20892, USA
- Teresa M Przytycka
  - National Center of Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD, 20894, USA
95
Xu Y, Chen J, Lyu A, Cheung WK, Zhang L. dynDeepDRIM: a dynamic deep learning model to infer direct regulatory interactions using time-course single-cell gene expression data. Brief Bioinform 2022; 23:6720420. [PMID: 36168811 DOI: 10.1093/bib/bbac424] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/23/2022] [Revised: 08/02/2022] [Accepted: 09/01/2022] [Indexed: 12/14/2022] Open
Abstract
Time-course single-cell RNA sequencing (scRNA-seq) data have been widely used to explore dynamic changes in gene expression of transcription factors (TFs) and their target genes. This information is useful to reconstruct cell-type-specific gene regulatory networks (GRNs). However, the existing tools are commonly designed to analyze either time-course bulk gene expression data or static scRNA-seq data via pseudo-time cell ordering. Few methods successfully utilize the information from multiple time points while also considering the characteristics of scRNA-seq data. We propose dynDeepDRIM, a novel deep learning model to reconstruct GRNs using time-course scRNA-seq data. It represents the joint expression of a gene pair as an image and utilizes the image of the target TF-gene pair and those of the potential neighbors to reconstruct GRNs from time-course scRNA-seq data. dynDeepDRIM can effectively remove transitive TF-gene interactions by considering neighborhood context and can model gene expression dynamics using high-dimensional tensors. We compared dynDeepDRIM with six GRN reconstruction methods on both simulated data and four real time-course scRNA-seq datasets. dynDeepDRIM achieved substantially better performance than the other methods in inferring TF-gene interactions and eliminated false positives effectively. We also applied dynDeepDRIM to annotate gene functions and found that it achieved better performance than the other tools because it considers the neighbor genes.
Affiliation(s)
- Yu Xu
  - Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong
- Jiaxing Chen
  - Computer Science and Technology, Division of Science and Technology, BNU-HKBU United International College, Jintong Road, 519087, Zhuhai, China
- Aiping Lyu
  - School of Chinese Medicine, Hong Kong Baptist University, Kowloon Tong, Hong Kong
- William K Cheung
  - Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong
- Lu Zhang
  - Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong
96
Ni X, Geng B, Zheng H, Shi J, Hu G, Gao J. Accurate Estimation of Single-Cell Differentiation Potency Based on Network Topology and Gene Ontology Information. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:3255-3262. [PMID: 34529570 DOI: 10.1109/tcbb.2021.3112951] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
One important task in single-cell analysis is to quantify the differentiation potential of single cells. Though various single-cell potency measures have been proposed, they are based on individual biological sources, thus not robust and reliable. It is still a challenge to combine multiple sources to generate a relatively reliable and robust measure to estimate differentiation. In this paper, we propose a New Centrality measure with Gene ontology information (NCG) to estimate single-cell potency. NCG is designed by combining network topology property with edge clustering coefficient, and gene function information using gene ontology function similarity scores. NCG distinguishes pluripotent cells from non-pluripotent cells with high accuracy, correctly ranks different cell types by their differentiation potency, tracks changes during the differentiation process, and constructs the lineage trajectory from human myoblasts into skeletal muscle cells. These indicate that NCG is a reliable and robust measure to estimate single-cell potency. NCG is anticipated to be a useful tool for identifying novel stem or progenitor cell phenotypes from single-cell RNA-Seq data. The source codes and datasets are available at https://github.com/Xinzhe-Ni/NCG.
97
Shen X, Li M, Wang C, Liu Z, Wu K, Wang A, Bi C, Lu S, Long H, Zhu G. Hypoxia is fine-tuned by Hif-1α and regulates mesendoderm differentiation through the Wnt/β-Catenin pathway. BMC Biol 2022; 20:219. [PMID: 36199093 PMCID: PMC9536055 DOI: 10.1186/s12915-022-01423-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/19/2022] [Accepted: 09/28/2022] [Indexed: 11/10/2022] Open
Abstract
Background: Hypoxia occurs naturally during embryogenesis and thus serves as an important environmental factor affecting embryo development. Hif-1α, an essential hypoxia response factor, has mostly been considered to mediate or synergistically regulate the effect of hypoxia on stem cells. However, the function and relationship of hypoxia and Hif-1α in regulating mesendoderm differentiation remain controversial. Results: We discovered that hypoxia dramatically suppressed mesendoderm differentiation and promoted ectoderm differentiation of mouse embryonic stem cells (mESCs). However, hypoxia treatment after the mesendoderm was established promoted the downstream differentiation of mesendoderm-derived lineages. These effects of hypoxia were mediated by repression of the Wnt/β-Catenin pathway, which was at least partially regulated by the Akt/Gsk3β axis. Blocking the Wnt/β-Catenin pathway under normoxia using IWP2 mimicked the effects of hypoxia, while activating the Wnt/β-Catenin pathway with CHIR99021 fully rescued the suppression of mesendoderm differentiation caused by hypoxia. Unexpectedly, Hif-1α overexpression, in contrast to hypoxia, promoted mesendoderm differentiation and suppressed ectoderm differentiation. Knockdown of Hif-1α inhibited mesendoderm differentiation under both normoxia and hypoxia. Moreover, hypoxia suppressed mesendoderm differentiation even in Hif-1α knockdown mESCs, further implying that the effects of hypoxia on mesendoderm differentiation were Hif-1α independent. Consistently, the Wnt/β-Catenin pathway was enhanced by Hif-1α overexpression and inhibited by Hif-1α knockdown. As shown by RNA-seq, unlike hypoxia, the effect of Hif-1α was relatively mild and selectively regulated a subset of hypoxia response genes, which fine-tuned the effect of hypoxia on mESC differentiation.
Conclusions: This study revealed that hypoxia is fine-tuned by Hif-1α and regulates mesendoderm and ectoderm differentiation by manipulating the Wnt/β-Catenin pathway, which contributes to the understanding of hypoxia-mediated regulation of development. Supplementary Information: The online version contains supplementary material available at 10.1186/s12915-022-01423-y.
Affiliation(s)
- Xiaopeng Shen
- Anhui Provincial Key Laboratory of Molecular Enzymology and Mechanism of Major Diseases, College of Life Sciences, Anhui Normal University, Wuhu, 241000, Anhui, China; Anhui Provincial Key Laboratory of the Conservation and Exploitation of Biological Resources, College of Life Sciences, Anhui Normal University, Wuhu, 241000, Anhui, China; Key Laboratory of Biomedicine in Gene Diseases and Health of Anhui Higher Education Institutes, College of Life Sciences, Anhui Normal University, Wuhu, 241000, Anhui, China
- Meng Li
- Anhui Provincial Key Laboratory of Molecular Enzymology and Mechanism of Major Diseases, College of Life Sciences, Anhui Normal University, Wuhu, 241000, Anhui, China; Anhui Provincial Key Laboratory of the Conservation and Exploitation of Biological Resources, College of Life Sciences, Anhui Normal University, Wuhu, 241000, Anhui, China; Key Laboratory of Biomedicine in Gene Diseases and Health of Anhui Higher Education Institutes, College of Life Sciences, Anhui Normal University, Wuhu, 241000, Anhui, China
- Chunguang Wang
- Anhui Provincial Key Laboratory of Molecular Enzymology and Mechanism of Major Diseases, College of Life Sciences, Anhui Normal University, Wuhu, 241000, Anhui, China; Anhui Provincial Key Laboratory of the Conservation and Exploitation of Biological Resources, College of Life Sciences, Anhui Normal University, Wuhu, 241000, Anhui, China; Key Laboratory of Biomedicine in Gene Diseases and Health of Anhui Higher Education Institutes, College of Life Sciences, Anhui Normal University, Wuhu, 241000, Anhui, China
- Zhongxian Liu
- Anhui Provincial Key Laboratory of Molecular Enzymology and Mechanism of Major Diseases, College of Life Sciences, Anhui Normal University, Wuhu, 241000, Anhui, China; Anhui Provincial Key Laboratory of the Conservation and Exploitation of Biological Resources, College of Life Sciences, Anhui Normal University, Wuhu, 241000, Anhui, China; Key Laboratory of Biomedicine in Gene Diseases and Health of Anhui Higher Education Institutes, College of Life Sciences, Anhui Normal University, Wuhu, 241000, Anhui, China
- Kun Wu
- Institute of Evolution and Marine Biodiversity, KLMME, Ocean University of China, Qingdao, 266003, Shandong, China
- Ao Wang
- Anhui Provincial Key Laboratory of Molecular Enzymology and Mechanism of Major Diseases, College of Life Sciences, Anhui Normal University, Wuhu, 241000, Anhui, China; Anhui Provincial Key Laboratory of the Conservation and Exploitation of Biological Resources, College of Life Sciences, Anhui Normal University, Wuhu, 241000, Anhui, China; Key Laboratory of Biomedicine in Gene Diseases and Health of Anhui Higher Education Institutes, College of Life Sciences, Anhui Normal University, Wuhu, 241000, Anhui, China
- Chao Bi
- Anhui Provincial Key Laboratory of Molecular Enzymology and Mechanism of Major Diseases, College of Life Sciences, Anhui Normal University, Wuhu, 241000, Anhui, China; Anhui Provincial Key Laboratory of the Conservation and Exploitation of Biological Resources, College of Life Sciences, Anhui Normal University, Wuhu, 241000, Anhui, China; Key Laboratory of Biomedicine in Gene Diseases and Health of Anhui Higher Education Institutes, College of Life Sciences, Anhui Normal University, Wuhu, 241000, Anhui, China
- Shan Lu
- Anhui Provincial Key Laboratory of Molecular Enzymology and Mechanism of Major Diseases, College of Life Sciences, Anhui Normal University, Wuhu, 241000, Anhui, China; Anhui Provincial Key Laboratory of the Conservation and Exploitation of Biological Resources, College of Life Sciences, Anhui Normal University, Wuhu, 241000, Anhui, China; Key Laboratory of Biomedicine in Gene Diseases and Health of Anhui Higher Education Institutes, College of Life Sciences, Anhui Normal University, Wuhu, 241000, Anhui, China
- Hongan Long
- Institute of Evolution and Marine Biodiversity, KLMME, Ocean University of China, Qingdao, 266003, Shandong, China
- Guoping Zhu
- Anhui Provincial Key Laboratory of Molecular Enzymology and Mechanism of Major Diseases, College of Life Sciences, Anhui Normal University, Wuhu, 241000, Anhui, China; Anhui Provincial Key Laboratory of the Conservation and Exploitation of Biological Resources, College of Life Sciences, Anhui Normal University, Wuhu, 241000, Anhui, China; Key Laboratory of Biomedicine in Gene Diseases and Health of Anhui Higher Education Institutes, College of Life Sciences, Anhui Normal University, Wuhu, 241000, Anhui, China
|
98
|
Chen M, Xu C, Xu Z, He W, Zhang H, Su J, Song Q. Uncovering the dynamic effects of DEX treatment on lung cancer by integrating bioinformatic inference and multiscale modeling of scRNA-seq and proteomics data. Comput Biol Med 2022; 149:105999. [PMID: 35998480 PMCID: PMC9717711 DOI: 10.1016/j.compbiomed.2022.105999] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2022] [Revised: 06/16/2022] [Accepted: 08/14/2022] [Indexed: 11/18/2022]
Abstract
Lung cancer is one of the leading causes of cancer-related death, with a five-year survival rate of 18%. It is a priority to understand the underlying mechanisms affecting the implementation and effectiveness of lung cancer therapeutics. In this study, we combine bioinformatics and systems biology to comprehensively uncover the functional and signaling pathways of drug treatment, using bioinformatic inference and multiscale modeling of both scRNA-seq and proteomics data. Based on a time series of lung adenocarcinoma-derived A549 cells after DEX treatment, we first identified the differentially expressed genes (DEGs) in those lung cancer cells. By interrogating the regulatory network of those DEGs, we identified key hub genes, including TGFβ, MYC, and SMAD3, that varied under DEX treatment. Further gene set enrichment analysis revealed the TGFβ signaling pathway as the top enriched term. The genes involved in the TGFβ pathway and their crosstalk with the ERBB pathway showed strong prognostic value for survival in clinical lung cancer samples. On the basis of biological validation and literature-based curation, a multiscale model of tumor regulation centered on both TGFβ-induced and ERBB-amplified signaling pathways was developed to characterize the dynamic effects of DEX therapy on lung cancer cells. Our simulation results matched the available data for SMAD2, FOXO3, TGFβ1, and TGFβR1 well over the time course. Moreover, we provided predictions for different doses to illustrate the trend and therapeutic potential of DEX treatment. This innovative and cross-disciplinary approach can be further applied to other computational studies in tumorigenesis and oncotherapy. We released the approach as a user-friendly tool named BIMM (Bioinformatic Inference and Multiscale Modeling), with all the key features available at https://github.com/chenm19/BIMM.
Affiliation(s)
- Minghan Chen
- Department of Computer Science, Wake Forest University, Winston-Salem, NC, USA
- Chunrui Xu
- Genetics, Bioinformatics, and Computational Biology, Virginia Tech, Blacksburg, VA, USA
- Ziang Xu
- Department of Computer Science, Wake Forest University, Winston-Salem, NC, USA; Department of Chemistry, Wake Forest University, Winston-Salem, NC, USA
- Wei He
- Genetics, Bioinformatics, and Computational Biology, Virginia Tech, Blacksburg, VA, USA
- Haorui Zhang
- Department of Mathematics and Statistics, Wake Forest University, Winston-Salem, NC, USA
- Jing Su
- Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, IN, USA
- Qianqian Song
- Center for Cancer Genomics and Precision Oncology, Wake Forest Baptist Comprehensive Cancer Center, Wake Forest Baptist Medical Center, Winston Salem, NC, USA; Department of Cancer Biology, Wake Forest School of Medicine, Winston Salem, NC, USA
|
99
|
Su Y, Wang F, Zhang S, Liang Y, Wong KC, Li X. scWMC: weighted matrix completion-based imputation of scRNA-seq data via prior subspace information. Bioinformatics 2022; 38:4537-4545. [PMID: 35984287 DOI: 10.1093/bioinformatics/btac570] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/14/2022] [Revised: 08/09/2022] [Accepted: 08/18/2022] [Indexed: 12/24/2022] Open
Abstract
Motivation: Single-cell RNA sequencing (scRNA-seq) can provide insight into gene expression patterns at the resolution of individual cells, which offers new opportunities to study the behavior of different cell types. However, it is often plagued by dropout events, a phenomenon where the expression value of a gene tends to be measured as zero in the expression matrix due to various technical defects. Results: In this article, we argue that borrowing gene and cell information across column and row subspaces directly results in suboptimal solutions due to noise contamination in imputing dropout values. Thus, to impute the dropout events in scRNA-seq data more precisely, we develop a regularization that leverages this imperfect prior information to estimate the true underlying prior subspace, and then embed it in a typical low-rank matrix completion-based framework, named scWMC. To evaluate the performance of the proposed method, we conduct comprehensive experiments on simulated and real scRNA-seq data. Extensive data analysis, including simulated analysis, cell clustering, differential expression analysis, functional genomic analysis, cell trajectory inference and scalability analysis, demonstrates that our method produces improved imputation results compared to competing methods, which benefits subsequent downstream analysis. Availability and implementation: The source code is available at https://github.com/XuYuanchi/scWMC and test data are available at https://doi.org/10.5281/zenodo.6832477. Supplementary information: Supplementary data are available at Bioinformatics online.
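As a rough illustration of the low-rank matrix completion idea that this abstract builds on, the sketch below fits a rank-1 model to the observed entries of a toy matrix by alternating least squares and fills in the masked entries. It is not the scWMC algorithm: it uses no weights and no prior subspace information, it assumes the dropout mask is known, and the function name `complete_rank1` and all toy data are hypothetical.

```python
import random

def complete_rank1(X, observed, n_iter=100):
    """Impute unobserved entries of X with a rank-1 fit u[i] * v[j].

    Plain alternating least squares over the observed entries only.
    Generic illustration; NOT the weighted, prior-subspace scWMC method.
    """
    n_rows, n_cols = len(X), len(X[0])
    u = [1.0] * n_rows
    v = [1.0] * n_cols
    for _ in range(n_iter):
        # Update each row factor with the column factors held fixed.
        for i in range(n_rows):
            num = sum(X[i][j] * v[j] for j in range(n_cols) if observed[i][j])
            den = sum(v[j] ** 2 for j in range(n_cols) if observed[i][j])
            if den > 0:  # rows with no observed entry keep their old factor
                u[i] = num / den
        # Update each column factor with the row factors held fixed.
        for j in range(n_cols):
            num = sum(X[i][j] * u[i] for i in range(n_rows) if observed[i][j])
            den = sum(u[i] ** 2 for i in range(n_rows) if observed[i][j])
            if den > 0:
                v[j] = num / den
    # Keep observed values; fill only the masked positions.
    return [[X[i][j] if observed[i][j] else u[i] * v[j]
             for j in range(n_cols)] for i in range(n_rows)]

# Toy data: an exactly rank-1 "expression" matrix with about 30% of entries masked.
rng = random.Random(0)
a = [rng.uniform(1.0, 2.0) for _ in range(30)]   # hypothetical cell factors
b = [rng.uniform(1.0, 2.0) for _ in range(20)]   # hypothetical gene factors
truth = [[ai * bj for bj in b] for ai in a]
observed = [[rng.random() > 0.3 for _ in b] for _ in a]

imputed = complete_rank1(truth, observed)
max_err = max(abs(imputed[i][j] - truth[i][j])
              for i in range(30) for j in range(20) if not observed[i][j])
print(f"largest error on masked entries: {max_err:.2e}")
```

Real scRNA-seq matrices are neither exactly low rank nor supplied with a known dropout mask, which is part of why methods such as the one described here add regularization and prior information on top of this basic scheme.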
Affiliation(s)
- Yanchi Su
- School of Artificial Intelligence, Jilin University, Changchun 130012, China
- Fuzhou Wang
- Department of Computer Science, City University of Hong Kong, Hong Kong 999077, Hong Kong SAR
- Shixiong Zhang
- School of Computer Science and Technology, Xidian University, Xian 710000, China
- Yanchun Liang
- Zhuhai Laboratory of Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, Zhuhai College of Science and Technology, Zhuhai 519041, China
- Ka-Chun Wong
- Department of Computer Science, City University of Hong Kong, Hong Kong 999077, Hong Kong SAR
- Xiangtao Li
- School of Artificial Intelligence, Jilin University, Changchun 130012, China
|
100
|
Chen P, Zhong J, Yang K, Zhang X, Chen Y, Liu R. TPD: a web tool for tipping-point detection based on dynamic network biomarker. Brief Bioinform 2022; 23:6693599. [DOI: 10.1093/bib/bbac399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2022] [Revised: 08/04/2022] [Accepted: 08/16/2022] [Indexed: 11/12/2022] Open
Abstract
Tipping points, or critical transitions, are widespread in the progression of many biological processes. It is of great importance to detect the tipping point from measured omics data, which may be a key to achieving predictive or preventive medicine. We present the tipping point detector (TPD), a web tool that detects the tipping point during the dynamic process of a biological system, together with its leading molecules or network, based on input high-dimensional time series or stage course data. Building on the solid theoretical background of the dynamic network biomarker (DNB) and a series of computational methods for DNB detection, TPD detects the potential tipping point/critical state from the input omics data and outputs a variety of visualized results, including a suggested tipping point with a statistically significant P value, the identified key genes and their functional biological information, the dynamic change in the DNB/leading network that may drive the critical transition, and a survival analysis based on DNB scores that may help to identify ‘dark’ genes (nondifferential in terms of expression but differential in terms of DNB scores). TPD works in all current browsers, such as Chrome, Firefox, Edge, Opera, Safari and Internet Explorer. TPD is freely accessible at http://www.rpcomputationalbiology.cn/TPD.
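For readers unfamiliar with DNB scoring, the sketch below computes one commonly used composite index for a candidate gene group at a single stage: the group's average standard deviation, multiplied by the average absolute Pearson correlation within the group, divided by the average absolute correlation between the group and all other genes. Whether TPD uses exactly this index is an assumption; the function names and toy data are hypothetical, and the example only shows that the index rises when a group becomes strongly fluctuating and tightly co-expressed.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def sample_sd(x):
    """Sample standard deviation (n - 1 in the denominator)."""
    m = sum(x) / len(x)
    return math.sqrt(sum((a - m) ** 2 for a in x) / (len(x) - 1))

def dnb_score(expr, group):
    """Composite index SD_in * PCC_in / PCC_out for one stage.

    expr  : list of per-gene expression lists (genes x samples)
    group : indices of the candidate DNB genes (hypothetical input)
    """
    group = list(group)
    others = [g for g in range(len(expr)) if g not in group]
    sd_in = sum(sample_sd(expr[g]) for g in group) / len(group)
    inner = [abs(pearson(expr[g], expr[h]))
             for k, g in enumerate(group) for h in group[k + 1:]]
    outer = [abs(pearson(expr[g], expr[o])) for g in group for o in others]
    return sd_in * (sum(inner) / len(inner)) / (sum(outer) / len(outer))

# Toy data: 20 genes x 40 samples; genes 0-4 form the candidate group.
rng = random.Random(1)
n_genes, n_samples, group = 20, 40, range(5)

# Baseline stage: every gene is independent unit-variance noise.
baseline = [[rng.gauss(0, 1) for _ in range(n_samples)] for _ in range(n_genes)]

# Near-critical stage: group genes share one large common fluctuation.
shared = [rng.gauss(0, 1) for _ in range(n_samples)]
critical = [[3.0 * shared[s] + 0.3 * rng.gauss(0, 1) if g in group
             else rng.gauss(0, 1) for s in range(n_samples)]
            for g in range(n_genes)]

score_baseline = dnb_score(baseline, group)
score_critical = dnb_score(critical, group)
print(f"baseline: {score_baseline:.2f}  near-critical: {score_critical:.2f}")
```

In a time series or stage course, the stage at which such an index peaks sharply for some gene group is the candidate tipping point; choosing the group and assessing significance are the harder parts that a tool like TPD automates.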
Affiliation(s)
- Pei Chen
- School of Mathematics, South China University of Technology, Guangzhou 510640, China
- Jiayuan Zhong
- School of Mathematics and Big Data, Foshan University, Foshan 528000, China
- Kun Yang
- School of Computer Science and Engineering, South China University of Technology, Guangzhou 510640, China
- Xuhang Zhang
- School of Computer Science and Engineering, South China University of Technology, Guangzhou 510640, China
- Yingqi Chen
- School of Computer Science and Engineering, South China University of Technology, Guangzhou 510640, China
- Rui Liu
- School of Mathematics, South China University of Technology, Guangzhou 510640, China
|