1
Qu R, Cheng X, Sefik E, Stanley JS III, Landa B, Strino F, Platt S, Garritano J, Odell ID, Coifman R, Flavell RA, Myung P, Kluger Y. Gene trajectory inference for single-cell data by optimal transport metrics. Nat Biotechnol 2025; 43:258-268. [PMID: 38580861 PMCID: PMC11452571 DOI: 10.1038/s41587-024-02186-3] [Received: 12/19/2022] [Accepted: 02/26/2024] [Indexed: 04/07/2024]
Abstract
Single-cell RNA sequencing has been widely used to investigate cell state transitions and gene dynamics of biological processes. Current strategies to infer the sequential dynamics of genes in a process typically rely on constructing cell pseudotime through cell trajectory inference. However, the presence of concurrent gene processes in the same group of cells and technical noise can obscure the true progression of the processes studied. To address this challenge, we present GeneTrajectory, an approach that identifies trajectories of genes rather than trajectories of cells. Specifically, optimal transport distances are calculated between gene distributions across the cell-cell graph to extract gene programs and define their gene pseudotemporal order. Here we demonstrate that GeneTrajectory accurately extracts progressive gene dynamics in myeloid lineage maturation. Moreover, we show that GeneTrajectory deconvolves key gene programs underlying mouse skin hair follicle dermal condensate differentiation that could not be resolved by cell trajectory approaches. GeneTrajectory facilitates the discovery of gene programs that control the changes and activities of biological processes.
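The idea of comparing genes by optimal transport over a cell graph can be illustrated with a toy sketch. The abstract does not give the actual algorithm, so everything here is an assumption made for illustration: the cells are placed on a simple chain graph (so the graph distance between cells i and j is |i - j|), the gene names and expression values are invented, and the helper names `normalize` and `w1_on_chain` are hypothetical. On a chain, the Wasserstein-1 distance between two distributions equals the summed absolute difference of their cumulative sums, which keeps the example dependency-free.

```python
from itertools import accumulate


def normalize(expr):
    """Turn non-negative expression values over cells into a probability distribution."""
    total = sum(expr)
    if total <= 0:
        raise ValueError("gene has no expression in any cell")
    return [x / total for x in expr]


def w1_on_chain(expr_a, expr_b):
    """Wasserstein-1 distance between two gene distributions on a chain of cells.

    Assumption: neighbouring cells are one unit apart, so the transport cost
    between cells i and j is |i - j|. In that special case the optimal
    transport cost equals the sum of absolute differences of the two CDFs.
    """
    cdf_a = list(accumulate(normalize(expr_a)))
    cdf_b = list(accumulate(normalize(expr_b)))
    return sum(abs(a - b) for a, b in zip(cdf_a, cdf_b))


# Invented expression profiles over six cells ordered along the chain.
genes = {
    "early": [5, 3, 1, 0, 0, 0],
    "mid": [0, 1, 4, 4, 1, 0],
    "late": [0, 0, 0, 1, 3, 5],
}

# Pairwise gene-gene distances; genes active in nearby cells end up close.
for g1 in genes:
    for g2 in genes:
        if g1 < g2:
            print(g1, g2, round(w1_on_chain(genes[g1], genes[g2]), 3))
```

In this toy setting the "early" gene is closer to "mid" than to "late", which is the kind of ordering signal a gene-level trajectory would build on. A general cell-cell graph would need shortest-path costs and a full optimal transport solver rather than the chain shortcut used here.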
Affiliation(s)
- Rihao Qu
  - Computational Biology & Bioinformatics Program, Yale University, New Haven, CT, USA
  - Department of Pathology, Yale University School of Medicine, New Haven, CT, USA
  - Department of Immunobiology, Yale University School of Medicine, New Haven, CT, USA
- Xiuyuan Cheng
  - Department of Mathematics, Duke University, Durham, NC, USA
- Esen Sefik
  - Department of Immunobiology, Yale University School of Medicine, New Haven, CT, USA
- Boris Landa
  - Program in Applied Mathematics, Yale University, New Haven, CT, USA
- Sarah Platt
  - Department of Pathology, Yale University School of Medicine, New Haven, CT, USA
  - Department of Dermatology, Yale University School of Medicine, New Haven, CT, USA
- James Garritano
  - Program in Applied Mathematics, Yale University, New Haven, CT, USA
- Ian D Odell
  - Department of Immunobiology, Yale University School of Medicine, New Haven, CT, USA
  - Department of Dermatology, Yale University School of Medicine, New Haven, CT, USA
- Ronald Coifman
  - Program in Applied Mathematics, Yale University, New Haven, CT, USA
  - Department of Mathematics, Yale University, New Haven, CT, USA
  - Department of Electrical Engineering, Yale University, New Haven, CT, USA
- Richard A Flavell
  - Department of Immunobiology, Yale University School of Medicine, New Haven, CT, USA
  - Howard Hughes Medical Institute, Yale University School of Medicine, New Haven, CT, USA
- Peggy Myung
  - Department of Pathology, Yale University School of Medicine, New Haven, CT, USA
  - Department of Dermatology, Yale University School of Medicine, New Haven, CT, USA
- Yuval Kluger
  - Computational Biology & Bioinformatics Program, Yale University, New Haven, CT, USA
  - Department of Pathology, Yale University School of Medicine, New Haven, CT, USA
  - Program in Applied Mathematics, Yale University, New Haven, CT, USA
2
Urasawa T, Kawasaki N. Proteomic Approach Using DIA-MS Identifies Morphogenesis-Associated Proteins during Cardiac Differentiation of Human iPS Cells. ACS Omega 2025; 10:344-357. [PMID: 39829588 PMCID: PMC11740111 DOI: 10.1021/acsomega.4c06371] [Received: 07/09/2024] [Revised: 12/06/2024] [Accepted: 12/13/2024] [Indexed: 01/22/2025]
Abstract
Human induced pluripotent stem cell (hiPSC)-derived cardiomyocytes have potential applications in regenerative medicine. The quality by design (QbD) approach enables efficiency and quality assurance in the manufacturing of hiPSC-derived products. It requires a molecular understanding of hiPSC differentiation throughout the differentiation process; however, information on cardiac differentiation remains limited. Proteins associated with the early stages of cardiac differentiation would be useful in cardiomyocyte quality assessment. Here, we performed quantitative proteomics of hiPSC intermediate cells in the early phase of cardiac differentiation to better understand their molecular characteristics. Proteomic profiles suggested that day 5-7 cells were in the morphogenetic stage of cardiac differentiation. Trophoblast glycoprotein (TPBG) was the most up-regulated protein in the morphogenetic stage; it was previously shown to be up-regulated during differentiation into neural stem cells. Proteomics of TPBG-knockdown cells revealed that TPBG is involved in cell proliferation and is related to the cardiomyocyte yield, suggesting that it could be used as a marker in QbD development. Our approach helps us understand the molecular basis of hiPSC differentiation and could be a powerful tool in QbD-based manufacturing.
Affiliation(s)
- Takaya Urasawa
  - Biopharmaceutical and Regenerative Sciences, Graduate School of Medical Life Science, Yokohama City University, 1-7-29 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan
- Nana Kawasaki
  - Biopharmaceutical and Regenerative Sciences, Graduate School of Medical Life Science, Yokohama City University, 1-7-29 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan
3
Wang Y, Li K, Zhang R, Fan Y, Huang L, Zhou F. GraCEImpute: A novel graph clustering autoencoder approach for imputation of single-cell RNA-seq data. Comput Biol Med 2025; 184:109400. [PMID: 39561511 DOI: 10.1016/j.compbiomed.2024.109400] [Received: 04/17/2024] [Revised: 10/14/2024] [Accepted: 11/07/2024] [Indexed: 11/21/2024]
Abstract
Single-cell RNA sequencing (scRNA-seq) technology establishes a unique view for elucidating cellular heterogeneity in various biological systems. Yet scRNA-seq data are compromised by a high dropout rate caused by technological limitations, and the substantial data loss poses computational challenges for subsequent analyses. This study introduces a novel graph clustering autoencoder (GCAE)-based imputation approach (GraCEImpute) to address the challenge of missing data in scRNA-seq data. Our comprehensive evaluation demonstrates that the GraCEImpute model outperforms existing approaches in accurately imputing dropout zeros within scRNA-seq data. The proposed GraCEImpute model also significantly enhances the quality of downstream scRNA-seq data analyses, including clustering, differential gene expression (DEG) analysis, and cell trajectory inference. These improvements underscore the GraCEImpute model's potential to facilitate a deeper understanding of cellular processes and heterogeneity through scRNA-seq data analyses. The source code is released at https://www.healthinformaticslab.org/supp/.
Affiliation(s)
- Yueying Wang
  - College of Computer Science and Technology, Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China
- Kewei Li
  - College of Computer Science and Technology, Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China
- Ruochi Zhang
  - School of Artificial Intelligence, Jilin University, Changchun, 130012, China
- Yusi Fan
  - College of Computer Science and Technology, Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China
- Lan Huang
  - College of Computer Science and Technology, Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China
- Fengfeng Zhou
  - College of Computer Science and Technology, Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; School of Biology and Engineering, Guizhou Medical University, Guiyang, 550025, Guizhou, China
4
Yang B, Li J, Li X, Liu S. Gene regulatory network inference based on novel ensemble method. Brief Funct Genomics 2024; 23:866-878. [PMID: 39324652 DOI: 10.1093/bfgp/elae036] [Received: 03/11/2024] [Revised: 08/09/2024] [Accepted: 09/06/2024] [Indexed: 09/27/2024]
Abstract
Gene regulatory networks (GRNs) contribute toward understanding the function of genes and the development of cancer or the impact of key genes on diseases. Hence, this study proposes an ensemble method based on 13 basic classification methods and a flexible neural tree (FNT) to improve GRN identification accuracy. The primary classification methods contain ridge classification, stochastic gradient descent, Gaussian process classification, Bernoulli Naive Bayes, adaptive boosting, gradient boosting decision tree, hist gradient boosting classification, eXtreme gradient boosting (XGBoost), multilayer perceptron, light gradient boosting machine, random forest, support vector machine, and k-nearest neighbor algorithm, which are regarded as the input variable set of FNT model. Additionally, a hybrid evolutionary algorithm based on a gene programming variant and particle swarm optimization is developed to search for the optimal FNT model. Experiments on three simulation datasets and three real single-cell RNA-seq datasets demonstrate that the proposed ensemble feature outperforms 13 supervised algorithms, seven unsupervised algorithms (ARACNE, CLR, GENIE3, MRNET, PCACMI, GENECI, and EPCACMI) and four single cell-specific methods (SCODE, BiRGRN, LEAP, and BiGBoost) based on the area under the receiver operating characteristic curve, area under the precision-recall curve, and F1 metrics.
Affiliation(s)
- Bin Yang
  - School of Information Science and Engineering, Zaozhuang University, No. 1 Beian Road, Zaozhuang 277160, China
- Jing Li
  - School of Information Science and Engineering, Zaozhuang University, No. 1 Beian Road, Zaozhuang 277160, China
- Xiang Li
  - Information Department, Qingdao Eighth People's Hospital, No. 84 Fengshan Road, Qingdao 266121, China
- Sanrong Liu
  - School of Information Science and Engineering, Zaozhuang University, No. 1 Beian Road, Zaozhuang 277160, China
5
Liu W, Teng Z, Li Z, Chen J. CVGAE: A Self-Supervised Generative Method for Gene Regulatory Network Inference Using Single-Cell RNA Sequencing Data. Interdiscip Sci 2024; 16:990-1004. [PMID: 38778003 DOI: 10.1007/s12539-024-00633-y] [Received: 11/05/2023] [Revised: 04/07/2024] [Accepted: 04/09/2024] [Indexed: 05/25/2024]
Abstract
Gene regulatory network (GRN) inference based on single-cell RNA sequencing (scRNA-seq) data plays a crucial role in understanding the regulatory mechanisms between genes. Various computational methods have been employed for GRN inference, but their performance in terms of network accuracy and model generalization is not satisfactory, largely because of high-dimensional data and network sparsity. In this paper, we propose a self-supervised method for gene regulatory network inference using single-cell RNA sequencing data (CVGAE). CVGAE uses a graph neural network for inductive representation learning, which merges gene expression data and observed topology into a low-dimensional vector space. The trained vectors are used to calculate distances between genes and then to predict interactions between genes. In the overall framework, FastICA is used to relieve the computational complexity caused by high-dimensional data, and CVGAE adopts multi-stacked GraphSAGE layers as an encoder and an improved decoder to overcome network sparsity. CVGAE is evaluated on several single-cell datasets containing four related ground-truth networks, and the results show that CVGAE achieves better performance than comparative methods. To validate its learning and generalization capabilities, CVGAE is applied in a few-shot setting by changing the ratio of the training set to the test set. Under few-shot conditions, CVGAE obtains comparable or superior performance.
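The step from learned gene vectors to candidate interactions can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not say which distance or decoder CVGAE uses, so Euclidean distance is assumed here, the embedding values are invented, and the helper names `euclidean` and `rank_pairs` are hypothetical.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def rank_pairs(embeddings):
    """Rank all gene pairs from closest to farthest in embedding space.

    Assumption: a smaller distance is read as stronger evidence for an
    interaction. A trained decoder could replace this rule.
    """
    scored = [
        (euclidean(embeddings[g1], embeddings[g2]), g1, g2)
        for g1, g2 in combinations(sorted(embeddings), 2)
    ]
    scored.sort()
    return [(g1, g2, dist) for dist, g1, g2 in scored]


# Invented two-dimensional embeddings for three genes.
embeddings = {
    "geneA": [0.9, 0.1],
    "geneB": [0.8, 0.2],
    "geneC": [-0.7, 0.6],
}

ranked = rank_pairs(embeddings)
for g1, g2, dist in ranked:
    print(g1, g2, round(dist, 3))
```

The top-ranked pairs would then be treated as predicted edges, with a cutoff chosen on held-out ground-truth interactions.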
Affiliation(s)
- Wei Liu
  - School of Computer Science, Xiangtan University, Xiangtan, 411105, China
- Zhijie Teng
  - School of Computer Science, Xiangtan University, Xiangtan, 411105, China
- Zejun Li
  - School of Computer Science and Engineering, Hunan Institute of Technology, Hengyang, 412002, China
- Jing Chen
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China
6
Zhang Q, Kang L, Yang H, Liu F, Wu X. Supervised analysis of alternative polyadenylation from single-cell and spatial transcriptomics data with spvAPA. Brief Bioinform 2024; 26:bbae720. [PMID: 39799000 PMCID: PMC11724721 DOI: 10.1093/bib/bbae720] [Received: 09/16/2024] [Revised: 12/19/2024] [Accepted: 12/30/2024] [Indexed: 01/15/2025]
Abstract
Alternative polyadenylation (APA) is an important driver of transcriptome diversity that generates messenger RNA isoforms with distinct 3' ends. The rapid development of single-cell and spatial transcriptomic technologies opened up new opportunities for exploring APA data to discover hidden cell subpopulations invisible in conventional gene expression analysis. However, conventional gene-level analysis tools are not fully applicable to APA data, and commonly used unsupervised dimensionality reduction methods often disregard experimentally derived annotations such as cell type identities. Here, we proposed a supervised analytical framework termed spvAPA, specifically used for APA analysis from both single-cell and spatial transcriptomics data. First, an iterative imputation method based on weighted nearest neighbor was designed to recover missing APA signatures, by integrating both gene expression and APA modalities. Second, a supervised feature selection method based on sparse partial least squares discriminant analysis was devised to identify APA features distinguishing cell types or spatial morphologies. Additionally, spvAPA improves the visualization of high-dimensional data for discovering novel cell subtypes, which considers APA features and dual modalities of gene expression and APA. Evaluations across nine single-cell and spatial transcriptomics datasets demonstrate the effectiveness and applicability of spvAPA. spvAPA is available at https://github.com/BMILAB/spvAPA.
Affiliation(s)
- Qinglong Zhang
  - Cancer Institute, Suzhou Medical College, Soochow University, NO. 199 Ren-ai Road, SIP, Suzhou 215000, China
- Liping Kang
  - Cancer Institute, Suzhou Medical College, Soochow University, NO. 199 Ren-ai Road, SIP, Suzhou 215000, China
- Haoran Yang
  - Cancer Institute, Suzhou Medical College, Soochow University, NO. 199 Ren-ai Road, SIP, Suzhou 215000, China
- Fei Liu
  - Cancer Institute, Suzhou Medical College, Soochow University, NO. 199 Ren-ai Road, SIP, Suzhou 215000, China
- Xiaohui Wu
  - Cancer Institute, Suzhou Medical College, Soochow University, NO. 199 Ren-ai Road, SIP, Suzhou 215000, China
  - Jiangsu Key Laboratory of Infection and Immunity, Soochow University, NO. 199 Ren-ai Road, SIP, Suzhou 215000, China
7
Karamveer, Uzun Y. Approaches for Benchmarking Single-Cell Gene Regulatory Network Methods. Bioinform Biol Insights 2024; 18:11779322241287120. [PMID: 39502448 PMCID: PMC11536393 DOI: 10.1177/11779322241287120] [Received: 05/17/2024] [Accepted: 09/10/2024] [Indexed: 11/08/2024]
Abstract
Gene regulatory networks (GRNs) are powerful tools for modeling genetic interactions that control the expression of genes driving cell differentiation, and single-cell sequencing offers a unique opportunity to build these networks with high-resolution genomic data. There are many proposed computational methods to build these networks using single-cell data, and different approaches are used to benchmark these methods. However, a comprehensive discussion specifically focusing on benchmarking approaches is missing. In this article, we lay out the GRN terminology, present an overview of common gold-standard studies and data sets, and define the performance metrics for benchmarking network construction methodologies. We also point out the advantages and limitations of different benchmarking approaches, suggest alternative ground truth data sets that can be used for benchmarking, and specify additional considerations in this context.
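Performance metrics of the kind used to benchmark network inference can be computed directly from a list of scored candidate edges. The sketch below is illustrative only: the abstract does not list which metrics the article defines, so the area under the ROC curve and the F1 score are assumed here as common choices, the labels and scores are invented, and the function names `auroc` and `f1_at` are hypothetical.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of true and false edges.

    Equals the probability that a randomly chosen true edge receives a higher
    score than a randomly chosen false edge (ties count as one half).
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one true and one false edge")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_at(labels, scores, threshold):
    """F1 score when edges with score >= threshold are called as predicted."""
    tp = sum(1 for l, s in zip(labels, scores) if s >= threshold and l == 1)
    fp = sum(1 for l, s in zip(labels, scores) if s >= threshold and l == 0)
    fn = sum(1 for l, s in zip(labels, scores) if s < threshold and l == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented example: 1 marks an edge present in the gold standard.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

print("AUROC:", round(auroc(labels, scores), 3))
print("F1 at 0.5:", round(f1_at(labels, scores, 0.5), 3))
```

Because true regulatory edges are usually rare among all possible gene pairs, precision-based metrics tend to be more informative than AUROC alone in this setting.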
Affiliation(s)
- Karamveer
  - Department of Pediatrics, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Yasin Uzun
  - Department of Pediatrics, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Penn State Cancer Institute, The Pennsylvania State University College of Medicine, Hershey, PA, USA
8
Wang Y, Zheng P, Cheng YC, Wang Z, Aravkin A. WENDY: Covariance dynamics based gene regulatory network inference. Math Biosci 2024; 377:109284. [PMID: 39168402 DOI: 10.1016/j.mbs.2024.109284] [Received: 03/25/2024] [Revised: 06/25/2024] [Accepted: 08/16/2024] [Indexed: 08/23/2024]
Abstract
Determining gene regulatory network (GRN) structure is a central problem in biology, with a variety of inference methods available for different types of data. For a widely prevalent and challenging use case, namely single-cell gene expression data measured after intervention at multiple time points with unknown joint distributions, there is only one known specifically developed method, which does not fully utilize the rich information contained in this data type. We develop an inference method for the GRN in this case, netWork infErence by covariaNce DYnamics, dubbed WENDY. The core idea of WENDY is to model the dynamics of the covariance matrix, and solve this dynamics as an optimization problem to determine the regulatory relationships. To evaluate its effectiveness, we compare WENDY with other inference methods using synthetic data and experimental data. Our results demonstrate that WENDY performs well across different data sets.
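The idea of fitting a regulatory matrix to the change in the covariance matrix can be written out in a short derivation. This is a sketch under an assumed linear model that the abstract does not state; the symbols B, D, K_0 and K_t are introduced here for illustration and may not match the paper's formulation.

```latex
% Assumption (illustration only): expression at time t depends linearly on expression at time 0,
% with noise independent of the initial state.
x_t = B^{\top} x_0 + \varepsilon, \qquad \operatorname{Cov}(\varepsilon) = D, \qquad \varepsilon \perp x_0

% Taking the covariance of both sides relates the two time points without pairing cells.
K_t = \operatorname{Cov}(x_t) = B^{\top} K_0 B + D, \qquad K_0 = \operatorname{Cov}(x_0)

% The regulatory matrix is then sought by fitting this relation to the estimated covariances.
\hat{B} = \arg\min_{B} \left\lVert K_t - B^{\top} K_0 B - D \right\rVert_F^{2}
```

K_0 and K_t can each be estimated from a separate sample of cells, so a relation of this form does not need the joint distribution across time points, which matches the setting described in the abstract. How D is handled (fixed, estimated, or restricted to a diagonal) is left open here.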
Affiliation(s)
- Yue Wang
  - Irving Institute for Cancer Dynamics and Department of Statistics, Columbia University, New York, 10027, NY, USA
- Peng Zheng
  - Institute for Health Metrics and Evaluation, Seattle, 98195, WA, USA; Department of Health Metrics Sciences, University of Washington, Seattle, 98195, WA, USA
- Yu-Chen Cheng
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, 02215, MA, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, 02115, MA, USA; Center for Cancer Evolution, Dana-Farber Cancer Institute, Boston, 02215, MA, USA; Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, 02138, MA, USA
- Zikun Wang
  - Laboratory of Genetics, The Rockefeller University, New York, 10065, NY, USA
- Aleksandr Aravkin
  - Department of Applied Mathematics, University of Washington, Seattle, 98195, WA, USA
9
Luciani M, Garsia C, Beretta S, Cifola I, Peano C, Merelli I, Petiti L, Miccio A, Meneghini V, Gritti A. Human iPSC-derived neural stem cells displaying radial glia signature exhibit long-term safety in mice. Nat Commun 2024; 15:9433. [PMID: 39487141 PMCID: PMC11530573 DOI: 10.1038/s41467-024-53613-7] [Received: 08/31/2023] [Accepted: 10/17/2024] [Indexed: 11/04/2024]
Abstract
Human induced pluripotent stem cell-derived neural stem/progenitor cells (hiPSC-NSCs) hold promise for treating neurodegenerative and demyelinating disorders. However, comprehensive studies on their identity and safety remain limited. In this study, we demonstrate that hiPSC-NSCs adopt a radial glia-associated signature, sharing key epigenetic and transcriptional characteristics with human fetal neural stem cells (hfNSCs) while exhibiting divergent profiles from glioblastoma stem cells. Long-term transplantation studies in mice showed robust and stable engraftment of hiPSC-NSCs, with predominant differentiation into glial cells and no evidence of tumor formation. Additionally, we identified the Sterol Regulatory Element Binding Transcription Factor 1 (SREBF1) as a regulator of astroglial differentiation in hiPSC-NSCs. These findings provide valuable transcriptional and epigenetic reference datasets to prospectively define the maturation stage of NSCs derived from different hiPSC sources and demonstrate the long-term safety of hiPSC-NSCs, reinforcing their potential as a viable alternative to hfNSCs for clinical applications.
Affiliation(s)
- Marco Luciani
  - San Raffaele Telethon Institute for Gene Therapy, Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS) San Raffaele Scientific Institute, Milan, Italy
- Chiara Garsia
  - San Raffaele Telethon Institute for Gene Therapy, Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS) San Raffaele Scientific Institute, Milan, Italy
  - Vita-Salute San Raffaele University, Milan, Italy
- Stefano Beretta
  - San Raffaele Telethon Institute for Gene Therapy, Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS) San Raffaele Scientific Institute, Milan, Italy
  - Vita-Salute San Raffaele University, Milan, Italy
- Ingrid Cifola
  - Institute for Biomedical Technologies (ITB), National Research Council (CNR), via F.lli Cervi 93, 20054 Segrate, Milan, Italy
- Clelia Peano
  - Institute of Genetics and Biomedical Research, UoS of Milan, National Research Council, Rozzano, Milan, Italy
  - Human Technopole, Via Rita Levi Montalcini 1, Milan, Italy
- Ivan Merelli
  - San Raffaele Telethon Institute for Gene Therapy, Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS) San Raffaele Scientific Institute, Milan, Italy
- Luca Petiti
  - Institute for Biomedical Technologies (ITB), National Research Council (CNR), via F.lli Cervi 93, 20054 Segrate, Milan, Italy
- Annarita Miccio
  - IMAGINE Institute, Université de Paris, Sorbonne Paris Cité, Paris, France
- Vasco Meneghini
  - San Raffaele Telethon Institute for Gene Therapy, Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS) San Raffaele Scientific Institute, Milan, Italy
  - Vita-Salute San Raffaele University, Milan, Italy
- Angela Gritti
  - San Raffaele Telethon Institute for Gene Therapy, Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS) San Raffaele Scientific Institute, Milan, Italy
  - Vita-Salute San Raffaele University, Milan, Italy
10
Lhomond G, Schubert M, Croce J. Spatiotemporal requirements of nuclear β-catenin define early sea urchin embryogenesis. PLoS Biol 2024; 22:e3002880. [PMID: 39531468 DOI: 10.1371/journal.pbio.3002880] [Received: 01/10/2024] [Revised: 12/20/2024] [Accepted: 10/04/2024] [Indexed: 11/16/2024]
Abstract
Establishment of the 3 primordial germ layers (ectoderm, endoderm, and mesoderm) during early animal development represents an essential prerequisite for the emergence of properly patterned embryos. β-catenin is an ancient protein that is known to play essential roles in this process. However, these roles have chiefly been established through inhibition of β-catenin translation or function at the time of fertilization. Comprehensive analyses reporting the totality of functions played by nuclear β-catenin during early embryogenesis of a given animal, i.e., at different developmental stages and in different germ layers, are thus still lacking. In this study, we used an inducible, conditional knockdown system in the sea urchin to characterize all possible requirements of β-catenin for germ layer establishment and patterning. By blocking β-catenin protein production starting at 7 different time points of early development, between fertilization and 12 h post fertilization, we established a clear correlation between the position of a germ layer along the primary embryonic axis (the animal-vegetal axis) and its dependence on nuclear β-catenin activity. For example, in the vegetal hemisphere, we determined that the 3 germ layers (skeletogenic mesoderm, non-skeletogenic mesoderm, and endoderm) require distinct and highly specific durations of β-catenin production for their respective specification, with the most vegetal germ layer, the skeletogenic mesoderm, requiring the shortest duration. Likewise, for the 2 animal territories (ectoderm and anterior neuroectoderm), we established that their restriction, along the animal-vegetal axis, relies on different durations of β-catenin production and that the longest duration is required for the most animal territory, the anterior neuroectoderm. 
Moreover, we found that 2 of the vegetal germ layers, the non-skeletogenic mesoderm and the endoderm, further require a prolonged period of nuclear β-catenin activity after their specification to maintain their respective germ layer identities through time. Finally, we determined that restriction of the anterior neuroectoderm territory depends on at least 2 nuclear β-catenin-dependent inputs and a nuclear β-catenin-independent mechanism. Taken together, this work is the first to comprehensively define the spatiotemporal requirements of β-catenin during the early embryogenesis of a single animal, the sea urchin Paracentrotus lividus, thereby providing new experimental evidence for a better understanding of the roles played by this evolutionary conserved protein during animal development.
Affiliation(s)
- Guy Lhomond
  - Sorbonne Université, CNRS, Institut de la Mer de Villefranche (IMEV), Laboratoire de Biologie du Développement de Villefranche-sur-Mer (LBDV), Evolution of Intercellular Signaling in Development (EvoInSiDe), Villefranche-sur-Mer, France
- Michael Schubert
  - Sorbonne Université, CNRS, Institut de la Mer de Villefranche (IMEV), Laboratoire de Biologie du Développement de Villefranche-sur-Mer (LBDV), Evolution of Intercellular Signaling in Development (EvoInSiDe), Villefranche-sur-Mer, France
- Jenifer Croce
  - Sorbonne Université, CNRS, Institut de la Mer de Villefranche (IMEV), Laboratoire de Biologie du Développement de Villefranche-sur-Mer (LBDV), Evolution of Intercellular Signaling in Development (EvoInSiDe), Villefranche-sur-Mer, France
11
Varghese J, Link B, Wong B, Thundathil JC. Comparison of the developmental competence of in vitro-produced mouse embryos cultured under 5 versus 2% O2 with in vivo-derived blastocysts. J Assist Reprod Genet 2024; 41:3089-3103. [PMID: 39313714 PMCID: PMC11621300 DOI: 10.1007/s10815-024-03267-7] [Received: 12/14/2023] [Accepted: 09/16/2024] [Indexed: 09/25/2024]
Abstract
PURPOSE: The prevalence of infertility in Canada has substantially increased over 30 years, and plateaued success rates of culture systems warrant further optimization for transfer outcomes. In clinical programs, embryos commonly undergo extended culture under 5% O2 until the blastocyst stage. The aim of this study is to characterize the developmental competence and stress-related responses of embryos cultured under 5 versus 2% O2 in comparison to in vivo-derived blastocysts. We hypothesized that 2% O2 compromises developmental competence through altered embryonic stress responses and induction of apoptosis-related genes relative to embryos cultured under 5% O2 and in vivo-derived blastocysts.
METHODS: Quantitative measures of development and relative expression of a cohort of stress-related genes in CD1 mouse zygotes cultured to blastocysts under 5 or 2% O2 were compared to those of in vivo-derived embryos. Apoptotic responses were evaluated using an immunofluorescence assay for Caspase-3.
RESULTS: The mean percentage of blastocysts developed and the total cell number of embryos derived in vivo or cultured under 5% O2 were significantly higher than those of embryos cultured under 2% O2. Blastocyst expansion was greatest in embryos cultured under 5% O2. Stress response genes were significantly upregulated in embryos cultured under 2% O2, and expression of antioxidant-related genes was significantly lower in cultured versus in vivo-derived embryos. Caspase-3 immunofluorescence was significantly higher in cultured embryos versus in vivo-derived embryos.
CONCLUSION: We inferred that 5% O2 systems better approximate physiologic oxygen availability for culture of mouse embryos, warranting re-evaluation of culturing embryos under threshold or sub-physiologic oxygen concentrations during clinical IVF programs.
Affiliation(s)
- Jacob Varghese
  - Faculty of Veterinary Medicine, University of Calgary, Calgary, AB, T2N 4N1, Canada
- Brad Link
  - Regional Fertility Program, 2000 Veterans Pl NW #400, Calgary, AB, T3B 4N2, Canada
- Ben Wong
  - Regional Fertility Program, 2000 Veterans Pl NW #400, Calgary, AB, T3B 4N2, Canada
- Jacob C Thundathil
  - Faculty of Veterinary Medicine, University of Calgary, Calgary, AB, T2N 4N1, Canada
12
Liang Z, Huang T, Li W, Ma Z, Wang K, Zhai Z, Fan Y, Fu Y, Wang X, Qin Y, Wang B, Zhao C, Kuang J, Pei D. ALKBH5 governs human endoderm fate by regulating the DKK1/4-mediated Wnt/β-catenin activation. Nucleic Acids Res 2024; 52:10879-10896. [PMID: 39166492 PMCID: PMC11472173 DOI: 10.1093/nar/gkae707] [Received: 01/12/2024] [Revised: 06/25/2024] [Accepted: 08/03/2024] [Indexed: 08/23/2024]
Abstract
N6-methyladenosine (m6A) is ubiquitously distributed in mammalian mRNA. However, the precise involvement of m6A in early development has yet to be fully elucidated. Here, we report that deletion of the m6A demethylase ALKBH5 in human embryonic stem cells (hESCs) severely impairs definitive endoderm (DE) differentiation. ALKBH5-/- hESCs fail to undergo the primitive streak (PS) intermediate transition that precedes endoderm specification. Mechanistically, we show that ALKBH5 deficiency induces m6A hypermethylation around the 3' untranslated region (3'UTR) of GATA6 transcripts and destabilizes GATA6 mRNA in a YTHDF2-dependent manner. Moreover, GATA6 binds to the promoters of critical regulatory genes involved in Wnt/β-catenin signal transduction, including the canonical Wnt antagonists DKK1 and DKK4, which are unexpectedly repressed upon the dysregulation of GATA6 mRNA metabolism. Remarkably, DKK1 and DKK4 both exhibit a pleiotropic effect in modulating the Wnt/β-catenin cascade and guard the endogenous signaling activation underlying DE formation as potential downstream targets of the ALKBH5-GATA6 regulation. Here, we unravel a role of ALKBH5 in human endoderm formation in vitro by modulating the canonical Wnt signaling logic through the previously unrecognized functions of DKK1/4, thus capturing a more comprehensive role of m6A in early human embryogenesis.
Affiliation(s)
- Zechuan Liang
- College of Life Sciences, Zhejiang University, Hangzhou, China
- Laboratory of Cell Fate Control, School of Life Sciences, Westlake University, Hangzhou, China
| | - Tao Huang
- College of Life Sciences, Zhejiang University, Hangzhou, China
- Laboratory of Cell Fate Control, School of Life Sciences, Westlake University, Hangzhou, China
| | - Wei Li
- CAS Key Laboratory of Regenerative Biology, South China Institute for Stem Cell Biology and Regenerative Medicine, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Zhaoyi Ma
- College of Life Sciences, Zhejiang University, Hangzhou, China
- Laboratory of Cell Fate Control, School of Life Sciences, Westlake University, Hangzhou, China
| | - Kaipeng Wang
- Laboratory of Cell Fate Control, School of Life Sciences, Westlake University, Hangzhou, China
- Fudan University, Shanghai, China
| | - Ziwei Zhai
- CAS Key Laboratory of Regenerative Biology, South China Institute for Stem Cell Biology and Regenerative Medicine, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Yixin Fan
- CAS Key Laboratory of Regenerative Biology, South China Institute for Stem Cell Biology and Regenerative Medicine, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Yu Fu
- Laboratory of Cell Fate Control, School of Life Sciences, Westlake University, Hangzhou, China
- Fudan University, Shanghai, China
| | - Xiaomin Wang
- Laboratory of Cell Fate Control, School of Life Sciences, Westlake University, Hangzhou, China
| | - Yue Qin
- Laboratory of Cell Fate Control, School of Life Sciences, Westlake University, Hangzhou, China
- Institute of Biology, Westlake Institute for Advanced Study, Hangzhou, China
| | - Bo Wang
- Laboratory of Cell Fate Control, School of Life Sciences, Westlake University, Hangzhou, China
- Zhejiang University of Science and Technology School of Information and Electronic Engineering, Hangzhou, China
- Zhejiang Key Laboratory of Biomedical Intelligent Computing Technology, Hangzhou, China
| | - Chengchen Zhao
- Laboratory of Cell Fate Control, School of Life Sciences, Westlake University, Hangzhou, China
- Zhejiang Key Laboratory of Biomedical Intelligent Computing Technology, Hangzhou, China
- Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, 310024, Zhejiang, China
| | - Junqi Kuang
- Laboratory of Cell Fate Control, School of Life Sciences, Westlake University, Hangzhou, China
- Institute of Biology, Westlake Institute for Advanced Study, Hangzhou, China
- Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, 310024, Zhejiang, China
| | - Duanqing Pei
- Laboratory of Cell Fate Control, School of Life Sciences, Westlake University, Hangzhou, China
- Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, 310024, Zhejiang, China
| |
|
13
|
K Lodi M, Chernikov A, Ghosh P. COFFEE: consensus single cell-type specific inference for gene regulatory networks. Brief Bioinform 2024; 25:bbae457. [PMID: 39311699 PMCID: PMC11418232 DOI: 10.1093/bib/bbae457] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2024] [Revised: 07/22/2024] [Accepted: 09/02/2024] [Indexed: 09/26/2024] Open
Abstract
The inference of gene regulatory networks (GRNs) is crucial to understanding the regulatory mechanisms that govern biological processes. GRNs may be represented as edges in a graph, and hence they have been inferred computationally for scRNA-seq data. A wisdom of crowds approach to integrate edges from several GRNs to create one composite GRN has demonstrated improved performance when compared with individual algorithm implementations on bulk RNA-seq and microarray data. In an effort to extend this approach to scRNA-seq data, we present COFFEE (COnsensus single cell-type speciFic inFerence for gEnE regulatory networks), a Borda voting-based consensus algorithm that integrates information from 10 established GRN inference methods. We conclude that COFFEE has improved performance across synthetic, curated, and experimental datasets when compared with baseline methods. Additionally, we show that a modified version of COFFEE can be leveraged to improve performance on newer cell-type specific GRN inference methods. Overall, our results demonstrate that consensus-based methods with pertinent modifications continue to be valuable for GRN inference at the single cell level. While COFFEE is benchmarked on 10 algorithms, it is a flexible strategy that can incorporate any set of GRN inference algorithms according to user preference. A Python implementation of COFFEE may be found on GitHub: https://github.com/lodimk2/coffee.
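The Borda voting idea named in this abstract can be illustrated with a minimal sketch. This is not the COFFEE implementation: the function name `borda_consensus`, the scoring convention (top rank earns n - 1 points, unreported edges earn zero), and the toy edge lists are all assumptions made for illustration.

```python
def borda_consensus(rankings):
    """Combine ranked edge lists from several methods with a Borda count.

    rankings: list of lists; each inner list holds (regulator, target)
    edges ordered from most to least confident by one inference method.
    Returns (edge, score) pairs sorted by total score, highest first.
    """
    # Every edge reported by at least one method is a candidate.
    pool = sorted({edge for ranked in rankings for edge in ranked})
    n = len(pool)
    scores = {edge: 0 for edge in pool}
    for ranked in rankings:
        for position, edge in enumerate(ranked):
            # The top edge earns n - 1 points, the next n - 2, and so on.
            # Edges a method did not report earn nothing from that method.
            scores[edge] += n - 1 - position
    # Break ties alphabetically so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical rankings from three inference methods.
method_a = [("TF1", "G1"), ("TF1", "G2"), ("TF2", "G1")]
method_b = [("TF1", "G2"), ("TF1", "G1")]
method_c = [("TF1", "G1"), ("TF2", "G1"), ("TF1", "G2")]

for edge, score in borda_consensus([method_a, method_b, method_c]):
    print(edge, score)
```

Rank aggregation of this kind uses only the ordering each method produces, so methods whose confidence scores live on different scales can be combined without rescaling.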
Affiliation(s)
- Musaddiq K Lodi
- Integrative Life Sciences, Virginia Commonwealth University, 1000 W Cary St, Richmond, VA 23284, United States
| | - Anna Chernikov
- Center for Biological Data Science, Virginia Commonwealth University, 1015 Floyd Ave, Richmond, VA 23284, United States
| | - Preetam Ghosh
- Department of Computer Science, Virginia Commonwealth University, 401 W Main St, Richmond, VA 23284, United States
| |
|
14
|
Zhang R, Wu M, Xiang D, Zhu J, Zhang Q, Zhong H, Peng Y, Wang Z, Ma G, Li G, Liu F, Ye W, Shi R, Zhou X, Babarinde IA, Su H, Chen J, Zhang X, Qin D, Hutchins AP, Pei D, Li D. A primate-specific endogenous retroviral envelope protein sequesters SFRP2 to regulate human cardiomyocyte development. Cell Stem Cell 2024; 31:1298-1314.e8. [PMID: 39146934 DOI: 10.1016/j.stem.2024.07.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2024] [Revised: 06/04/2024] [Accepted: 07/23/2024] [Indexed: 08/17/2024]
Abstract
Endogenous retroviruses (ERVs) occupy a significant part of the human genome, with some encoding proteins that influence the immune system or regulate cell-cell fusion in early extra-embryonic development. However, whether ERV-derived proteins regulate somatic development is unknown. Here, we report a somatic developmental function for the primate-specific ERVH48-1 (SUPYN/Suppressyn). ERVH48-1 encodes a fragment of a viral envelope that is expressed during early embryonic development. Loss of ERVH48-1 led to impaired mesoderm and cardiomyocyte commitment and diverted cells to an ectoderm-like fate. Mechanistically, ERVH48-1 is localized to sub-cellular membrane compartments through a functional N-terminal signal peptide and binds to the WNT antagonist SFRP2 to promote its polyubiquitination and degradation, thus limiting SFRP2 secretion and blocking repression of WNT/β-catenin signaling. Knockdown of SFRP2 or expression of a chimeric SFRP2 with the ERVH48-1 signal peptide rescued cardiomyocyte differentiation. This study demonstrates how ERVH48-1 modulates WNT/β-catenin signaling and cell type commitment in somatic development.
Affiliation(s)
- Ran Zhang
- Key Laboratory of Biological Targeting Diagnosis, Therapy and Rehabilitation of Guangdong Higher Education Institutes, The Fifth Affiliated Hospital of Guangzhou Medical University, Guangzhou 510799, China; State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences, University of Macau, Macao, China
| | - Menghua Wu
- Key Laboratory of Biological Targeting Diagnosis, Therapy and Rehabilitation of Guangdong Higher Education Institutes, The Fifth Affiliated Hospital of Guangzhou Medical University, Guangzhou 510799, China
| | - Dan Xiang
- CAS Key Laboratory of Regenerative Biology, Guangdong Provincial Key Laboratory of Stem Cell and Regenerative Medicine, GIBH-HKU Guangdong-Hong Kong Stem Cell and Regenerative Medicine Research Centre, Hong Kong Institute of Science & Innovation, Guangzhou Institutes of Biomedicine and Health, Guangzhou, Guangdong 510530, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Jieying Zhu
- CAS Key Laboratory of Regenerative Biology, Guangdong Provincial Key Laboratory of Stem Cell and Regenerative Medicine, GIBH-HKU Guangdong-Hong Kong Stem Cell and Regenerative Medicine Research Centre, Hong Kong Institute of Science & Innovation, Guangzhou Institutes of Biomedicine and Health, Guangzhou, Guangdong 510530, China
| | - Qi Zhang
- Key Laboratory of Biological Targeting Diagnosis, Therapy and Rehabilitation of Guangdong Higher Education Institutes, The Fifth Affiliated Hospital of Guangzhou Medical University, Guangzhou 510799, China
| | - Hui Zhong
- CAS Key Laboratory of Regenerative Biology, Guangdong Provincial Key Laboratory of Stem Cell and Regenerative Medicine, GIBH-HKU Guangdong-Hong Kong Stem Cell and Regenerative Medicine Research Centre, Hong Kong Institute of Science & Innovation, Guangzhou Institutes of Biomedicine and Health, Guangzhou, Guangdong 510530, China
| | - Yuling Peng
- Key Laboratory of Biological Targeting Diagnosis, Therapy and Rehabilitation of Guangdong Higher Education Institutes, The Fifth Affiliated Hospital of Guangzhou Medical University, Guangzhou 510799, China
| | - Zhenhua Wang
- Key Laboratory of Biological Targeting Diagnosis, Therapy and Rehabilitation of Guangdong Higher Education Institutes, The Fifth Affiliated Hospital of Guangzhou Medical University, Guangzhou 510799, China
| | - Gang Ma
- Department of Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
| | - Guihuan Li
- Key Laboratory of Biological Targeting Diagnosis, Therapy and Rehabilitation of Guangdong Higher Education Institutes, The Fifth Affiliated Hospital of Guangzhou Medical University, Guangzhou 510799, China
| | - Fengping Liu
- Key Laboratory of Biological Targeting Diagnosis, Therapy and Rehabilitation of Guangdong Higher Education Institutes, The Fifth Affiliated Hospital of Guangzhou Medical University, Guangzhou 510799, China; Faculty of Medicine, Macau University of Science and Technology, Taipa, Macau 999078, China
| | - Weipeng Ye
- Key Laboratory of Biological Targeting Diagnosis, Therapy and Rehabilitation of Guangdong Higher Education Institutes, The Fifth Affiliated Hospital of Guangzhou Medical University, Guangzhou 510799, China
| | - Ruona Shi
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Xuemeng Zhou
- Department of Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
| | - Isaac A Babarinde
- Department of Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
| | - Huanxing Su
- State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences, University of Macau, Macao, China
| | - Jiekai Chen
- CAS Key Laboratory of Regenerative Biology, Guangdong Provincial Key Laboratory of Stem Cell and Regenerative Medicine, GIBH-HKU Guangdong-Hong Kong Stem Cell and Regenerative Medicine Research Centre, Hong Kong Institute of Science & Innovation, Guangzhou Institutes of Biomedicine and Health, Guangzhou, Guangdong 510530, China
| | - Xiaofei Zhang
- Key Laboratory of Biological Targeting Diagnosis, Therapy and Rehabilitation of Guangdong Higher Education Institutes, The Fifth Affiliated Hospital of Guangzhou Medical University, Guangzhou 510799, China; CAS Key Laboratory of Regenerative Biology, Guangdong Provincial Key Laboratory of Stem Cell and Regenerative Medicine, GIBH-HKU Guangdong-Hong Kong Stem Cell and Regenerative Medicine Research Centre, Hong Kong Institute of Science & Innovation, Guangzhou Institutes of Biomedicine and Health, Guangzhou, Guangdong 510530, China; University of Chinese Academy of Sciences, Beijing 100049, China.
| | - Dajiang Qin
- Key Laboratory of Biological Targeting Diagnosis, Therapy and Rehabilitation of Guangdong Higher Education Institutes, The Fifth Affiliated Hospital of Guangzhou Medical University, Guangzhou 510799, China; Centre for Regenerative Medicine and Health, Hong Kong Institute of Science & Innovation, Chinese Academy of Sciences, Hong Kong SAR, China.
| | - Andrew P Hutchins
- Department of Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China.
| | - Duanqing Pei
- Laboratory of Cell Fate Control, School of Life Sciences, Westlake University, Hangzhou 310024, China.
| | - Dongwei Li
- Key Laboratory of Biological Targeting Diagnosis, Therapy and Rehabilitation of Guangdong Higher Education Institutes, The Fifth Affiliated Hospital of Guangzhou Medical University, Guangzhou 510799, China.
| |
|
15
|
Chang LY, Hao TY, Wang WJ, Lin CY. Inference of single-cell network using mutual information for scRNA-seq data analysis. BMC Bioinformatics 2024; 25:292. [PMID: 39237886 PMCID: PMC11378379 DOI: 10.1186/s12859-024-05895-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2022] [Accepted: 08/08/2024] [Indexed: 09/07/2024] Open
Abstract
BACKGROUND: With advances in single-cell RNA sequencing (scRNA-seq) technology, deriving inherent biological system information from expression profiles at single-cell resolution has become possible. It is known that network modeling by estimating the associations between genes can better reveal dynamic changes in biological systems. However, accurately constructing a single-cell network (SCN) to capture the network architecture of each cell and further explore cell-to-cell heterogeneity remains challenging. RESULTS: We introduce SINUM, a method for constructing the SIngle-cell Network Using Mutual information, which estimates mutual information between any two genes from scRNA-seq data to determine whether they are dependent or independent in a specific cell. Experiments on various scRNA-seq datasets with different cell numbers based on eight performance indexes (e.g., adjusted Rand index and F-measure index) validated the accuracy and robustness of SINUM in cell type identification, superior to the state-of-the-art SCN inference method. Additionally, the SINUM SCNs exhibit high overlap with the human interactome and possess the scale-free property. CONCLUSIONS: SINUM presents a view of biological systems at the network level to detect cell-type marker genes/gene pairs and investigate time-dependent changes in gene associations during embryo development. Codes for SINUM are freely available at https://github.com/SysMednet/SINUM.
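The dependence measure that this abstract builds on, mutual information between two genes, can be sketched as follows. This is a generic plug-in estimator over a 2D histogram, not SINUM's cell-specific procedure; the function name `mutual_information`, the equal-width binning, and the bin count are assumptions for illustration.

```python
import numpy as np


def mutual_information(x, y, bins=4):
    """Plug-in estimate of mutual information (in nats) between two genes.

    x, y: expression values of two genes across the same cells.
    bins: number of equal-width bins per gene (an illustrative choice).
    """
    # Joint histogram of the two expression vectors.
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()
    # Marginals, kept as column and row vectors.
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    # Product of marginals: the joint distribution under independence.
    expected = p_x @ p_y
    nonzero = p_xy > 0
    return float(np.sum(p_xy[nonzero] * np.log(p_xy[nonzero] / expected[nonzero])))


# Toy example: gene_b copies gene_a, gene_c is unrelated to gene_a.
gene_a = np.array([0.0, 0.0, 1.0, 1.0])
gene_b = np.array([0.0, 0.0, 1.0, 1.0])
gene_c = np.array([0.0, 1.0, 0.0, 1.0])
print(mutual_information(gene_a, gene_b, bins=2))  # perfectly dependent pair
print(mutual_information(gene_a, gene_c, bins=2))  # independent pair
```

A value near zero suggests independence, while larger values indicate dependence; any threshold used to call an edge would be a further modelling choice not shown here.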
Affiliation(s)
- Lan-Yun Chang
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu, 300, Taiwan
| | - Ting-Yi Hao
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu, 300, Taiwan
| | - Wei-Jie Wang
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu, 300, Taiwan
| | - Chun-Yu Lin
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu, 300, Taiwan.
- Department of Biological Science and Technology, National Yang Ming Chiao Tung University, Hsinchu, 300, Taiwan.
- Institute of Data Science and Engineering, National Yang Ming Chiao Tung University, Hsinchu, 300, Taiwan.
- Center for Intelligent Drug Systems and Smart Bio-Devices, National Yang Ming Chiao Tung University, Hsinchu, 300, Taiwan.
- Cancer and Immunology Research Center, National Yang Ming Chiao Tung University, Taipei, 112, Taiwan.
- School of Dentistry, Kaohsiung Medical University, Kaohsiung, 807, Taiwan.
| |
|
16
|
Sexton C, Victor Paul S, Barth D, Han M. Genome wide clustering on integrated chromatin states and Micro-C contacts reveals chromatin interaction signatures. NAR Genom Bioinform 2024; 6:lqae136. [PMID: 39363891 PMCID: PMC11447530 DOI: 10.1093/nargab/lqae136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2024] [Revised: 08/21/2024] [Accepted: 09/20/2024] [Indexed: 10/05/2024] Open
Abstract
We can now analyze 3D physical interactions of chromatin regions with chromatin conformation capture technologies, in addition to the 1D chromatin state annotations, but methods to integrate this information are lacking. We propose a method to integrate the chromatin state of interacting regions into a vector representation through the contact-weighted sum of chromatin states. Unsupervised clustering on integrated chromatin states and Micro-C contacts reveals common patterns of chromatin interaction signatures. This provides an integrated view of the complex dynamics of concurrent change occurring in chromatin state and in chromatin interaction, adding another layer of annotation beyond chromatin state or Hi-C contact separately.
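The contact-weighted sum described in this abstract can be sketched in a few lines. The details are assumptions, since the abstract does not specify them: chromatin states are one-hot encoded, contacts are given as a dense region-by-region matrix, and each row is normalized to sum to one. The function name `contact_weighted_states` is hypothetical.

```python
import numpy as np


def contact_weighted_states(contacts, state_labels, n_states):
    """Summarize, for each region, the states of the regions it contacts.

    contacts: (n_regions, n_regions) non-negative contact matrix.
    state_labels: integer chromatin state per region, in [0, n_states).
    Returns an (n_regions, n_states) matrix whose rows sum to one
    (or are all zero for regions without any contact).
    """
    contacts = np.asarray(contacts, dtype=float)
    # One-hot encode the 1D chromatin state annotation.
    one_hot = np.eye(n_states)[np.asarray(state_labels)]
    # Contact-weighted sum of the partner regions' states.
    weighted = contacts @ one_hot
    totals = weighted.sum(axis=1, keepdims=True)
    # Normalize rows; leave zero-contact rows as zeros.
    return np.divide(weighted, totals, out=np.zeros_like(weighted), where=totals > 0)


# Toy example with three regions and two chromatin states.
contacts = [[0, 2, 1],
            [2, 0, 0],
            [1, 0, 0]]
states = [0, 1, 1]
signatures = contact_weighted_states(contacts, states, n_states=2)
print(signatures)
```

The resulting rows are fixed-length vectors, so they could be passed to any standard unsupervised clustering routine.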
Affiliation(s)
- Corinne E Sexton
- School of Life Sciences, University of Nevada, Las Vegas, NV 89154, USA
- Nevada Institute of Personalized Medicine, University of Nevada, Las Vegas, NV 89154, USA
| | - Sylvia Victor Paul
- School of Life Sciences, University of Nevada, Las Vegas, NV 89154, USA
- Nevada Institute of Personalized Medicine, University of Nevada, Las Vegas, NV 89154, USA
| | - Dylan Barth
- School of Life Sciences, University of Nevada, Las Vegas, NV 89154, USA
- Nevada Institute of Personalized Medicine, University of Nevada, Las Vegas, NV 89154, USA
| | - Mira V Han
- School of Life Sciences, University of Nevada, Las Vegas, NV 89154, USA
- Nevada Institute of Personalized Medicine, University of Nevada, Las Vegas, NV 89154, USA
| |
|
17
|
Gao H, Shen W, Li R, Liu C, Wu S. Collaborative Structure-Preserved Missing Data Imputation for Single-Cell RNA-Seq Clustering. IEEE/ACM Trans Comput Biol Bioinform 2024; 21:1480-1491. [PMID: 38776196 DOI: 10.1109/tcbb.2024.3404013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2024]
Abstract
Clustering of single-cell RNA-seq (scRNA-seq) transcriptome profiles can identify cell types, which helps improve the understanding of disease progression. However, in practice, single-cell expression data often contain a significant number of missing values as a result of technical variability. Missing data is a critical challenge in scRNA-seq clustering analysis since the unknown value does not reflect the underlying true expression level and makes it difficult to discover cell types by applying clustering algorithms directly. Various approaches have been developed to overcome the missing data issue in scRNA-seq clustering. Most of them recover missing expression values by borrowing observed data from similar cells or synthesizing data via generative adversarial networks, so the biologically meaningful cluster structure has not been sufficiently exploited. In this work, we introduce ColImpute, a collaborative structure-preserved missing data imputation approach for scRNA-seq clustering. Specifically, a cluster structure-preserved imputation module and a subspace clustering module, which respectively perform missing data imputation and cell subtype identification, are integrated into a unified optimization framework to train the two networks in a collaborative manner. Consequently, the clustering module effectively contributes cluster-structure information to guide the training process of the missing data imputation module. Simultaneously, the cluster structure-preserved imputation module reciprocally enhances the performance of the clustering module by generating more precise recovered samples. Promising experimental results show that the proposed method is effective for both data imputation and cell type identification.
|
18
|
Intoh A, Watanabe-Susaki K, Kato T, Kiritani H, Kurisaki A. EPHA2 is a novel cell surface marker of OCT4-positive undifferentiated cells during the differentiation of mouse and human pluripotent stem cells. Stem Cells Transl Med 2024; 13:763-775. [PMID: 38811016 PMCID: PMC11328934 DOI: 10.1093/stcltm/szae036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2023] [Accepted: 04/14/2024] [Indexed: 05/31/2024] Open
Abstract
Embryonic stem cells (ESCs) and induced pluripotent stem cells (iPSCs) possess the intrinsic ability to differentiate into diverse cellular lineages, marking them as potent instruments in regenerative medicine. Nonetheless, the proclivity of these stem cells to generate teratomas post-transplantation presents a formidable obstacle to their therapeutic utility. In previous studies, we identified an array of cell surface proteins specifically expressed in the pluripotent state, as revealed through proteomic analysis. Here we focused on EPHA2, a protein that is abundantly present on the surface of undifferentiated mouse ESCs and diminishes upon differentiation. Knock-down of Epha2 led to the spontaneous differentiation of mouse ESCs, underscoring a pivotal role of EPHA2 in maintaining an undifferentiated cell state. Further investigations revealed a strong correlation between EPHA2 and OCT4 expression during the differentiation of both mouse and human PSCs. Notably, removing EPHA2+ cells from mouse ESC-derived hepatic lineage cells reduced tumor formation after transplanting them into immune-deficient mice. Similarly, in human iPSCs, a larger proportion of EPHA2+ cells correlated with higher OCT4 expression, reflecting the pattern observed in mouse ESCs. In conclusion, EPHA2 emerges as a potential marker for selecting undifferentiated stem cells, providing a valuable method to decrease tumorigenesis risks after stem-cell transplantation in regenerative treatments.
Affiliation(s)
- Atsushi Intoh
- Division of Biological Science, Nara Institute of Science and Technology, Nara, 630-0192, Japan
- Organ Development Research Laboratory, National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, 305-8560, Japan
| | - Kanako Watanabe-Susaki
- Organ Development Research Laboratory, National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, 305-8560, Japan
| | - Taku Kato
- Division of Biological Science, Nara Institute of Science and Technology, Nara, 630-0192, Japan
| | - Hibiki Kiritani
- Division of Biological Science, Nara Institute of Science and Technology, Nara, 630-0192, Japan
| | - Akira Kurisaki
- Division of Biological Science, Nara Institute of Science and Technology, Nara, 630-0192, Japan
- Organ Development Research Laboratory, National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, 305-8560, Japan
| |
|
19
|
Hozumi Y, Wei GW. Analyzing Single Cell RNA Sequencing with Topological Nonnegative Matrix Factorization. J Comput Appl Math 2024; 445:115842. [PMID: 38464901 PMCID: PMC10919214 DOI: 10.1016/j.cam.2024.115842] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 03/12/2024]
Abstract
Single-cell RNA sequencing (scRNA-seq) is a relatively new technology that has stimulated enormous interest in statistics, data science, and computational biology due to the high dimensionality, complexity, and large scale associated with scRNA-seq data. Nonnegative matrix factorization (NMF) offers a unique approach due to its meta-gene interpretation of resulting low-dimensional components. However, NMF approaches suffer from the lack of multiscale analysis. This work introduces two persistent Laplacian regularized NMF methods, namely, topological NMF (TNMF) and robust topological NMF (rTNMF). By employing a total of 12 datasets, we demonstrate that the proposed TNMF and rTNMF significantly outperform all other NMF-based methods. We have also utilized TNMF and rTNMF for the visualization of popular Uniform Manifold Approximation and Projection (UMAP) and t-distributed stochastic neighbor embedding (t-SNE).
Affiliation(s)
- Yuta Hozumi
- Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
| | - Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
- Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824, USA
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA
| |
|
20
|
Jiang H, Wang MN, Huang YA, Huang Y. Graph-Regularized Non-Negative Matrix Factorization for Single-Cell Clustering in scRNA-Seq Data. IEEE J Biomed Health Inform 2024; 28:4986-4994. [PMID: 38787664 DOI: 10.1109/jbhi.2024.3400050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/26/2024]
Abstract
The advent of single-cell RNA sequencing (scRNA-seq) has brought forth fresh perspectives on intricate biological processes, revealing the nuances and divergences present among distinct cells. Accurate single-cell analysis is a crucial prerequisite for in-depth investigation into the underlying mechanisms of heterogeneity. Owing to various sources of technical noise, such as the impact of dropout values, scRNA-seq data remain challenging to interpret. In this work, we propose an unsupervised learning framework for scRNA-seq data analysis, Sc-GNNMF. Based on the non-negativity and sparsity of scRNA-seq data, we propose employing a graph-regularized non-negative matrix factorization (GNNMF) algorithm for the analysis of scRNA-seq data, which involves estimating cell-cell sparse similarity and gene-gene sparse similarity through Laplacian kernels and p-nearest neighbor graphs (p-NNG). By assuming intrinsic geometric local invariance, we use weighted p-nearest known neighbors (p-NKN) to optimize the scRNA-seq data. The optimized scRNA-seq data then participate in the matrix decomposition process, promoting the closeness of cells with similar types in cell-gene data space and determining a more suitable embedding space for clustering. Sc-GNNMF demonstrates superior performance compared to other methods and maintains satisfactory compatibility and robustness, as evidenced by experiments on 11 real scRNA-seq datasets. Furthermore, Sc-GNNMF yields excellent results in clustering tasks, extracting useful gene markers, and pseudo-temporal analysis.
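The general idea of graph-regularized NMF can be sketched with the standard multiplicative updates for the objective ||X - U Vᵀ||² + λ·Tr(Vᵀ L V), where L = D - W is the Laplacian of a cell-cell affinity graph. This is a textbook single-graph formulation, not the Sc-GNNMF algorithm itself (which also uses gene-gene similarity and a p-NKN preprocessing step); the names `gnmf` and `gnmf_objective`, the chain graph, and all parameter values are assumptions for illustration.

```python
import numpy as np


def gnmf(X, W, k, lam=1.0, n_iter=200, seed=0, eps=1e-10):
    """Graph-regularized NMF via multiplicative updates.

    X: (n_genes, n_cells) non-negative expression matrix.
    W: (n_cells, n_cells) symmetric non-negative cell-cell affinity.
    Returns U (genes x k) and V (cells x k), both non-negative.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_cells = X.shape
    U = rng.random((n_genes, k))
    V = rng.random((n_cells, k))
    D = np.diag(W.sum(axis=1))  # degree matrix of the graph
    for _ in range(n_iter):
        # Update the gene factors (plain NMF step).
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        # Update the cell factors; the W and D terms pull neighbours together.
        V *= (X.T @ U + lam * (W @ V)) / (V @ (U.T @ U) + lam * (D @ V) + eps)
    return U, V


def gnmf_objective(X, W, U, V, lam=1.0):
    """Reconstruction error plus the graph smoothness penalty."""
    L = np.diag(W.sum(axis=1)) - W
    return np.linalg.norm(X - U @ V.T) ** 2 + lam * np.trace(V.T @ L @ V)


# Toy data: 6 genes, 5 cells, cells linked in a chain graph.
rng = np.random.default_rng(1)
X = rng.random((6, 5))
W = np.zeros((5, 5))
for i in range(4):
    W[i, i + 1] = W[i + 1, i] = 1.0

U, V = gnmf(X, W, k=2)
print(gnmf_objective(X, W, U, V))
```

The rows of V give a low-dimensional embedding of the cells, which could then be clustered with a method such as k-means.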
|
21
|
Xie J, Ruan S, Tu M, Yuan Z, Hu J, Li H, Li S. Clustering single-cell RNA sequencing data via iterative smoothing and self-supervised discriminative embedding. Oncogene 2024; 43:2279-2292. [PMID: 38834657 DOI: 10.1038/s41388-024-03074-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2024] [Revised: 05/22/2024] [Accepted: 05/28/2024] [Indexed: 06/06/2024]
Abstract
Single-cell transcriptome sequencing (scRNA-seq) is a high-throughput technique used to study gene expression at the single-cell level. Clustering analysis is a commonly used method in scRNA-seq data analysis, helping researchers identify cell types and uncover interactions between cells. However, the choice of a robust similarity metric in the clustering procedure is still an open challenge due to the complex underlying structures of the data and the inherent noise in data acquisition. Here, we propose a deep clustering method for scRNA-seq data called scRISE (scRNA-seq Iterative Smoothing and self-supervised discriminative Embedding model) to resolve this challenge. The model consists of two main modules: an iterative smoothing module based on graph autoencoders designed to denoise the data and refine the pairwise similarity in turn to gradually incorporate cell structural features and enrich the data information; and a self-supervised discriminative embedding module with adaptive similarity threshold for partitioning samples into correct clusters. Our approach has shown improved quality of data representation and clustering on seventeen scRNA-seq datasets against a number of state-of-the-art deep learning clustering methods. Furthermore, utilizing the scRISE method in biological analysis against the HNSCC dataset has unveiled 62 informative genes, highlighting their potential roles as therapeutic targets and biomarkers.
Affiliation(s)
- Jinxin Xie
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
| | - Shanshan Ruan
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
| | - Mingyan Tu
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
| | - Zhen Yuan
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
| | - Jianguo Hu
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
| | - Honglin Li
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China.
- Innovation Center for AI and Drug Discovery, School of Pharmacy, East China Normal University, Shanghai, 200062, China.
- Lingang Laboratory, Shanghai, 200031, China.
| | - Shiliang Li
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China.
- Innovation Center for AI and Drug Discovery, School of Pharmacy, East China Normal University, Shanghai, 200062, China.
| |
|
22
|
Wang Y, Zhou F, Guan J. SFINN: inferring gene regulatory network from single-cell and spatial transcriptomic data with shared factor neighborhood and integrated neural network. Bioinformatics 2024; 40:btae433. [PMID: 38950180 PMCID: PMC11236097 DOI: 10.1093/bioinformatics/btae433] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2024] [Revised: 06/18/2024] [Accepted: 06/28/2024] [Indexed: 07/03/2024] Open
Abstract
MOTIVATION: The rise of single-cell RNA sequencing (scRNA-seq) technology presents new opportunities for constructing detailed cell type-specific gene regulatory networks (GRNs) to study cell heterogeneity. However, challenges caused by noise, technical errors, and dropout phenomena in scRNA-seq data pose significant obstacles to GRN inference, making the design of accurate GRN inference algorithms still essential. The recent growth of both single-cell and spatial transcriptomic sequencing data enables the development of supervised deep learning methods to infer GRNs on these diverse single-cell datasets. RESULTS: In this study, we introduce a novel deep learning framework based on shared factor neighborhood and integrated neural network (SFINN) for inferring potential interactions and causalities between transcription factors and target genes from single-cell and spatial transcriptomic data. SFINN utilizes shared factor neighborhood to construct a cellular neighborhood network based on gene expression data and additionally integrates a cellular network generated from spatial location information. Subsequently, the cell adjacency matrix and gene pair expression are fed into an integrated neural network framework consisting of a graph convolutional neural network and a fully-connected neural network to determine whether the genes interact. Performance evaluation in the tasks of gene interaction and causality prediction against the existing GRN reconstruction algorithms demonstrates the usability and competitiveness of SFINN across different kinds of data. SFINN can be applied to infer GRNs from conventional single-cell sequencing data and spatial transcriptomic data. AVAILABILITY AND IMPLEMENTATION: SFINN can be accessed at GitHub: https://github.com/JGuan-lab/SFINN.
Affiliation(s)
- Yongjie Wang
- Department of Automation, Xiamen University, Xiamen, Fujian 361102, China
- Fengfan Zhou
- Department of Automation, Xiamen University, Xiamen, Fujian 361102, China
- Jinting Guan
- Department of Automation, Xiamen University, Xiamen, Fujian 361102, China
- Key Laboratory of System Control and Information Processing, Ministry of Education, Shanghai 200240, China
- National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian 361102, China
23
Zhou X, Pan J, Chen L, Zhang S, Chen Y. DeepIMAGER: Deeply Analyzing Gene Regulatory Networks from scRNA-seq Data. Biomolecules 2024; 14:766. [PMID: 39062480 PMCID: PMC11274664 DOI: 10.3390/biom14070766] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2024] [Revised: 06/22/2024] [Accepted: 06/25/2024] [Indexed: 07/28/2024] Open
Abstract
Understanding the dynamics of gene regulatory networks (GRNs) across diverse cell types poses a challenge yet holds immense value in unraveling the molecular mechanisms governing cellular processes. Current computational methods, which rely solely on expression changes from bulk RNA-seq and/or scRNA-seq data, often result in high rates of false positives and low precision. Here, we introduce an advanced computational tool, DeepIMAGER, for inferring cell-specific GRNs through deep learning and data integration. DeepIMAGER employs a supervised approach that transforms the co-expression patterns of gene pairs into image-like representations and leverages transcription factor (TF) binding information for model training. It is trained using comprehensive datasets that encompass scRNA-seq profiles and ChIP-seq data, capturing TF-gene pair information across various cell types. Comprehensive validations on six cell lines show DeepIMAGER exhibits superior performance in ten popular GRN inference tools and has remarkable robustness against dropout-zero events. DeepIMAGER was applied to scRNA-seq datasets of multiple myeloma (MM) and detected potential GRNs for TFs of RORC, MITF, and FOXD2 in MM dendritic cells. This technical innovation, combined with its capability to accurately decode GRNs from scRNA-seq, establishes DeepIMAGER as a valuable tool for unraveling complex regulatory networks in various cell types.
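The DeepIMAGER abstract above mentions transforming the co-expression pattern of a gene pair into an image-like representation. One common way to build such a representation is a 2D histogram of the two genes' expression across cells. The sketch below assumes that construction; the function name `gene_pair_image`, the log transform, the bin count, and the simulated data are illustrative choices and are not taken from the paper.

```python
import numpy as np


def gene_pair_image(x, y, bins=32):
    """Return a normalized 2D histogram of two genes' expression across cells.

    Hypothetical helper for illustration only: x and y are per-cell
    expression values for a candidate TF and a candidate target gene.
    """
    # Log-transform counts so that highly expressed cells do not dominate.
    x = np.log1p(np.asarray(x, dtype=float))
    y = np.log1p(np.asarray(y, dtype=float))
    # Bin the joint expression of the pair into a bins x bins grid.
    hist, _, _ = np.histogram2d(x, y, bins=bins)
    total = hist.sum()
    # Normalize to a joint frequency so images are comparable across datasets.
    return hist / total if total > 0 else hist


# Simulated counts for 500 cells (assumed data, not from the paper).
rng = np.random.default_rng(0)
tf = rng.poisson(5.0, size=500)
target = tf + rng.poisson(1.0, size=500)

img = gene_pair_image(tf, target, bins=16)
print(img.shape)
```

An array like `img` could then be passed to a convolutional classifier together with a label indicating whether the pair has binding support, which is the general shape of the supervised setup the abstract describes.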
Affiliation(s)
- Xiguo Zhou
- College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
- Jingyi Pan
- College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
- Liang Chen
- College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
- Shaoqiang Zhang
- College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
- Yong Chen
- Department of Biological and Biomedical Sciences, Rowan University, Glassboro, NJ 08028, USA
24
Ren J, Li P, Yan J. CPMI: comprehensive neighborhood-based perturbed mutual information for identifying critical states of complex biological processes. BMC Bioinformatics 2024; 25:215. [PMID: 38879513 PMCID: PMC11180411 DOI: 10.1186/s12859-024-05836-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2024] [Accepted: 06/10/2024] [Indexed: 06/19/2024] Open
Abstract
BACKGROUND: There exists a critical transition or tipping point during the complex biological process. Such critical transition is usually accompanied by the catastrophic consequences. Therefore, hunting for the tipping point or critical state is of significant importance to prevent or delay the occurrence of catastrophic consequences. However, predicting critical state based on the high-dimensional small sample data is a difficult problem, especially for single-cell expression data.
RESULTS: In this study, we propose the comprehensive neighbourhood-based perturbed mutual information (CPMI) method to detect the critical states of complex biological processes. The CPMI method takes into account the relationship between genes and neighbours, so as to reduce the noise and enhance the robustness. This method is applied to a simulated dataset and six real datasets, including an influenza dataset, two single-cell expression datasets and three bulk datasets. The method can not only successfully detect the tipping points, but also identify their dynamic network biomarkers (DNBs). In addition, the discovery of transcription factors (TFs) which can regulate DNB genes and nondifferential 'dark genes' validates the effectiveness of our method. The numerical simulation verifies that the CPMI method is robust under different noise strengths and is superior to the existing methods on identifying the critical states.
CONCLUSIONS: In conclusion, we propose a robust computational method, i.e., CPMI, which is applicable in both the bulk and single cell datasets. The CPMI method holds great potential in providing the early warning signals for complex biological processes and enabling early disease diagnosis.
Affiliation(s)
- Jing Ren
- School of Mathematics and Statistics, Henan University of Science and Technology, Luoyang, 471000, China
- Longmen Laboratory, Luoyang, 471003, Henan, China
- Peiluan Li
- School of Mathematics and Statistics, Henan University of Science and Technology, Luoyang, 471000, China
- Longmen Laboratory, Luoyang, 471003, Henan, China
- Jinling Yan
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an, 710072, China
25
Liu W, Pan Y, Teng Z, Xu J. scDMAE: A Generative Denoising Model Adopted Mask Strategy for scRNA-Seq Data Recovery. IEEE J Biomed Health Inform 2024; 28:3772-3780. [PMID: 38568766 DOI: 10.1109/jbhi.2024.3383921] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/05/2024]
Abstract
The advent of single-cell RNA sequencing (scRNA-seq) technology has revolutionized gene expression studies at the single-cell level. However, the presence of technical noise and data sparsity in scRNA-seq often undermines the accuracy of subsequent analyses. Existing methods for denoising and imputing scRNA-seq data often rely on stringent assumptions about data distribution, limiting the effectiveness of data recovery. In this study, we propose the scDMAE model for denoising and recovery of scRNA-seq data. First, the model fuses gene expression features and topological features to discern the primary expression patterns of genes in cells. Then, an autoencoder with a masking strategy is used to model dropout events and separate potential noise in the data. Finally, the model incorporates the original raw data to recover the true biological expression value. By conducting experiments on various types of scRNA-Seq datasets, scDMAE demonstrates superior performance compared to other comparative methods based on six distinct evaluation metrics in downstream analysis. The scDMAE method can accurately cluster similar cell populations, identify differential genes and infer cell trajectories.
26
Wan R, Zhang Y, Peng Y, Tian F, Gao G, Tang F, Jia J, Ge H. Unveiling gene regulatory networks during cellular state transitions without linkage across time points. Sci Rep 2024; 14:12355. [PMID: 38811747 PMCID: PMC11137113 DOI: 10.1038/s41598-024-62850-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2024] [Accepted: 05/22/2024] [Indexed: 05/31/2024] Open
Abstract
Time-stamped cross-sectional data, which lack linkage across time points, are commonly generated in single-cell transcriptional profiling. Many previous methods for inferring gene regulatory networks (GRNs) driving cell-state transitions relied on constructing single-cell temporal ordering. Introducing COSLIR (COvariance restricted Sparse LInear Regression), we presented a direct approach to reconstructing GRNs that govern cell-state transitions, utilizing only the first and second moments of samples between two consecutive time points. Simulations validated COSLIR's perfect accuracy in the oracle case and demonstrated its robust performance in real-world scenarios. When applied to single-cell RT-PCR and RNAseq datasets in developmental biology, COSLIR competed favorably with existing methods. Notably, its running time remained nearly independent of the number of cells. Therefore, COSLIR emerges as a promising addition to GRN reconstruction methods under cell-state transitions, bypassing the single-cell temporal ordering to enhance accuracy and efficiency in single-cell transcriptional profiling.
Affiliation(s)
- Ruosi Wan
- Beijing International Center for Mathematical Research, Peking University, Beijing, China
- Yuhao Zhang
- Biomedical Pioneering Innovation Center, Peking University, Beijing, China
- Yongli Peng
- Beijing International Center for Mathematical Research, Peking University, Beijing, China
- Feng Tian
- Biomedical Pioneering Innovation Center, Peking University, Beijing, China
- Ge Gao
- Biomedical Pioneering Innovation Center, Peking University, Beijing, China
- Beijing Advanced Innovation Center for Genomics, Peking University, Beijing, China
- Fuchou Tang
- Biomedical Pioneering Innovation Center, Peking University, Beijing, China
- Beijing Advanced Innovation Center for Genomics, Peking University, Beijing, China
- Jinzhu Jia
- School of Public Health and Center for Statistical Science, Peking University, Beijing, China
- Hao Ge
- Beijing International Center for Mathematical Research, Peking University, Beijing, China
- Biomedical Pioneering Innovation Center, Peking University, Beijing, China
27
Gan Y, Yu J, Xu G, Yan C, Zou G. Inferring gene regulatory networks from single-cell transcriptomics based on graph embedding. Bioinformatics 2024; 40:btae291. [PMID: 38810116 PMCID: PMC11142726 DOI: 10.1093/bioinformatics/btae291] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2024] [Revised: 03/06/2024] [Accepted: 05/28/2024] [Indexed: 05/31/2024] Open
Abstract
MOTIVATION: Gene regulatory networks (GRNs) encode gene regulation in living organisms, and have become a critical tool to understand complex biological processes. However, due to the dynamic and complex nature of gene regulation, inferring GRNs from scRNA-seq data is still a challenging task. Existing computational methods usually focus on the close connections between genes, and ignore the global structure and distal regulatory relationships.
RESULTS: In this study, we develop a supervised deep learning framework, IGEGRNS, to infer GRNs from scRNA-seq data based on graph embedding. In the framework, contextual information of genes is captured by GraphSAGE, which aggregates gene features and neighborhood structures to generate low-dimensional embedding for genes. Then, the k most influential nodes in the whole graph are filtered through Top-k pooling. Finally, potential regulatory relationships between genes are predicted by stacking CNNs. Compared with nine competing supervised and unsupervised methods, our method achieves better performance on six time-series scRNA-seq datasets.
AVAILABILITY AND IMPLEMENTATION: Our method IGEGRNS is implemented in Python using the PyTorch machine learning library, and it is freely available at https://github.com/DHUDBlab/IGEGRNS.
Affiliation(s)
- Yanglan Gan
- School of Computer Science and Technology, Donghua University, Shanghai 201620, China
- Jiacheng Yu
- School of Computer Science and Technology, Donghua University, Shanghai 201620, China
- Guangwei Xu
- School of Computer Science and Technology, Donghua University, Shanghai 201620, China
- Cairong Yan
- School of Computer Science and Technology, Donghua University, Shanghai 201620, China
- Guobing Zou
- School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
28
Pottmeier P, Nikolantonaki D, Lanner F, Peuckert C, Jazin E. Sex-biased gene expression during neural differentiation of human embryonic stem cells. Front Cell Dev Biol 2024; 12:1341373. [PMID: 38764741 PMCID: PMC11101176 DOI: 10.3389/fcell.2024.1341373] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Accepted: 04/16/2024] [Indexed: 05/21/2024] Open
Abstract
Sex differences in the developing human brain are primarily attributed to hormonal influence. Recently however, genetic differences and their impact on the developing nervous system have attracted increased attention. To understand genetically driven sexual dimorphisms in neurodevelopment, we investigated genome-wide gene expression in an in vitro differentiation model of male and female human embryonic stem cell lines (hESC), independent of the effects of human sex hormones. Four male and four female-derived hESC lines were differentiated into a population of mixed neurons over 37 days. Differential gene expression and gene set enrichment analyses were conducted on bulk RNA sequencing data. While similar differentiation tendencies in all cell lines demonstrated the robustness and reproducibility of our differentiation protocol, we found sex-biased gene expression already in undifferentiated ESCs at day 0, but most profoundly after 37 days of differentiation. Male and female cell lines exhibited sex-biased expression of genes involved in neurodevelopment, suggesting that sex influences the differentiation trajectory. Interestingly, the highest contribution to sex differences was found to arise from the male transcriptome, involving both Y chromosome and autosomal genes. We propose 13 sex-biased candidate genes (10 upregulated in male cell lines and 3 in female lines) that are likely to affect neuronal development. Additionally, we confirmed gene dosage compensation of X/Y homologs escaping X chromosome inactivation through their Y homologs and identified a significant overexpression of the Y-linked demethylase UTY and KDM5D in male hESC during neuron development, confirming previous results in neural stem cells. Our results suggest that genetic sex differences affect neuronal differentiation trajectories, which could ultimately contribute to sex biases during human brain development.
Affiliation(s)
- Philipp Pottmeier
- Department of Organismal Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
- Danai Nikolantonaki
- Department of Organismal Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
- Fredrik Lanner
- Division of Obstetrics and Gynecology, Department of Clinical Science, Intervention and Technology, Karolinska Institute and Karolinska University Hospital, Stockholm, Sweden
- Christiane Peuckert
- Department of Organismal Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
- The Department of Molecular Biosciences, The Wenner-Gren Institute, Stockholm University, Stockholm, Sweden
- Elena Jazin
- Department of Organismal Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
29
Hozumi Y, Tanemura KA, Wei GW. Preprocessing of Single Cell RNA Sequencing Data Using Correlated Clustering and Projection. J Chem Inf Model 2024; 64:2829-2838. [PMID: 37402705 PMCID: PMC11009150 DOI: 10.1021/acs.jcim.3c00674] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Indexed: 07/06/2023]
Abstract
Single-cell RNA sequencing (scRNA-seq) is widely used to reveal heterogeneity in cells, which has given us insights into cell-cell communication, cell differentiation, and differential gene expression. However, analyzing scRNA-seq data is a challenge due to sparsity and the large number of genes involved. Therefore, dimensionality reduction and feature selection are important for removing spurious signals and enhancing the downstream analysis. We present Correlated Clustering and Projection (CCP), a new data-domain dimensionality reduction method, for the first time. CCP projects each cluster of similar genes into a supergene defined as the accumulated pairwise nonlinear gene-gene correlations among all cells. Using 14 benchmark data sets, we demonstrate that CCP has significant advantages over classical principal component analysis (PCA) for clustering and/or classification problems with intrinsically high dimensionality. In addition, we introduce the Residue-Similarity index (RSI) as a novel metric for clustering and classification and the R-S plot as a new visualization tool. We show that the RSI correlates with accuracy without requiring the knowledge of the true labels. The R-S plot provides a unique alternative to the uniform manifold approximation and projection (UMAP) and t-distributed stochastic neighbor embedding (t-SNE) for data with a large number of cell types.
Affiliation(s)
- Yuta Hozumi
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Kiyoto Aramis Tanemura
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
30
Xue X, Kim YS, Ponce-Arias AI, O'Laughlin R, Yan RZ, Kobayashi N, Tshuva RY, Tsai YH, Sun S, Zheng Y, Liu Y, Wong FCK, Surani A, Spence JR, Song H, Ming GL, Reiner O, Fu J. A patterned human neural tube model using microfluidic gradients. Nature 2024; 628:391-399. [PMID: 38408487 PMCID: PMC11006583 DOI: 10.1038/s41586-024-07204-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2022] [Accepted: 02/16/2024] [Indexed: 02/28/2024]
Abstract
The human nervous system is a highly complex but organized organ. The foundation of its complexity and organization is laid down during regional patterning of the neural tube, the embryonic precursor to the human nervous system. Historically, studies of neural tube patterning have relied on animal models to uncover underlying principles. Recently, models of neurodevelopment based on human pluripotent stem cells, including neural organoids and bioengineered neural tube development models, have emerged. However, such models fail to recapitulate neural patterning along both rostral-caudal and dorsal-ventral axes in a three-dimensional tubular geometry, a hallmark of neural tube development. Here we report a human pluripotent stem cell-based, microfluidic neural tube-like structure, the development of which recapitulates several crucial aspects of neural patterning in brain and spinal cord regions and along rostral-caudal and dorsal-ventral axes. This structure was utilized for studying neuronal lineage development, which revealed pre-patterning of axial identities of neural crest progenitors and functional roles of neuromesodermal progenitors and the caudal gene CDX2 in spinal cord and trunk neural crest development. We further developed dorsal-ventral patterned microfluidic forebrain-like structures with spatially segregated dorsal and ventral regions and layered apicobasal cellular organizations that mimic development of the human forebrain pallium and subpallium, respectively. Together, these microfluidics-based neurodevelopment models provide three-dimensional lumenal tissue architectures with in vivo-like spatiotemporal cell differentiation and organization, which will facilitate the study of human neurodevelopment and disease.
Affiliation(s)
- Xufeng Xue
- Department of Mechanical Engineering, University of Michigan, Ann Arbor, MI, USA
- Yung Su Kim
- Department of Mechanical Engineering, University of Michigan, Ann Arbor, MI, USA
- Alfredo-Isaac Ponce-Arias
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel
- Department of Molecular Neuroscience, Weizmann Institute of Science, Rehovot, Israel
- Richard O'Laughlin
- Department of Molecular Neuroscience, Weizmann Institute of Science, Rehovot, Israel
- Robin Zhexuan Yan
- Department of Mechanical Engineering, University of Michigan, Ann Arbor, MI, USA
- Norio Kobayashi
- Department of Mechanical Engineering, University of Michigan, Ann Arbor, MI, USA
- Rami Yair Tshuva
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel
- Department of Molecular Neuroscience, Weizmann Institute of Science, Rehovot, Israel
- Yu-Hwai Tsai
- Department of Internal Medicine, Division of Gastroenterology, University of Michigan Medical School, Ann Arbor, MI, USA
- Shiyu Sun
- Department of Mechanical Engineering, University of Michigan, Ann Arbor, MI, USA
- Yi Zheng
- Department of Mechanical Engineering, University of Michigan, Ann Arbor, MI, USA
- Yue Liu
- Department of Mechanical Engineering, University of Michigan, Ann Arbor, MI, USA
- Frederick C K Wong
- Wellcome Trust/Cancer Research UK Gurdon Institute, University of Cambridge, Cambridge, UK
- Azim Surani
- Wellcome Trust/Cancer Research UK Gurdon Institute, University of Cambridge, Cambridge, UK
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, UK
- Jason R Spence
- Department of Internal Medicine, Division of Gastroenterology, University of Michigan Medical School, Ann Arbor, MI, USA
- Wellcome Trust/Cancer Research UK Gurdon Institute, University of Cambridge, Cambridge, UK
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, UK
- Hongjun Song
- Department of Neuroscience and Mahoney Institute for Neurosciences, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Cell and Developmental Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Institute for Regenerative Medicine, University of Pennsylvania, Philadelphia, PA, USA
- The Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Guo-Li Ming
- Department of Neuroscience and Mahoney Institute for Neurosciences, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Cell and Developmental Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Institute for Regenerative Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Orly Reiner
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel
- Department of Molecular Neuroscience, Weizmann Institute of Science, Rehovot, Israel
- Jianping Fu
- Department of Mechanical Engineering, University of Michigan, Ann Arbor, MI, USA
- Department of Cell and Developmental Biology, University of Michigan Medical School, Ann Arbor, MI, USA
- Department of Biomedical Engineering, University of Michigan, Ann Arbor, MI, USA
31
Ranek JS, Stallaert W, Milner JJ, Redick M, Wolff SC, Beltran AS, Stanley N, Purvis JE. DELVE: feature selection for preserving biological trajectories in single-cell data. Nat Commun 2024; 15:2765. [PMID: 38553455 PMCID: PMC10980758 DOI: 10.1038/s41467-024-46773-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Accepted: 03/07/2024] [Indexed: 04/02/2024] Open
Abstract
Single-cell technologies can measure the expression of thousands of molecular features in individual cells undergoing dynamic biological processes. While examining cells along a computationally-ordered pseudotime trajectory can reveal how changes in gene or protein expression impact cell fate, identifying such dynamic features is challenging due to the inherent noise in single-cell data. Here, we present DELVE, an unsupervised feature selection method for identifying a representative subset of molecular features which robustly recapitulate cellular trajectories. In contrast to previous work, DELVE uses a bottom-up approach to mitigate the effects of confounding sources of variation, and instead models cell states from dynamic gene or protein modules based on core regulatory complexes. Using simulations, single-cell RNA sequencing, and iterative immunofluorescence imaging data in the context of cell cycle and cellular differentiation, we demonstrate how DELVE selects features that better define cell-types and cell-type transitions. DELVE is available as an open-source Python package: https://github.com/jranek/delve.
Affiliation(s)
- Jolene S Ranek
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Computational Medicine Program, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Wayne Stallaert
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
- J Justin Milner
- Department of Microbiology and Immunology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill School of Medicine, Chapel Hill, NC, USA
- Margaret Redick
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Computational Medicine Program, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Samuel C Wolff
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Computational Medicine Program, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Adriana S Beltran
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Human Pluripotent Cell Core, University of North Carolina at Chapel Hill School of Medicine, Chapel Hill, NC, USA
- Natalie Stanley
- Computational Medicine Program, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Jeremy E Purvis
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Computational Medicine Program, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
32
Huynh T, Cang Z. Topological and geometric analysis of cell states in single-cell transcriptomic data. Brief Bioinform 2024; 25:bbae176. [PMID: 38632952 PMCID: PMC11024518 DOI: 10.1093/bib/bbae176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2023] [Revised: 01/29/2024] [Accepted: 03/24/2024] [Indexed: 04/19/2024] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) enables dissecting cellular heterogeneity in tissues, resulting in numerous biological discoveries. Various computational methods have been devised to delineate cell types by clustering scRNA-seq data, where clusters are often annotated using prior knowledge of marker genes. In addition to identifying pure cell types, several methods have been developed to identify cells undergoing state transitions, which often rely on prior clustering results. The present computational approaches predominantly investigate the local and first-order structures of scRNA-seq data using graph representations, while scRNA-seq data frequently display complex high-dimensional structures. Here, we introduce scGeom, a tool that exploits the multiscale and multidimensional structures in scRNA-seq data by analyzing the geometry and topology through curvature and persistent homology of both cell and gene networks. We demonstrate the utility of these structural features to reflect biological properties and functions in several applications, where we show that curvatures and topological signatures of cell and gene networks can help indicate transition cells and the differentiation potential of cells. We also illustrate that structural characteristics can improve the classification of cell types.
Affiliation(s)
- Tram Huynh
- Department of Mathematics and Center for Research in Scientific Computation, North Carolina State University, NC 27695, USA
- Zixuan Cang
- Department of Mathematics and Center for Research in Scientific Computation, North Carolina State University, NC 27695, USA
33
Chea S, Kreger J, Lopez-Burks ME, MacLean AL, Lander AD, Calof AL. Gastrulation-stage gene expression in Nipbl+/- mouse embryos foreshadows the development of syndromic birth defects. Sci Adv 2024; 10:eadl4239. [PMID: 38507484 PMCID: PMC10954218 DOI: 10.1126/sciadv.adl4239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2023] [Accepted: 02/15/2024] [Indexed: 03/22/2024]
Abstract
In animal models, Nipbl deficiency phenocopies gene expression changes and birth defects seen in Cornelia de Lange syndrome, the most common cause of which is Nipbl haploinsufficiency. Previous studies in Nipbl+/- mice suggested that heart development is abnormal as soon as cardiogenic tissue is formed. To investigate this, we performed single-cell RNA sequencing on wild-type and Nipbl+/- mouse embryos at gastrulation and early cardiac crescent stages. Nipbl+/- embryos had fewer mesoderm cells than wild-type and altered proportions of mesodermal cell subpopulations. These findings were associated with underexpression of genes implicated in driving specific mesodermal lineages. In addition, Nanog was found to be overexpressed in all germ layers, and many gene expression changes observed in Nipbl+/- embryos could be attributed to Nanog overexpression. These findings establish a link between Nipbl deficiency, Nanog overexpression, and gene expression dysregulation/lineage misallocation, which ultimately manifest as birth defects in Nipbl+/- animals and Cornelia de Lange syndrome.
Affiliation(s)
- Stephenson Chea
- Department of Developmental and Cell Biology, School of Biological Sciences, University of California Irvine, Irvine, CA 92697, USA
- Center for Complex Biological Systems, University of California Irvine, Irvine, CA 92697, USA
- Jesse Kreger
- Department of Quantitative and Computational Biology, Dornsife College of Letters, Arts, and Sciences, University of Southern California, Los Angeles, CA 90089, USA
- Martha E. Lopez-Burks
- Department of Developmental and Cell Biology, School of Biological Sciences, University of California Irvine, Irvine, CA 92697, USA
- Center for Complex Biological Systems, University of California Irvine, Irvine, CA 92697, USA
- Adam L. MacLean
- Department of Quantitative and Computational Biology, Dornsife College of Letters, Arts, and Sciences, University of Southern California, Los Angeles, CA 90089, USA
- Arthur D. Lander
- Department of Developmental and Cell Biology, School of Biological Sciences, University of California Irvine, Irvine, CA 92697, USA
- Center for Complex Biological Systems, University of California Irvine, Irvine, CA 92697, USA
- Anne L. Calof
- Department of Developmental and Cell Biology, School of Biological Sciences, University of California Irvine, Irvine, CA 92697, USA
- Center for Complex Biological Systems, University of California Irvine, Irvine, CA 92697, USA
- Department of Anatomy and Neurobiology, School of Medicine, University of California Irvine, Irvine, CA 92697, USA
34
|
Ding J, Liu R, Wen H, Tang W, Li Z, Venegas J, Su R, Molho D, Jin W, Wang Y, Lu Q, Li L, Zuo W, Chang Y, Xie Y, Tang J. DANCE: a deep learning library and benchmark platform for single-cell analysis. Genome Biol 2024; 25:72. [PMID: 38504331 PMCID: PMC10949782 DOI: 10.1186/s13059-024-03211-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Accepted: 03/05/2024] [Indexed: 03/21/2024] Open
Abstract
DANCE is the first standard, generic, and extensible benchmark platform for accessing and evaluating computational methods across the spectrum of benchmark datasets for numerous single-cell analysis tasks. Currently, DANCE supports 3 modules and 8 popular tasks with 32 state-of-the-art methods on 21 benchmark datasets. Users can easily reproduce the results of supported algorithms across major benchmark datasets with minimal effort, for example with a single command line. In addition, DANCE provides an ecosystem of deep learning architectures and tools for researchers to facilitate their own model development. DANCE is an open-source Python package that welcomes all kinds of contributions.
Affiliation(s)
- Jiayuan Ding
  - Department of Computer Science and Engineering, Michigan State University, East Lansing, USA
- Renming Liu
  - Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, USA
- Hongzhi Wen
  - Department of Computer Science and Engineering, Michigan State University, East Lansing, USA
- Wenzhuo Tang
  - Department of Statistics and Probability, Michigan State University, East Lansing, USA
- Zhaoheng Li
  - Department of Biostatistics, University of Washington, Seattle, USA
- Julian Venegas
  - Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, USA
- Runze Su
  - Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, USA
  - Department of Statistics and Probability, Michigan State University, East Lansing, USA
- Dylan Molho
  - Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, USA
- Wei Jin
  - Department of Computer Science and Engineering, Michigan State University, East Lansing, USA
- Yixin Wang
  - Department of Bioengineering, Stanford University, Palo Alto, USA
- Qiaolin Lu
  - School of Artificial Intelligence, Jilin University, Jilin, China
- Lingxiao Li
  - Department of Computer Science, Boston University, Boston, USA
- Wangyang Zuo
  - Department of Computer Science, Zhejiang University of Technology, Zhejiang, China
- Yi Chang
  - School of Artificial Intelligence, Jilin University, Jilin, China
- Yuying Xie
  - Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, USA
  - Department of Statistics and Probability, Michigan State University, East Lansing, USA
- Jiliang Tang
  - Department of Computer Science and Engineering, Michigan State University, East Lansing, USA
|
35
|
Baptista A, MacArthur BD, Banerji CRS. Charting cellular differentiation trajectories with Ricci flow. Nat Commun 2024; 15:2258. [PMID: 38480714 PMCID: PMC10937996 DOI: 10.1038/s41467-024-45889-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/13/2023] [Accepted: 02/06/2024] [Indexed: 03/17/2024] Open
Abstract
Complex biological processes, such as cellular differentiation, require intricate rewiring of intra-cellular signalling networks. Previous characterisations revealed a raised network entropy underlies less differentiated and malignant cell states. A connection between entropy and Ricci curvature led to applications of discrete curvatures to biological networks. However, predicting dynamic biological network rewiring remains an open problem. Here we apply Ricci curvature and Ricci flow to biological network rewiring. By investigating the relationship between network entropy and Forman-Ricci curvature, theoretically and empirically on single-cell RNA-sequencing data, we demonstrate that the two measures do not always positively correlate, as previously suggested, and provide complementary rather than interchangeable information. We next employ Ricci flow to derive network rewiring trajectories from stem cells to differentiated cells, accurately predicting true intermediate time points in gene expression time courses. In summary, we present a differential geometry toolkit for understanding dynamic network rewiring during cellular differentiation and cancer.
Affiliation(s)
- Anthony Baptista
  - The Alan Turing Institute, The British Library, London, NW1 2DB, UK
  - School of Mathematical Sciences, Queen Mary University of London, London, E1 4NS, UK
- Ben D MacArthur
  - The Alan Turing Institute, The British Library, London, NW1 2DB, UK
  - School of Mathematical Sciences, University of Southampton, Southampton, SO17 1BJ, UK
  - Faculty of Medicine, University of Southampton, Southampton, SO17 1BJ, UK
- Christopher R S Banerji
  - The Alan Turing Institute, The British Library, London, NW1 2DB, UK
  - UCL Cancer Institute, University College London, London, WC1E 6DD, UK
|
36
|
Ang CE, Olmos VH, Vodehnal K, Zhou B, Lee QY, Sinha R, Narayanaswamy A, Mall M, Chesnov K, Dominicus CS, Südhof T, Wernig M. Generation of human excitatory forebrain neurons by cooperative binding of proneural NGN2 and homeobox factor EMX1. Proc Natl Acad Sci U S A 2024; 121:e2308401121. [PMID: 38446849 PMCID: PMC10945857 DOI: 10.1073/pnas.2308401121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Accepted: 01/24/2024] [Indexed: 03/08/2024] Open
Abstract
Generation of defined neuronal subtypes from human pluripotent stem cells remains a challenge. The proneural factor NGN2 has been shown to overcome experimental variability observed by morphogen-guided differentiation and directly converts pluripotent stem cells into neurons, but their cellular heterogeneity has not been investigated yet. Here, we found that NGN2 reproducibly produces three different kinds of excitatory neurons characterized by partial coactivation of other neurotransmitter programs. We explored two principal approaches to achieve more precise specification: prepatterning the chromatin landscape that NGN2 is exposed to and combining NGN2 with region-specific transcription factors. Unexpectedly, the chromatin context of regionalized neural progenitors only mildly altered genomic NGN2 binding and its transcriptional response and did not affect neurotransmitter specification. In contrast, coexpression of region-specific homeobox factors such as EMX1 resulted in drastic redistribution of NGN2 including recruitment to homeobox targets and resulted in glutamatergic neurons with silenced nonglutamatergic programs. These results provide the molecular basis for a blueprint for improved strategies for generating a plethora of defined neuronal subpopulations from pluripotent stem cells for therapeutic or disease-modeling purposes.
Affiliation(s)
- Cheen Euong Ang
  - Department of Bioengineering, Stanford University, Stanford, CA 94305
  - Department of Pathology, Stanford University, Stanford, CA 94305
  - Institute of Stem Cell and Regenerative Medicine, Stanford University, Stanford, CA 94305
- Victor Hipolito Olmos
  - Department of Pathology, Stanford University, Stanford, CA 94305
  - Institute of Stem Cell and Regenerative Medicine, Stanford University, Stanford, CA 94305
- Kayla Vodehnal
  - Department of Pathology, Stanford University, Stanford, CA 94305
  - Institute of Stem Cell and Regenerative Medicine, Stanford University, Stanford, CA 94305
- Bo Zhou
  - Department of Pathology, Stanford University, Stanford, CA 94305
  - Institute of Stem Cell and Regenerative Medicine, Stanford University, Stanford, CA 94305
  - HHMI, Stanford University, Stanford, CA 94305
  - Department of Molecular and Cellular Physiology, Stanford University, Stanford, CA 94305
- Qian Yi Lee
  - Department of Bioengineering, Stanford University, Stanford, CA 94305
  - Department of Pathology, Stanford University, Stanford, CA 94305
  - Institute of Stem Cell and Regenerative Medicine, Stanford University, Stanford, CA 94305
- Rahul Sinha
  - Institute of Stem Cell and Regenerative Medicine, Stanford University, Stanford, CA 94305
- Aadit Narayanaswamy
  - Department of Pathology, Stanford University, Stanford, CA 94305
  - Institute of Stem Cell and Regenerative Medicine, Stanford University, Stanford, CA 94305
- Moritz Mall
  - Department of Pathology, Stanford University, Stanford, CA 94305
  - Institute of Stem Cell and Regenerative Medicine, Stanford University, Stanford, CA 94305
- Kirill Chesnov
  - Department of Pathology, Stanford University, Stanford, CA 94305
  - Institute of Stem Cell and Regenerative Medicine, Stanford University, Stanford, CA 94305
- Caia S. Dominicus
  - Wellcome Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, United Kingdom
  - OpenTargets, Hinxton, Cambridgeshire CB10 1SA, United Kingdom
- Thomas Südhof
  - HHMI, Stanford University, Stanford, CA 94305
  - Department of Molecular and Cellular Physiology, Stanford University, Stanford, CA 94305
- Marius Wernig
  - Department of Pathology, Stanford University, Stanford, CA 94305
  - Institute of Stem Cell and Regenerative Medicine, Stanford University, Stanford, CA 94305
|
37
|
Gong L, Cui X, Liu Y, Lin C, Gao Z. SinCWIm: An imputation method for single-cell RNA sequence dropouts using weighted alternating least squares. Comput Biol Med 2024; 171:108225. [PMID: 38442556 DOI: 10.1016/j.compbiomed.2024.108225] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2023] [Revised: 01/28/2024] [Accepted: 02/25/2024] [Indexed: 03/07/2024]
Abstract
BACKGROUND AND OBJECTIVES: Single-cell RNA sequencing (scRNA-seq) provides a powerful tool for exploring cellular heterogeneity, discovering novel or rare cell types, distinguishing between tissue-specific cellular compositions, and understanding cell differentiation during development. However, due to technological limitations, dropout events in scRNA-seq can mistakenly convert some entries in the real data to zero, which is equivalent to introducing noise into the gene expression matrix. This contamination degrades downstream analyses, including clustering, cell annotation, and differential gene expression analysis. It is therefore crucial to determine accurately which zeros are due to dropout events and to impute them. METHODS: Considering that different zeros in the gene expression matrix carry different confidence levels, this paper proposes SinCWIm, an imputation method for dropout events in scRNA-seq based on weighted alternating least squares (WALS). The method uses the Pearson correlation coefficient and hierarchical clustering to quantify the confidence of zero entries, then combines these confidences with WALS for matrix decomposition. Outlier removal and data correction bring the imputed values close to the actual values. RESULTS: Eight single-cell sequencing datasets were used in comparative experiments to demonstrate the overall superiority of SinCWIm over state-of-the-art models. When the imputed data were clustered and evaluated with the adjusted Rand index, the Usoskin, Pollen and Bladder datasets scored 94.46%, 96.48% and 76.74%, respectively. In addition, significant improvements were made in the retention of differentially expressed genes and in visualization. CONCLUSIONS: SinCWIm provides a valuable imputation method for handling dropout events in single-cell sequencing data. In comparison to advanced methods, SinCWIm demonstrates excellent performance in clustering, visualization and other aspects. It is applicable to various single-cell sequencing datasets.
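The weighted alternating least squares step mentioned in the abstract above can be illustrated with a small sketch. This is not the SinCWIm implementation: the function name `weighted_als_rank1`, the rank-1 restriction, the ridge term `reg`, and the hand-set weights are all assumptions made for illustration, whereas the paper derives its confidence weights from Pearson correlation and hierarchical clustering. The idea shown is only that a zero with low confidence contributes little to the fit, so the factorization can fill it in from the remaining entries.

```python
def weighted_als_rank1(X, W, n_iter=100, reg=1e-3):
    """Rank-1 weighted ALS: minimise sum_ij W[i][j] * (X[i][j] - u[i]*v[j])**2.

    X and W are lists of rows of equal shape; W holds per-entry confidences.
    `reg` is a small ridge term (an assumption) that keeps the updates stable.
    """
    n_rows, n_cols = len(X), len(X[0])
    u = [1.0] * n_rows
    v = [1.0] * n_cols
    for _ in range(n_iter):
        # Fix v, solve the weighted least-squares problem for each u[i].
        for i in range(n_rows):
            num = sum(W[i][j] * X[i][j] * v[j] for j in range(n_cols))
            den = sum(W[i][j] * v[j] ** 2 for j in range(n_cols)) + reg
            u[i] = num / den
        # Fix u, solve the weighted least-squares problem for each v[j].
        for j in range(n_cols):
            num = sum(W[i][j] * X[i][j] * u[i] for i in range(n_rows))
            den = sum(W[i][j] * u[i] ** 2 for i in range(n_rows)) + reg
            v[j] = num / den
    return u, v


# Toy genes-by-cells matrix that is exactly rank 1, except that the true
# value 8 at position (1, 2) has been replaced by a simulated dropout zero.
X = [[2.0, 1.0, 4.0],
     [4.0, 2.0, 0.0],
     [6.0, 3.0, 12.0]]
# Confidence that each entry is a real measurement; the suspected dropout
# receives a low, hand-picked weight (hypothetical value).
W = [[1.0, 1.0, 1.0],
     [1.0, 1.0, 0.01],
     [1.0, 1.0, 1.0]]

u, v = weighted_als_rank1(X, W)
imputed = u[1] * v[2]  # reconstructed value for the suspected dropout
print(round(imputed, 2))
```

Raising the weight on the zero entry toward 1 pulls the reconstruction back toward zero, which is how a confidence score lets the factorization treat likely biological zeros differently from likely dropouts.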
Affiliation(s)
- Lejun Gong
  - Jiangsu Key Lab of Big Data Security & Intelligent Processing, School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, China
- Xiong Cui
  - Jiangsu Key Lab of Big Data Security & Intelligent Processing, School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, China
- Yang Liu
  - Jiangsu Key Lab of Big Data Security & Intelligent Processing, School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, China
- Cai Lin
  - Department of Burn, Wound Repair and Regenerative Medicine Center, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang, 325000, China
- Zhihong Gao
  - Zhejiang Engineering Research Center of Intelligent Medicine, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000, China
|
38
|
Feng H, Cottrell S, Hozumi Y, Wei GW. Multiscale differential geometry learning of networks with applications to single-cell RNA sequencing data. Comput Biol Med 2024; 171:108211. [PMID: 38422960 PMCID: PMC10965033 DOI: 10.1016/j.compbiomed.2024.108211] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2024] [Revised: 02/02/2024] [Accepted: 02/25/2024] [Indexed: 03/02/2024]
Abstract
Single-cell RNA sequencing (scRNA-seq) has emerged as a transformative technology, offering unparalleled insights into the intricate landscape of cellular diversity and gene expression dynamics. scRNA-seq analysis represents a challenging and cutting-edge frontier within the field of biological research. Differential geometry serves as a powerful mathematical tool in various applications of scientific research. In this study, we introduce, for the first time, a multiscale differential geometry (MDG) strategy for addressing the challenges encountered in scRNA-seq data analysis. We assume that intrinsic properties of cells lie on a family of low-dimensional manifolds embedded in the high-dimensional space of scRNA-seq data. Multiscale cell-cell interactive manifolds are constructed to reveal complex relationships in the cell-cell network, where curvature-based features for cells can decipher the intricate structural and biological information. We showcase the utility of our novel approach by demonstrating its effectiveness in classifying cell types. This innovative application of differential geometry in scRNA-seq analysis opens new avenues for understanding the intricacies of biological networks and holds great potential for network analysis in other fields.
Affiliation(s)
- Hongsong Feng
  - Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
- Sean Cottrell
  - Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
- Yuta Hozumi
  - Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
- Guo-Wei Wei
  - Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
  - Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824, USA
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA
|
39
|
Li J, Pan X, Yuan Y, Shen HB. TFvelo: gene regulation inspired RNA velocity estimation. Nat Commun 2024; 15:1387. [PMID: 38360714 PMCID: PMC11258302 DOI: 10.1038/s41467-024-45661-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2023] [Accepted: 01/30/2024] [Indexed: 02/17/2024] Open
Abstract
RNA velocity is closely related to cell fate and is an important indicator for predicting cell states, with an elegant physical interpretation derived from single-cell RNA-seq data. Most existing RNA velocity models aim to extract dynamics from the phase delay between unspliced and spliced mRNA for each individual gene. However, unspliced/spliced mRNA abundance may not provide sufficient signal for dynamic modeling, leading to poor fits in phase portraits. Motivated by the idea that RNA velocity could be driven by transcriptional regulation, we propose TFvelo, which extends the RNA velocity concept to various single-cell datasets without relying on splicing information, by introducing gene regulatory information. Our experiments on synthetic data and multiple scRNA-seq datasets show that TFvelo can accurately fit gene dynamics on phase portraits, and effectively infer cell pseudo-time and trajectory from RNA abundance data. TFvelo opens a robust and accurate avenue for modeling RNA velocity for single-cell data.
Affiliation(s)
- Jiachen Li
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China
- Xiaoyong Pan
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China
- Ye Yuan
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China
- Hong-Bin Shen
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China
|
40
|
Lin Y, Wu TY, Chen X, Wan S, Chao B, Xin J, Yang JYH, Wong WH, Wang YXR. Data integration and inference of gene regulation using single-cell temporal multimodal data with scTIE. Genome Res 2024; 34:119-133. [PMID: 38190633 PMCID: PMC10903952 DOI: 10.1101/gr.277960.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2023] [Accepted: 12/13/2023] [Indexed: 01/10/2024]
Abstract
Single-cell technologies offer unprecedented opportunities to dissect gene regulatory mechanisms in context-specific ways. Although there are computational methods for extracting gene regulatory relationships from scRNA-seq and scATAC-seq data, the data integration problem, essential for accurate cell type identification, has been mostly treated as a standalone challenge. Here we present scTIE, a unified method that integrates temporal multimodal data and infers regulatory relationships predictive of cellular state changes. scTIE uses an autoencoder to embed cells from all time points into a common space by using iterative optimal transport, followed by extracting interpretable information to predict cell trajectories. Using a variety of synthetic and real temporal multimodal data sets, we show scTIE achieves effective data integration while preserving more biological signals than existing methods, particularly in the presence of batch effects and noise. Furthermore, on the exemplar multiome data set we generated from differentiating mouse embryonic stem cells over time, we show scTIE captures regulatory elements highly predictive of cell transition probabilities, providing new potentials to understand the regulatory landscape driving developmental processes.
Affiliation(s)
- Yingxin Lin
  - School of Mathematics and Statistics, The University of Sydney, NSW 2006, Australia
  - Charles Perkins Centre, The University of Sydney, NSW 2006, Australia
  - Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR 999077, China
- Tung-Yu Wu
  - Department of Statistics, Stanford University, Stanford, California 94305-4020, USA
- Xi Chen
  - Department of Statistics, Stanford University, Stanford, California 94305-4020, USA
- Sheng Wan
  - Institute of Electronics, National Yang Ming Chiao Tung University, Hsinchu 30010, Taiwan
- Brian Chao
  - Department of Electrical Engineering, Stanford University, Stanford, California 94305-9505, USA
- Jingxue Xin
  - Department of Statistics, Stanford University, Stanford, California 94305-4020, USA
- Jean Y H Yang
  - School of Mathematics and Statistics, The University of Sydney, NSW 2006, Australia
  - Charles Perkins Centre, The University of Sydney, NSW 2006, Australia
  - Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR 999077, China
- Wing H Wong
  - Department of Statistics, Stanford University, Stanford, California 94305-4020, USA
  - Department of Biomedical Data Science, Stanford University, Stanford, California 94305-5464, USA
  - Bio-X Program, Stanford University, Stanford, California 94305, USA
- Y X Rachel Wang
  - School of Mathematics and Statistics, The University of Sydney, NSW 2006, Australia
|
41
|
Koutrouli M, Nastou K, Piera Líndez P, Bouwmeester R, Rasmussen S, Martens L, Jensen LJ. FAVA: high-quality functional association networks inferred from scRNA-seq and proteomics data. Bioinformatics 2024; 40:btae010. [PMID: 38192003 PMCID: PMC10868155 DOI: 10.1093/bioinformatics/btae010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2023] [Revised: 12/07/2023] [Accepted: 01/05/2024] [Indexed: 01/10/2024] Open
Abstract
MOTIVATION: Protein networks are commonly used for understanding how proteins interact. However, they are typically biased by data availability, favoring well-studied proteins with more interactions. To uncover functions of understudied proteins, we must use data that are not affected by this literature bias, such as single-cell RNA-seq and proteomics. Due to data sparseness and redundancy, functional association analysis becomes complex. RESULTS: To address this, we have developed FAVA (Functional Associations using Variational Autoencoders), which compresses high-dimensional data into a low-dimensional space. FAVA infers networks from high-dimensional omics data with much higher accuracy than existing methods, across a diverse collection of real as well as simulated datasets. FAVA can process large datasets with over 0.5 million conditions and has predicted 4210 interactions between 1039 understudied proteins. Our findings showcase FAVA's capability to offer novel perspectives on protein interactions. FAVA functions within the scverse ecosystem, employing AnnData as its input source. AVAILABILITY AND IMPLEMENTATION: Source code, documentation, and tutorials for FAVA are accessible on GitHub at https://github.com/mikelkou/fava. FAVA can also be installed and used via pip/PyPI as well as via the scverse ecosystem https://github.com/scverse/ecosystem-packages/tree/main/packages/favapy.
Affiliation(s)
- Mikaela Koutrouli
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark
- Katerina Nastou
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark
- Pau Piera Líndez
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark
- Robbin Bouwmeester
  - VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, 9052 Ghent, Belgium
- Simon Rasmussen
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark
- Lennart Martens
  - VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, 9052 Ghent, Belgium
- Lars Juhl Jensen
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark
|
42
|
Okubo T, Rivron N, Kabata M, Masaki H, Kishimoto K, Semi K, Nakajima-Koyama M, Kunitomi H, Kaswandy B, Sato H, Nakauchi H, Woltjen K, Saitou M, Sasaki E, Yamamoto T, Takashima Y. Hypoblast from human pluripotent stem cells regulates epiblast development. Nature 2024; 626:357-366. [PMID: 38052228 PMCID: PMC10849967 DOI: 10.1038/s41586-023-06871-2] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Received: 02/21/2020] [Accepted: 11/15/2023] [Indexed: 12/07/2023]
Abstract
Recently, several studies using cultures of human embryos together with single-cell RNA-seq analyses have revealed differences between humans and mice, necessitating the study of human embryos. Despite the importance of human embryology, ethical and legal restrictions have limited post-implantation-stage studies. Thus, recent efforts have focused on developing in vitro self-organizing models using human stem cells. Here, we report genetic and non-genetic approaches to generate authentic hypoblast cells (naive hPSC-derived hypoblast-like cells, nHyCs), known to give rise to one of the two extraembryonic tissues essential for embryonic development, from naive human pluripotent stem cells (hPSCs). Our nHyCs spontaneously assemble with naive hPSCs to form a three-dimensional bilaminar structure (bilaminoids) with a pro-amniotic-like cavity. In the presence of additional naive hPSC-derived analogues of the second extraembryonic tissue, the trophectoderm, the efficiency of bilaminoid formation increases from 20% to 40%, and the epiblast within the bilaminoids continues to develop in response to trophectoderm-secreted IL-6. Furthermore, we show that bilaminoids robustly recapitulate the patterning of the anterior-posterior axis and the formation of cells reflecting the pregastrula stage, the emergence of which can be shaped by genetically manipulating the DKK1/OTX2 hypoblast-like domain. We have therefore successfully modelled and identified the mechanisms by which the two extraembryonic tissues efficiently guide the stage-specific growth and progression of the epiblast as it establishes the post-implantation landmarks of human embryogenesis.
Affiliation(s)
- Takumi Okubo
  - Center for iPS Cell Research and Application, Kyoto University, Kyoto, Japan
- Nicolas Rivron
  - Institute of Molecular Biotechnology of the Austrian Academy of Sciences (IMBA), Vienna BioCenter (VBC), Vienna, Austria
- Mio Kabata
  - Center for iPS Cell Research and Application, Kyoto University, Kyoto, Japan
- Hideki Masaki
  - Institute of Medical Science, University of Tokyo, Tokyo, Japan
  - Advanced Research Institute, Tokyo Medical and Dental University, Tokyo, Japan
- Katsunori Semi
  - Center for iPS Cell Research and Application, Kyoto University, Kyoto, Japan
- May Nakajima-Koyama
  - Center for iPS Cell Research and Application, Kyoto University, Kyoto, Japan
- Haruko Kunitomi
  - Center for iPS Cell Research and Application, Kyoto University, Kyoto, Japan
- Belinda Kaswandy
  - Center for iPS Cell Research and Application, Kyoto University, Kyoto, Japan
- Hideyuki Sato
  - Institute of Medical Science, University of Tokyo, Tokyo, Japan
  - Advanced Research Institute, Tokyo Medical and Dental University, Tokyo, Japan
- Hiromitsu Nakauchi
  - Institute of Medical Science, University of Tokyo, Tokyo, Japan
  - Advanced Research Institute, Tokyo Medical and Dental University, Tokyo, Japan
  - Institute for Stem Cell Biology and Regenerative Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Knut Woltjen
  - Center for iPS Cell Research and Application, Kyoto University, Kyoto, Japan
- Mitinori Saitou
  - Center for iPS Cell Research and Application, Kyoto University, Kyoto, Japan
  - Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Kyoto, Japan
  - Department of Anatomy and Cell Biology, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Erika Sasaki
  - Central Institute for Experimental Animals, Kawasaki, Japan
- Takuya Yamamoto
  - Center for iPS Cell Research and Application, Kyoto University, Kyoto, Japan
  - Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Kyoto, Japan
  - Medical-risk Avoidance Based on iPS Cells Team, RIKEN Center for Advanced Intelligence Project (AIP), Kyoto, Japan
- Yasuhiro Takashima
  - Center for iPS Cell Research and Application, Kyoto University, Kyoto, Japan
|
43
|
Walls AW, Rosenthal AZ. Bacterial phenotypic heterogeneity through the lens of single-cell RNA sequencing. Transcription 2024; 15:48-62. [PMID: 38532542 DOI: 10.1080/21541264.2024.2334110] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2023] [Accepted: 03/19/2024] [Indexed: 03/28/2024] Open
Abstract
Bacterial transcription is not monolithic. Microbes exist in a wide variety of cell states that help them adapt to their environment, acquire and produce essential nutrients, and engage in both competition and cooperation with their neighbors. While we typically think of bacterial adaptation as a group behavior, where all cells respond in unison, there is often a mixture of phenotypic responses within a bacterial population, where distinct cell types arise. A primary phenomenon driving these distinct cell states is transcriptional heterogeneity. Given that bacterial mRNA transcripts are extremely short-lived compared to eukaryotes, their transcriptional state is closely associated with their physiology, and thus the transcriptome of a bacterial cell acts as a snapshot of the behavior of that bacterium. Therefore, the application of single-cell transcriptomics to microbial populations will provide novel insight into cellular differentiation and bacterial ecology. In this review, we provide an overview of transcriptional heterogeneity in microbial systems, discuss the findings already provided by single-cell approaches, and plot new avenues of inquiry in transcriptional regulation, cellular biology, and mechanisms of heterogeneity that are made possible when microbial communities are analyzed at single-cell resolution.
Affiliation(s)
- Alex W Walls
- Department of Microbiology and Immunology, University of North Carolina, Chapel Hill, NC, USA
- Adam Z Rosenthal
- Department of Microbiology and Immunology, University of North Carolina, Chapel Hill, NC, USA
44
Zhu X, Meng S, Li G, Wang J, Peng X. AGImpute: imputation of scRNA-seq data based on a hybrid GAN with dropouts identification. Bioinformatics 2024; 40:btae068. [PMID: 38317025 PMCID: PMC10877090 DOI: 10.1093/bioinformatics/btae068] [Received: 02/19/2024] [Revised: 02/19/2024] [Accepted: 02/19/2024] [Indexed: 02/07/2024]
Abstract
MOTIVATION: Dropout events bring challenges in analyzing single-cell RNA sequencing data as they introduce noise and distort the true distributions of gene expression profiles. Recent studies focus on estimating dropout probability and imputing dropout events by leveraging information from similar cells or genes. However, the number of dropout events differs between cells, due to complex factors such as different sequencing protocols, cell types, and batch effects. These differences in dropout events are not fully considered in assessing the similarities between cells and genes, which compromises the reliability of downstream analysis.
RESULTS: This work proposes a hybrid Generative Adversarial Network with dropouts identification to impute single-cell RNA sequencing data, named AGImpute. First, the numbers of dropout events in different cells in scRNA-seq data are differentially estimated by using a dynamic threshold estimation strategy. Next, the identified dropout events are imputed by a hybrid deep learning model, combining an Autoencoder with a Generative Adversarial Network. To validate the efficiency of AGImpute, it is compared with seven state-of-the-art dropout imputation methods on two simulated datasets and seven real single-cell RNA sequencing datasets. The results show that AGImpute imputes fewer dropout events than the other methods. Moreover, AGImpute enhances the performance of downstream analysis, including clustering performance, identifying cell-specific marker genes, and inferring trajectory in the time-course dataset.
AVAILABILITY AND IMPLEMENTATION: The source code can be obtained from https://github.com/xszhu-lab/AGImpute.
Affiliation(s)
- Xiaoshu Zhu
- School of Computer and Information Security, Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin 541004, China
- Shuang Meng
- School of Computer Science and Engineering, Guangxi Normal University, Guilin 541006, China
- Gaoshi Li
- School of Computer Science and Engineering, Guangxi Normal University, Guilin 541006, China
- Jianxin Wang
- School of Computer Science and Engineering, Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha 400083, China
- Xiaoqing Peng
- School of Life Sciences, Center for Medical Genetics, Central South University, Changsha 400083, China
45
Seo J, Saha S, Brown ME. The past, present, and future promise of pluripotent stem cells. Journal of Immunology and Regenerative Medicine 2024; 22-23:100077. [PMID: 38706532 PMCID: PMC11065261 DOI: 10.1016/j.regen.2024.100077] [Indexed: 05/07/2024]
Affiliation(s)
- Matthew E. Brown
- University of Wisconsin-Madison, School of Medicine and Public Health, Department of Surgery, Division of Transplantation, 600 Highland Avenue, Madison, WI, 53792, United States
46
da Silva JEH, de Carvalho PC, Camata JJ, de Oliveira IL, Bernardino HS. A Data-Distribution and Successive Spline Points based discretization approach for evolving gene regulatory networks from scRNA-Seq time-series data using Cartesian Genetic Programming. Biosystems 2024; 236:105126. [PMID: 38278505 DOI: 10.1016/j.biosystems.2024.105126] [Received: 07/03/2023] [Revised: 11/18/2023] [Accepted: 01/19/2024] [Indexed: 01/28/2024]
Abstract
The inference of gene regulatory networks (GRNs) is a widely addressed problem in Systems Biology. GRNs can be modeled as Boolean networks, which is the simplest approach for this task. However, Boolean models need binarized data. Several approaches have been developed for the discretization of gene expression data (GED). Also, the advance of data extraction technologies, such as single-cell RNA-Sequencing (scRNA-Seq), provides a new vision of gene expression and brings new challenges for dealing with its specificities, such as a large occurrence of zero data. This work proposes a new discretization approach for dealing with scRNA-Seq time-series data, named Distribution and Successive Spline Points Discretization (DSSPD), which considers the data distribution and a proper preprocessing step. Here, Cartesian Genetic Programming (CGP) is used to infer GRNs using the results of DSSPD. The proposal is compared with CGP with the standard data handling and five state-of-the-art algorithms on curated models and experimental data. The results show that the proposal improves the results of CGP in all tested cases and outperforms the state-of-the-art algorithms in most cases.
Affiliation(s)
- José J Camata
- Universidade Federal de Juiz de Fora, Juiz de Fora, MG, Brazil.
47
Lan W, Liu M, Chen J, Ye J, Zheng R, Zhu X, Peng W. JLONMFSC: Clustering scRNA-seq data based on joint learning of non-negative matrix factorization and subspace clustering. Methods 2024; 222:1-9. [PMID: 38128706 DOI: 10.1016/j.ymeth.2023.11.019] [Received: 06/29/2023] [Revised: 11/07/2023] [Accepted: 11/29/2023] [Indexed: 12/23/2023]
Abstract
The development of single cell RNA sequencing (scRNA-seq) has provided new perspectives to study biological problems at the single cell level. One of the key issues in scRNA-seq data analysis is to divide cells into several clusters for discovering the heterogeneity and diversity of cells. However, the existing scRNA-seq data are high-dimensional, sparse, and noisy, which challenges the existing single-cell clustering methods. In this study, we propose a joint learning framework (JLONMFSC) for clustering scRNA-seq data. In our method, the dimension of the original data is reduced to minimize the effect of noise. In addition, the graph regularized matrix factorization is used to learn the local features. Further, the Low-Rank Representation (LRR) subspace clustering is utilized to learn the global features. Finally, the joint learning of local features and global features is performed to obtain the results of clustering. We compare the proposed algorithm with eight state-of-the-art algorithms for clustering performance on six datasets, and the experimental results demonstrate that the JLONMFSC achieves better performance in all datasets. The code is available at https://github.com/lanbiolab/JLONMFSC.
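As a rough illustration of the non-negative matrix factorization building block named in this abstract, the sketch below factorizes a small non-negative cells-by-genes matrix X into W and H using the classical multiplicative update rules. It is not the JLONMFSC implementation: the graph regularization, the LRR subspace term, and the joint learning step are all omitted, and the toy matrix, the function names, and the parameter defaults are assumptions made for this example only.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(x, w, h):
    """Frobenius norm of X - WH."""
    wh = matmul(w, h)
    return sum(
        (x[i][j] - wh[i][j]) ** 2 for i in range(len(x)) for j in range(len(x[0]))
    ) ** 0.5


def nmf(x, rank, n_iter=200, eps=1e-9, seed=0):
    """Plain NMF (X ~ WH) with multiplicative updates; all factors stay >= 0."""
    rng = random.Random(seed)
    n_rows, n_cols = len(x), len(x[0])
    # Strictly positive random initialization (an arbitrary choice for this sketch).
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_rows)]
    h = [[rng.random() + 0.1 for _ in range(n_cols)] for _ in range(rank)]
    errors = [frobenius_error(x, w, h)]
    for _ in range(n_iter):
        # H <- H * (W^T X) / (W^T W H)
        wt = transpose(w)
        num = matmul(wt, x)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_cols)]
             for k in range(rank)]
        # W <- W * (X H^T) / (W H H^T)
        ht = transpose(h)
        num = matmul(x, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n_rows)]
        errors.append(frobenius_error(x, w, h))
    return w, h, errors


# Hypothetical 4 cells x 4 genes count matrix with two rough blocks.
X = [
    [5.0, 4.0, 0.0, 0.0],
    [4.0, 5.0, 0.0, 1.0],
    [0.0, 0.0, 3.0, 4.0],
    [0.0, 1.0, 4.0, 3.0],
]
W, H, errors = nmf(X, rank=2)
```

Each row of W can be read as a low-dimensional representation of one cell, which is the kind of feature a downstream clustering step would operate on.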
Affiliation(s)
- Wei Lan
- School of Computer, Electronic and Information, Guangxi University, Nanning, China; Guangxi Key Laboratory of Multimedia Communications and Network Technology, Guangxi University, Nanning, China.
- Mingyang Liu
- School of Computer, Electronic and Information, Guangxi University, Nanning, China
- Jianwei Chen
- School of Computer, Electronic and Information, Guangxi University, Nanning, China
- Jin Ye
- School of Computer, Electronic and Information, Guangxi University, Nanning, China
- Ruiqing Zheng
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
- Xiaoshu Zhu
- School of Computer Science and Information Security, Guilin University of Science and Technology, Guilin, China
- Wei Peng
- School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, China
48
Tian J, Lei J, Roeder K. From local to global gene co-expression estimation using single-cell RNA-seq data. Biometrics 2024; 80:ujae001. [PMID: 38465983 PMCID: PMC10926266 DOI: 10.1093/biomtc/ujae001] [Received: 02/23/2023] [Revised: 10/01/2023] [Accepted: 01/15/2024] [Indexed: 03/12/2024]
Abstract
In genomics studies, the investigation of gene relationships often brings important biological insights. Currently, large heterogeneous datasets impose new challenges for statisticians because gene relationships are often local. They change from one sample point to another, may only exist in a subset of the sample, and can be nonlinear or even nonmonotone. Most previous dependence measures do not specifically target local dependence relationships, and the ones that do are computationally costly. In this paper, we explore a state-of-the-art network estimation technique that characterizes gene relationships at the single cell level, under the name of cell-specific gene networks. We first show that averaging the cell-specific gene relationship over a population gives a novel univariate dependence measure, the averaged Local Density Gap (aLDG), that accumulates local dependence and can detect any nonlinear, nonmonotone relationship. Together with a consistent nonparametric estimator, we establish its robustness on both the population and empirical levels. Then, we show that averaging the cell-specific gene relationship over mini-batches determined by some external structure information (e.g., spatial or temporal factor) better highlights meaningful local structure change points. We explore the application of aLDG and its minibatch variant in many scenarios, including pairwise gene relationship estimation, bifurcating point detection in cell trajectory, and spatial transcriptomics structure visualization. Both simulations and real data analysis show that aLDG outperforms existing measures.
Affiliation(s)
- Jinjin Tian
- Department of Statistics and Data Science, Carnegie Mellon University, 15213, Pittsburgh, PA, United States
- Jing Lei
- Department of Statistics and Data Science, Carnegie Mellon University, 15213, Pittsburgh, PA, United States
- Kathryn Roeder
- Department of Statistics and Data Science, Carnegie Mellon University, 15213, Pittsburgh, PA, United States
49
Li S, Liu Y, Shen LC, Yan H, Song J, Yu DJ. GMFGRN: a matrix factorization and graph neural network approach for gene regulatory network inference. Brief Bioinform 2024; 25:bbad529. [PMID: 38261340 PMCID: PMC10805180 DOI: 10.1093/bib/bbad529] [Received: 09/29/2023] [Revised: 12/08/2023] [Accepted: 12/19/2023] [Indexed: 01/24/2024]
Abstract
The recent advances of single-cell RNA sequencing (scRNA-seq) have enabled reliable profiling of gene expression at the single-cell level, providing opportunities for accurate inference of gene regulatory networks (GRNs) on scRNA-seq data. Most methods for inferring GRNs suffer from the inability to eliminate transitive interactions or necessitate expensive computational resources. To address these, we present a novel method, termed GMFGRN, for accurate graph neural network (GNN)-based GRN inference from scRNA-seq data. GMFGRN employs GNN for matrix factorization and learns representative embeddings for genes. For transcription factor-gene pairs, it utilizes the learned embeddings to determine whether they interact with each other. The extensive suite of benchmarking experiments encompassing eight static scRNA-seq datasets alongside several state-of-the-art methods demonstrated mean improvements of 1.9 and 2.5% over the runner-up in area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve (AUPRC). In addition, across four time-series datasets, maximum enhancements of 2.4 and 1.3% in AUROC and AUPRC were observed in comparison to the runner-up. Moreover, GMFGRN requires significantly less training time and memory consumption, with time and memory consumed <10% compared to the second-best method. These findings underscore the substantial potential of GMFGRN in the inference of GRNs. It is publicly available at https://github.com/Lishuoyy/GMFGRN.
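The AUROC figures quoted in this abstract measure how well predicted scores rank true regulatory pairs above non-interacting pairs. As a minimal illustration (this is not GMFGRN's evaluation code, and the labels and scores below are hypothetical), AUROC can be computed as the fraction of positive-negative pairs that are ordered correctly, counting ties as half:

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # pair ordered correctly
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical TF-gene pairs: 1 = true interaction, 0 = no interaction.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
value = auroc(labels, scores)  # 3 of the 4 positive-negative pairs are ordered correctly
```

This pairwise form is quadratic in the number of examples, so it suits small illustrations; rank-based formulations give the same value more efficiently on large benchmarks.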
Affiliation(s)
- Shuo Li
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing, 210094, China
- Yan Liu
- School of Information Engineering, Yangzhou University, 196 West Huayang, Yangzhou, 225000, China
- Long-Chen Shen
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing, 210094, China
- He Yan
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing, 210094, China
- Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia
- Monash Data Futures Institute, Monash University, Melbourne, Victoria 3800, Australia
- Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing, 210094, China
50
Xu F, Hu H, Lin H, Lu J, Cheng F, Zhang J, Li X, Shuai J. scGIR: deciphering cellular heterogeneity via gene ranking in single-cell weighted gene correlation networks. Brief Bioinform 2024; 25:bbae091. [PMID: 38487851 PMCID: PMC10940817 DOI: 10.1093/bib/bbae091] [Received: 01/04/2024] [Revised: 02/08/2024] [Accepted: 02/15/2024] [Indexed: 03/18/2024]
Abstract
Single-cell RNA sequencing (scRNA-seq) has emerged as a powerful tool for investigating cellular heterogeneity through high-throughput analysis of individual cells. Nevertheless, challenges arise from prevalent sequencing dropout events and noise effects, impacting subsequent analyses. Here, we introduce a novel algorithm, Single-cell Gene Importance Ranking (scGIR), which utilizes a single-cell gene correlation network to evaluate gene importance. The algorithm transforms single-cell sequencing data into a robust gene correlation network through statistical independence, with correlation edges weighted by gene expression levels. We then constructed a random walk model on the resulting weighted gene correlation network to rank the importance of genes. Our analysis of gene importance using the PageRank algorithm across nine authentic scRNA-seq datasets indicates that scGIR can effectively surmount technical noise, enabling the identification of cell types and inference of developmental trajectories. We demonstrated that the edges of gene correlation, weighted by expression, play a critical role in enhancing the algorithm's performance. Our findings emphasize that scGIR excels in enhancing the clustering of cell subtypes, reverse identifying differentially expressed marker genes, and uncovering genes with potential differential importance. Overall, we proposed a promising method capable of extracting more information from single-cell RNA sequencing datasets, potentially shedding new light on cellular processes and disease mechanisms.
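The random-walk ranking described in this abstract can be illustrated with a generic weighted PageRank computed by power iteration. The sketch below is not the scGIR implementation: how scGIR builds and weights its gene correlation network is not reproduced here, and the gene names, edge weights, and damping factor are hypothetical values chosen for the example.

```python
def pagerank(weights, damping=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank on a weighted graph.

    weights[i][j] is the non-negative weight of the edge from node i to node j.
    """
    n = len(weights)
    out_sums = [sum(row) for row in weights]
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[i] for i in range(n) if out_sums[i] == 0)
        new_rank = []
        for j in range(n):
            inflow = sum(
                rank[i] * weights[i][j] / out_sums[i]
                for i in range(n)
                if out_sums[i] > 0
            )
            new_rank.append((1 - damping) / n + damping * (inflow + dangling / n))
        delta = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if delta < tol:
            break
    return rank


# Toy symmetric gene-gene network (hypothetical weights, not from the paper).
genes = ["geneA", "geneB", "geneC", "geneD"]
W = [
    [0.0, 0.9, 0.8, 0.1],
    [0.9, 0.0, 0.7, 0.0],
    [0.8, 0.7, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.0],
]
scores = pagerank(W)
ranking = sorted(zip(genes, scores), key=lambda pair: pair[1], reverse=True)
```

In this toy network the strongly connected geneA ranks first and the weakly connected geneD ranks last, which is the intuition behind using edge weights to drive the walk.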
Affiliation(s)
- Fei Xu
- Department of Physics, Anhui Normal University, Wuhu 241002, China
- Wenzhou Institute and Wenzhou Key Laboratory of Biophysics, University of Chinese Academy of Sciences, Wenzhou 325001, China
- Huan Hu
- Institute of Applied Genomics, Fuzhou University, Fuzhou 350108, China
- Hai Lin
- Wenzhou Institute and Wenzhou Key Laboratory of Biophysics, University of Chinese Academy of Sciences, Wenzhou 325001, China
- Jun Lu
- Department of Physics, Anhui Normal University, Wuhu 241002, China
- School of Medical Imageology, Wannan Medical College, Wuhu 241002, China
- Feng Cheng
- Department of Physics, and Fujian Provincial Key Lab for Soft Functional Materials Research, Xiamen University, Xiamen 361005, China
- Jiqian Zhang
- Department of Physics, Anhui Normal University, Wuhu 241002, China
- Xiang Li
- Department of Physics, and Fujian Provincial Key Lab for Soft Functional Materials Research, Xiamen University, Xiamen 361005, China
- Jianwei Shuai
- Wenzhou Institute and Wenzhou Key Laboratory of Biophysics, University of Chinese Academy of Sciences, Wenzhou 325001, China
- Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Wenzhou 325001, China