1
Yuan Q, Duren Z. Inferring gene regulatory networks from single-cell multiome data using atlas-scale external data. Nat Biotechnol 2025; 43:247-257. [PMID: 38609714 PMCID: PMC11825371 DOI: 10.1038/s41587-024-02182-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2023] [Accepted: 02/26/2024] [Indexed: 04/14/2024]
Abstract
Existing methods for gene regulatory network (GRN) inference rely on gene expression data alone or on lower resolution bulk data. Despite the recent integration of chromatin accessibility and RNA sequencing data, learning complex mechanisms from limited independent data points still presents a daunting challenge. Here we present LINGER (Lifelong neural network for gene regulation), a machine-learning method to infer GRNs from single-cell paired gene expression and chromatin accessibility data. LINGER incorporates atlas-scale external bulk data across diverse cellular contexts and prior knowledge of transcription factor motifs as a manifold regularization. LINGER achieves a fourfold to sevenfold relative increase in accuracy over existing methods and reveals a complex regulatory landscape of genome-wide association studies, enabling enhanced interpretation of disease-associated variants and genes. Following the GRN inference from reference single-cell multiome data, LINGER enables the estimation of transcription factor activity solely from bulk or single-cell gene expression data, leveraging the abundance of available gene expression data to identify driver regulators from case-control studies.
Affiliation(s)
- Qiuyue Yuan
  - Center for Human Genetics, Department of Genetics and Biochemistry, Clemson University, Greenwood, SC, USA
- Zhana Duren
  - Center for Human Genetics, Department of Genetics and Biochemistry, Clemson University, Greenwood, SC, USA
2
Cao G, Chen D. Unveiling Long Non-coding RNA Networks from Single-Cell Omics Data Through Artificial Intelligence. Methods Mol Biol 2025; 2883:257-279. [PMID: 39702712 DOI: 10.1007/978-1-0716-4290-0_11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/21/2024]
Abstract
Single-cell omics technologies have revolutionized the study of long non-coding RNAs (lncRNAs), offering unprecedented resolution in elucidating their expression dynamics, cell-type specificity, and associated gene regulatory networks (GRNs). Concurrently, the integration of artificial intelligence (AI) methodologies has significantly advanced our understanding of lncRNA functions and its implications in disease pathogenesis. This chapter discusses the progress in single-cell omics data analysis, emphasizing its pivotal role in unraveling the molecular mechanisms underlying cellular heterogeneity and the associated regulatory networks involving lncRNAs. Additionally, we provide a summary of single-cell omics resources and AI models for constructing single-cell gene regulatory networks (scGRNs). Finally, we explore the challenges and prospects of exploring scGRNs in the context of lncRNA biology.
Affiliation(s)
- Guangshuo Cao
  - State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing, China
- Dijun Chen
  - State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing, China
3
Karamveer, Uzun Y. Approaches for Benchmarking Single-Cell Gene Regulatory Network Methods. Bioinform Biol Insights 2024; 18:11779322241287120. [PMID: 39502448 PMCID: PMC11536393 DOI: 10.1177/11779322241287120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2024] [Accepted: 09/10/2024] [Indexed: 11/08/2024] Open
Abstract
Gene regulatory networks are powerful tools for modeling genetic interactions that control the expression of genes driving cell differentiation, and single-cell sequencing offers a unique opportunity to build these networks with high-resolution genomic data. There are many proposed computational methods to build these networks using single-cell data, and different approaches are used to benchmark these methods. However, a comprehensive discussion specifically focusing on benchmarking approaches is missing. In this article, we lay the GRN terminology, present an overview of common gold-standard studies and data sets, and define the performance metrics for benchmarking network construction methodologies. We also point out the advantages and limitations of different benchmarking approaches, suggest alternative ground truth data sets that can be used for benchmarking, and specify additional considerations in this context.
Affiliation(s)
- Karamveer
  - Department of Pediatrics, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Yasin Uzun
  - Department of Pediatrics, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Penn State Cancer Institute, The Pennsylvania State University College of Medicine, Hershey, PA, USA
4
Xiang G, He X, Giardine BM, Isaac KJ, Taylor DJ, McCoy RC, Jansen C, Keller CA, Wixom AQ, Cockburn A, Miller A, Qi Q, He Y, Li Y, Lichtenberg J, Heuston EF, Anderson SM, Luan J, Vermunt MW, Yue F, Sauria MEG, Schatz MC, Taylor J, Göttgens B, Hughes JR, Higgs DR, Weiss MJ, Cheng Y, Blobel GA, Bodine DM, Zhang Y, Li Q, Mahony S, Hardison RC. Interspecies regulatory landscapes and elements revealed by novel joint systematic integration of human and mouse blood cell epigenomes. Genome Res 2024; 34:1089-1105. [PMID: 38951027 PMCID: PMC11368181 DOI: 10.1101/gr.277950.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2023] [Accepted: 06/24/2024] [Indexed: 07/03/2024]
Abstract
Knowledge of locations and activities of cis-regulatory elements (CREs) is needed to decipher basic mechanisms of gene regulation and to understand the impact of genetic variants on complex traits. Previous studies identified candidate CREs (cCREs) using epigenetic features in one species, making comparisons difficult between species. In contrast, we conducted an interspecies study defining epigenetic states and identifying cCREs in blood cell types to generate regulatory maps that are comparable between species, using integrative modeling of eight epigenetic features jointly in human and mouse in our Validated Systematic Integration (VISION) Project. The resulting catalogs of cCREs are useful resources for further studies of gene regulation in blood cells, indicated by high overlap with known functional elements and strong enrichment for human genetic variants associated with blood cell phenotypes. The contribution of each epigenetic state in cCREs to gene regulation, inferred from a multivariate regression, was used to estimate epigenetic state regulatory potential (esRP) scores for each cCRE in each cell type, which were used to categorize dynamic changes in cCREs. Groups of cCREs displaying similar patterns of regulatory activity in human and mouse cell types, obtained by joint clustering on esRP scores, harbor distinctive transcription factor binding motifs that are similar between species. An interspecies comparison of cCREs revealed both conserved and species-specific patterns of epigenetic evolution. Finally, we show that comparisons of the epigenetic landscape between species can reveal elements with similar roles in regulation, even in the absence of genomic sequence alignment.
Affiliation(s)
- Guanjue Xiang
  - Bioinformatics and Genomics Graduate Program, Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, Massachusetts 02215, USA
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts 02215, USA
- Xi He
  - Bioinformatics and Genomics Graduate Program, Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Belinda M Giardine
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Kathryn J Isaac
  - Department of Biology, Johns Hopkins University, Baltimore, Maryland 21218, USA
- Dylan J Taylor
  - Department of Biology, Johns Hopkins University, Baltimore, Maryland 21218, USA
- Rajiv C McCoy
  - Department of Biology, Johns Hopkins University, Baltimore, Maryland 21218, USA
- Camden Jansen
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Cheryl A Keller
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Alexander Q Wixom
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- April Cockburn
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Amber Miller
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Qian Qi
  - Department of Hematology, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, USA
- Yanghua He
  - Department of Hematology, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, USA
  - Department of Human Nutrition, Food and Animal Sciences, University of Hawaiʻi at Mānoa, Honolulu, Hawaii 96822, USA
- Yichao Li
  - Department of Hematology, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, USA
- Jens Lichtenberg
  - Genetics and Molecular Biology Branch, National Human Genome Research Institute, Bethesda, Maryland 20892, USA
- Elisabeth F Heuston
  - Genetics and Molecular Biology Branch, National Human Genome Research Institute, Bethesda, Maryland 20892, USA
- Stacie M Anderson
  - Flow Cytometry Core, National Human Genome Research Institute, Bethesda, Maryland 20892, USA
- Jing Luan
  - Department of Pediatrics, Children's Hospital of Philadelphia, and Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Marit W Vermunt
  - Department of Pediatrics, Children's Hospital of Philadelphia, and Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Feng Yue
  - Department of Biochemistry and Molecular Genetics, Feinberg School of Medicine, Northwestern University, Evanston, Illinois 60611, USA
- Michael E G Sauria
  - Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218, USA
- Michael C Schatz
  - Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218, USA
- James Taylor
  - Department of Biology, Johns Hopkins University, Baltimore, Maryland 21218, USA
  - Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218, USA
- Berthold Göttgens
  - Wellcome and MRC Cambridge Stem Cell Institute, University of Cambridge, Cambridge CB2 0AW, United Kingdom
- Jim R Hughes
  - MRC Weatherall Institute of Molecular Medicine, Oxford University, Oxford OX3 9DS, United Kingdom
- Douglas R Higgs
  - MRC Weatherall Institute of Molecular Medicine, Oxford University, Oxford OX3 9DS, United Kingdom
- Mitchell J Weiss
  - Department of Hematology, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, USA
- Yong Cheng
  - Department of Hematology, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, USA
- Gerd A Blobel
  - Department of Pediatrics, Children's Hospital of Philadelphia, and Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- David M Bodine
  - Genetics and Molecular Biology Branch, National Human Genome Research Institute, Bethesda, Maryland 20892, USA
- Yu Zhang
  - Department of Statistics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Qunhua Li
  - Department of Statistics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
  - Center for Computational Biology and Bioinformatics, Genome Sciences Institute, Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Shaun Mahony
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
  - Center for Computational Biology and Bioinformatics, Genome Sciences Institute, Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
  - Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Ross C Hardison
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
  - Center for Computational Biology and Bioinformatics, Genome Sciences Institute, Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
  - Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
5
Waqas A, Tripathi A, Ramachandran RP, Stewart PA, Rasool G. Multimodal data integration for oncology in the era of deep neural networks: a review. Front Artif Intell 2024; 7:1408843. [PMID: 39118787 PMCID: PMC11308435 DOI: 10.3389/frai.2024.1408843] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2024] [Accepted: 07/09/2024] [Indexed: 08/10/2024] Open
Abstract
Cancer research encompasses data across various scales, modalities, and resolutions, from screening and diagnostic imaging to digitized histopathology slides to various types of molecular data and clinical records. The integration of these diverse data types for personalized cancer care and predictive modeling holds the promise of enhancing the accuracy and reliability of cancer screening, diagnosis, and treatment. Traditional analytical methods, which often focus on isolated or unimodal information, fall short of capturing the complex and heterogeneous nature of cancer data. The advent of deep neural networks has spurred the development of sophisticated multimodal data fusion techniques capable of extracting and synthesizing information from disparate sources. Among these, Graph Neural Networks (GNNs) and Transformers have emerged as powerful tools for multimodal learning, demonstrating significant success. This review presents the foundational principles of multimodal learning including oncology data modalities, taxonomy of multimodal learning, and fusion strategies. We delve into the recent advancements in GNNs and Transformers for the fusion of multimodal data in oncology, spotlighting key studies and their pivotal findings. We discuss the unique challenges of multimodal learning, such as data heterogeneity and integration complexities, alongside the opportunities it presents for a more nuanced and comprehensive understanding of cancer. Finally, we present some of the latest comprehensive multimodal pan-cancer data sources. By surveying the landscape of multimodal data integration in oncology, our goal is to underline the transformative potential of multimodal GNNs and Transformers. Through technological advancements and the methodological innovations presented in this review, we aim to chart a course for future research in this promising field. 
This review may be the first that highlights the current state of multimodal modeling applications in cancer using GNNs and transformers, presents comprehensive multimodal oncology data sources, and sets the stage for multimodal evolution, encouraging further exploration and development in personalized cancer care.
Affiliation(s)
- Asim Waqas
  - Department of Machine Learning, Moffitt Cancer Center, Tampa, FL, United States
  - Department of Cancer Epidemiology, Moffitt Cancer Center, Tampa, FL, United States
- Aakash Tripathi
  - Department of Machine Learning, Moffitt Cancer Center, Tampa, FL, United States
- Ravi P. Ramachandran
  - Department of Electrical and Computer Engineering, Rowan University, Glassboro, NJ, United States
- Paul A. Stewart
  - Department of Biostatistics and Bioinformatics, Moffitt Cancer Center, Tampa, FL, United States
- Ghulam Rasool
  - Department of Machine Learning, Moffitt Cancer Center, Tampa, FL, United States
6
Moeckel C, Mouratidis I, Chantzi N, Uzun Y, Georgakopoulos-Soares I. Advances in computational and experimental approaches for deciphering transcriptional regulatory networks: Understanding the roles of cis-regulatory elements is essential, and recent research utilizing MPRAs, STARR-seq, CRISPR-Cas9, and machine learning has yielded valuable insights. Bioessays 2024; 46:e2300210. [PMID: 38715516 PMCID: PMC11444527 DOI: 10.1002/bies.202300210] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Revised: 04/22/2024] [Accepted: 04/23/2024] [Indexed: 05/16/2024]
Abstract
Understanding the influence of cis-regulatory elements on gene regulation poses numerous challenges given complexities stemming from variations in transcription factor (TF) binding, chromatin accessibility, structural constraints, and cell-type differences. This review discusses the role of gene regulatory networks in enhancing understanding of transcriptional regulation and covers construction methods ranging from expression-based approaches to supervised machine learning. Additionally, key experimental methods, including MPRAs and CRISPR-Cas9-based screening, which have significantly contributed to understanding TF binding preferences and cis-regulatory element functions, are explored. Lastly, the potential of machine learning and artificial intelligence to unravel cis-regulatory logic is analyzed. These computational advances have far-reaching implications for precision medicine, therapeutic target discovery, and the study of genetic variations in health and disease.
Affiliation(s)
- Camille Moeckel
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Ioannis Mouratidis
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA
- Nikol Chantzi
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Yasin Uzun
  - Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA
  - Department of Pediatrics, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Ilias Georgakopoulos-Soares
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA
7
Huo Q, Song R, Ma Z. Recent advances in exploring transcriptional regulatory landscape of crops. Front Plant Sci 2024; 15:1421503. [PMID: 38903438 PMCID: PMC11188431 DOI: 10.3389/fpls.2024.1421503] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2024] [Accepted: 05/23/2024] [Indexed: 06/22/2024]
Abstract
Crop breeding entails developing and selecting plant varieties with improved agronomic traits. Modern molecular techniques, such as genome editing, enable more efficient manipulation of plant phenotype by altering the expression of particular regulatory or functional genes. Hence, it is essential to thoroughly comprehend the transcriptional regulatory mechanisms that underpin these traits. In the multi-omics era, a large amount of omics data has been generated for diverse crop species, including genomics, epigenomics, transcriptomics, proteomics, and single-cell omics. The abundant data resources and the emergence of advanced computational tools offer unprecedented opportunities for obtaining a holistic view and profound understanding of the regulatory processes linked to desirable traits. This review focuses on integrated network approaches that utilize multi-omics data to investigate gene expression regulation. Various types of regulatory networks and their inference methods are discussed, focusing on recent advancements in crop plants. The integration of multi-omics data has been proven to be crucial for the construction of high-confidence regulatory networks. With the refinement of these methodologies, they will significantly enhance crop breeding efforts and contribute to global food security.
Affiliation(s)
- Zeyang Ma
  - State Key Laboratory of Maize Bio-breeding, Frontiers Science Center for Molecular Design Breeding, Joint International Research Laboratory of Crop Molecular Breeding, National Maize Improvement Center, College of Agronomy and Biotechnology, China Agricultural University, Beijing, China
8
Xiang G, He X, Giardine BM, Isaac KJ, Taylor DJ, McCoy RC, Jansen C, Keller CA, Wixom AQ, Cockburn A, Miller A, Qi Q, He Y, Li Y, Lichtenberg J, Heuston EF, Anderson SM, Luan J, Vermunt MW, Yue F, Sauria MEG, Schatz MC, Taylor J, Göttgens B, Hughes JR, Higgs DR, Weiss MJ, Cheng Y, Blobel GA, Bodine DM, Zhang Y, Li Q, Mahony S, Hardison RC. Interspecies regulatory landscapes and elements revealed by novel joint systematic integration of human and mouse blood cell epigenomes. bioRxiv 2024:2023.04.02.535219. [PMID: 37066352 PMCID: PMC10103973 DOI: 10.1101/2023.04.02.535219] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/19/2023]
Abstract
Knowledge of locations and activities of cis-regulatory elements (CREs) is needed to decipher basic mechanisms of gene regulation and to understand the impact of genetic variants on complex traits. Previous studies identified candidate CREs (cCREs) using epigenetic features in one species, making comparisons difficult between species. In contrast, we conducted an interspecies study defining epigenetic states and identifying cCREs in blood cell types to generate regulatory maps that are comparable between species, using integrative modeling of eight epigenetic features jointly in human and mouse in our Validated Systematic Integration (VISION) Project. The resulting catalogs of cCREs are useful resources for further studies of gene regulation in blood cells, indicated by high overlap with known functional elements and strong enrichment for human genetic variants associated with blood cell phenotypes. The contribution of each epigenetic state in cCREs to gene regulation, inferred from a multivariate regression, was used to estimate epigenetic state Regulatory Potential (esRP) scores for each cCRE in each cell type, which were used to categorize dynamic changes in cCREs. Groups of cCREs displaying similar patterns of regulatory activity in human and mouse cell types, obtained by joint clustering on esRP scores, harbored distinctive transcription factor binding motifs that were similar between species. An interspecies comparison of cCREs revealed both conserved and species-specific patterns of epigenetic evolution. Finally, we showed that comparisons of the epigenetic landscape between species can reveal elements with similar roles in regulation, even in the absence of genomic sequence alignment.
9
Skok Gibbs C, Mahmood O, Bonneau R, Cho K. PMF-GRN: a variational inference approach to single-cell gene regulatory network inference using probabilistic matrix factorization. Genome Biol 2024; 25:88. [PMID: 38589899 PMCID: PMC11003171 DOI: 10.1186/s13059-024-03226-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2023] [Accepted: 03/26/2024] [Indexed: 04/10/2024] Open
Abstract
Inferring gene regulatory networks (GRNs) from single-cell data is challenging due to heuristic limitations. Existing methods also lack estimates of uncertainty. Here we present Probabilistic Matrix Factorization for Gene Regulatory Network Inference (PMF-GRN). Using single-cell expression data, PMF-GRN infers latent factors capturing transcription factor activity and regulatory relationships. Using variational inference allows hyperparameter search for principled model selection and direct comparison to other generative models. We extensively test and benchmark our method using real single-cell datasets and synthetic data. We show that PMF-GRN infers GRNs more accurately than current state-of-the-art single-cell GRN inference methods, offering well-calibrated uncertainty estimates.
Affiliation(s)
- Omar Mahmood
  - Center for Data Science, New York University, New York, NY, 10011, USA
- Richard Bonneau
  - Center for Data Science, New York University, New York, NY, 10011, USA
  - Prescient Design, Genentech, New York, NY, 10010, USA
  - Center for Genomics and Systems Biology, New York University, New York, NY, 10003, USA
- Kyunghyun Cho
  - Center for Data Science, New York University, New York, NY, 10011, USA
  - Prescient Design, Genentech, New York, NY, 10010, USA
10
Yang C, Jin Y, Yin Y. Integration of single-cell transcriptome and chromatin accessibility and its application on tumor investigation. Life Med 2024; 3:lnae015. [PMID: 39872661 PMCID: PMC11749461 DOI: 10.1093/lifemedi/lnae015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2023] [Accepted: 04/25/2024] [Indexed: 01/30/2025]
Abstract
The advent of single-cell sequencing techniques has not only revolutionized the investigation of biological processes but also significantly contributed to unraveling cellular heterogeneity at unprecedented levels. Among the various methods, single-cell transcriptome sequencing stands out as the best established, and has been employed in exploring many physiological and pathological activities. The recently developed single-cell epigenetic sequencing techniques, especially chromatin accessibility sequencing, have further deepened our understanding of gene regulatory networks. In this review, we summarize the recent breakthroughs in single-cell transcriptome and chromatin accessibility sequencing methodologies. Additionally, we describe current bioinformatic strategies to integrate data obtained through these single-cell sequencing methods and highlight the application of this analysis strategy on a deeper understanding of tumorigenesis and tumor progression. Finally, we also discuss the challenges and anticipated developments in this field.
Affiliation(s)
- Chunyuan Yang
  - Institute of Systems Biomedicine, Department of Pathology, School of Basic Medical Sciences Peking University, Peking-Tsinghua Center for Life Sciences, Peking University Health Science Center, Beijing 100191, China
- Yan Jin
  - Institute of Systems Biomedicine, Department of Pathology, School of Basic Medical Sciences Peking University, Peking-Tsinghua Center for Life Sciences, Peking University Health Science Center, Beijing 100191, China
- Yuxin Yin
  - Institute of Systems Biomedicine, Department of Pathology, School of Basic Medical Sciences Peking University, Peking-Tsinghua Center for Life Sciences, Peking University Health Science Center, Beijing 100191, China
  - Institute of Precision Medicine, Peking University Shenzhen Hospital, Shenzhen 518036, China
11
Monnier L, Cournède PH. A novel batch-effect correction method for scRNA-seq data based on Adversarial Information Factorization. PLoS Comput Biol 2024; 20:e1011880. [PMID: 38386700 PMCID: PMC10914288 DOI: 10.1371/journal.pcbi.1011880] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Revised: 03/05/2024] [Accepted: 01/30/2024] [Indexed: 02/24/2024] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) technology produces an unprecedented resolution at the level of a unique cell, raising great hopes in medicine. Nevertheless, scRNA-seq data suffer from high variations due to the experimental conditions, called batch effects, preventing any aggregated downstream analysis. Adversarial Information Factorization provides a robust batch-effect correction method that does not rely on prior knowledge of the cell types nor a specific normalization strategy while being adapted to any downstream analysis task. It compares to and even outperforms state-of-the-art methods in several scenarios: low signal-to-noise ratio, batch-specific cell types with few cells, and a multi-batches dataset with imbalanced batches and batch-specific cell types. Moreover, it best preserves the relative gene expression between cell types, yielding superior differential expression analysis results. Finally, in a more complex setting of a Leukemia cohort, our method preserved most of the underlying biological information for each patient while aligning the batches, improving the clustering metrics in the aggregated dataset.
Affiliation(s)
- Lily Monnier
  - Paris-Saclay University, CentraleSupélec, Laboratory of Mathematics and Computer Science (MICS), Gif-sur-Yvette, France
- Paul-Henry Cournède
  - Paris-Saclay University, CentraleSupélec, Laboratory of Mathematics and Computer Science (MICS), Gif-sur-Yvette, France
12
Karin J, Bornfeld Y, Nitzan M. scPrisma infers, filters and enhances topological signals in single-cell data using spectral template matching. Nat Biotechnol 2023; 41:1645-1654. [PMID: 36849830 PMCID: PMC10635821 DOI: 10.1038/s41587-023-01663-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/07/2022] [Accepted: 01/06/2023] [Indexed: 03/01/2023]
Abstract
Single-cell RNA sequencing has been instrumental in uncovering cellular spatiotemporal context. This task is challenging as cells simultaneously encode multiple, potentially cross-interfering, biological signals. Here we propose scPrisma, a spectral computational method that uses topological priors to decouple, enhance and filter different classes of biological processes in single-cell data, such as periodic and linear signals. We apply scPrisma to the analysis of the cell cycle in HeLa cells, circadian rhythm and spatial zonation in liver lobules, diurnal cycle in Chlamydomonas and circadian rhythm in the suprachiasmatic nucleus in the brain. scPrisma can be used to distinguish mixed cellular populations by specific characteristics such as cell type and uncover regulatory networks and cell-cell interactions specific to predefined biological signals, such as the circadian rhythm. We show scPrisma's flexibility in incorporating prior knowledge, inference of topologically informative genes and generalization to additional diverse templates and systems. scPrisma can be used as a stand-alone workflow for signal analysis and as a prior step for downstream single-cell analysis.
Affiliation(s)
- Jonathan Karin
- School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
| | - Yonathan Bornfeld
- School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
| | - Mor Nitzan
- School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel.
- Racah Institute of Physics, The Hebrew University of Jerusalem, Jerusalem, Israel.
- Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem, Israel.
| |
|
13
|
Badia-I-Mompel P, Wessels L, Müller-Dott S, Trimbour R, Ramirez Flores RO, Argelaguet R, Saez-Rodriguez J. Gene regulatory network inference in the era of single-cell multi-omics. Nat Rev Genet 2023; 24:739-754. [PMID: 37365273 DOI: 10.1038/s41576-023-00618-5] [Citation(s) in RCA: 93] [Impact Index Per Article: 46.5] [Accepted: 05/12/2023] [Indexed: 06/28/2023]
Abstract
The interplay between chromatin, transcription factors and genes generates complex regulatory circuits that can be represented as gene regulatory networks (GRNs). The study of GRNs is useful to understand how cellular identity is established, maintained and disrupted in disease. GRNs can be inferred from experimental data - historically, bulk omics data - and/or from the literature. The advent of single-cell multi-omics technologies has led to the development of novel computational methods that leverage genomic, transcriptomic and chromatin accessibility information to infer GRNs at an unprecedented resolution. Here, we review the key principles of inferring GRNs that encompass transcription factor-gene interactions from transcriptomics and chromatin accessibility data. We focus on the comparison and classification of methods that use single-cell multimodal data. We highlight challenges in GRN inference, in particular with respect to benchmarking, and potential further developments using additional data modalities.
Affiliation(s)
- Pau Badia-I-Mompel
- Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
| | - Lorna Wessels
- Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Department of Vascular Biology and Tumor Angiogenesis, European Center for Angioscience, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
| | - Sophia Müller-Dott
- Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
| | - Rémi Trimbour
- Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Institut Pasteur, Université Paris Cité, CNRS UMR 3738, Machine Learning for Integrative Genomics Group, Paris, France
| | - Ricardo O Ramirez Flores
- Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
| | | | - Julio Saez-Rodriguez
- Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany.
| |
|
14
|
Kim D, Tran A, Kim HJ, Lin Y, Yang JYH, Yang P. Gene regulatory network reconstruction: harnessing the power of single-cell multi-omic data. NPJ Syst Biol Appl 2023; 9:51. [PMID: 37857632 PMCID: PMC10587078 DOI: 10.1038/s41540-023-00312-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Accepted: 10/02/2023] [Indexed: 10/21/2023] Open
Abstract
Inferring gene regulatory networks (GRNs) is a fundamental challenge in biology that aims to unravel the complex relationships between genes and their regulators. Deciphering these networks plays a critical role in understanding the underlying regulatory crosstalk that drives many cellular processes and diseases. Recent advances in sequencing technology have led to the development of state-of-the-art GRN inference methods that exploit matched single-cell multi-omic data. By employing diverse mathematical and statistical methodologies, these methods aim to reconstruct more comprehensive and precise gene regulatory networks. In this review, we give a brief overview of the statistical and methodological foundations commonly used in GRN inference methods. We then compare and contrast the latest state-of-the-art GRN inference methods for single-cell matched multi-omics data, and discuss their assumptions, limitations and opportunities. Finally, we discuss the challenges and future directions that hold promise for further advancements in this rapidly developing field.
Affiliation(s)
- Daniel Kim
- School of Mathematics and Statistics, University of Sydney, Camperdown, NSW, Australia
- Computational Systems Biology Unit, Children's Medical Research Institute, University of Sydney, Camperdown, NSW, Australia
- Sydney Precision Data Science Centre, University of Sydney, Camperdown, NSW, Australia
| | - Andy Tran
- School of Mathematics and Statistics, University of Sydney, Camperdown, NSW, Australia
- Sydney Precision Data Science Centre, University of Sydney, Camperdown, NSW, Australia
- Charles Perkins Centre, University of Sydney, Camperdown, NSW, Australia
| | - Hani Jieun Kim
- Computational Systems Biology Unit, Children's Medical Research Institute, University of Sydney, Camperdown, NSW, Australia
- Sydney Precision Data Science Centre, University of Sydney, Camperdown, NSW, Australia
| | - Yingxin Lin
- School of Mathematics and Statistics, University of Sydney, Camperdown, NSW, Australia
- Sydney Precision Data Science Centre, University of Sydney, Camperdown, NSW, Australia
- Charles Perkins Centre, University of Sydney, Camperdown, NSW, Australia
| | - Jean Yee Hwa Yang
- School of Mathematics and Statistics, University of Sydney, Camperdown, NSW, Australia.
- Sydney Precision Data Science Centre, University of Sydney, Camperdown, NSW, Australia.
- Charles Perkins Centre, University of Sydney, Camperdown, NSW, Australia.
| | - Pengyi Yang
- School of Mathematics and Statistics, University of Sydney, Camperdown, NSW, Australia.
- Computational Systems Biology Unit, Children's Medical Research Institute, University of Sydney, Camperdown, NSW, Australia.
- Sydney Precision Data Science Centre, University of Sydney, Camperdown, NSW, Australia.
- Charles Perkins Centre, University of Sydney, Camperdown, NSW, Australia.
| |
|
15
|
Yuan Q, Duren Z. Continuous lifelong learning for modeling of gene regulation from single cell multiome data by leveraging atlas-scale external data. bioRxiv 2023:2023.08.01.551575. [PMID: 37577525 PMCID: PMC10418251 DOI: 10.1101/2023.08.01.551575] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/15/2023]
Abstract
Accurate inference of context-specific gene regulatory networks (GRNs) from genomics data is a crucial task in computational biology. However, existing methods face limitations, such as reliance on gene expression data alone, lower resolution from bulk data, and data scarcity for specific cellular systems. Despite recent technological advancements, including single-cell sequencing and the integration of ATAC-seq and RNA-seq data, learning such complex mechanisms from limited independent data points still presents a daunting challenge, impeding GRN inference accuracy. To overcome this challenge, we present LINGER (LIfelong neural Network for GEne Regulation), a novel deep learning-based method to infer GRNs from single-cell multiome data with paired gene expression and chromatin accessibility data from the same cell. LINGER incorporates both 1) atlas-scale external bulk data across diverse cellular contexts and 2) the knowledge of transcription factor (TF) motif matching to cis-regulatory elements as a manifold regularization to address the challenge of limited data and extensive parameter space in GRN inference. Our results demonstrate that LINGER achieves 2-3 fold higher accuracy than existing methods. LINGER reveals a complex regulatory landscape of genome-wide association studies, enabling enhanced interpretation of disease-associated variants and genes. Additionally, following GRN inference from reference sc-multiome data, LINGER allows for the estimation of TF activity solely from bulk or single-cell gene expression data, leveraging the abundance of available gene expression data to identify driver regulators from case-control studies. Overall, LINGER provides a comprehensive tool for robust gene regulation inference from genomics data, empowering deeper insights into cellular mechanisms.
Affiliation(s)
- Qiuyue Yuan
- Center for Human Genetics, Department of Genetics and Biochemistry, Clemson University, Greenwood, SC 29646, USA
| | - Zhana Duren
- Center for Human Genetics, Department of Genetics and Biochemistry, Clemson University, Greenwood, SC 29646, USA
| |
|
16
|
Littman R, Cheng M, Wang N, Peng C, Yang X. SCING: Inference of robust, interpretable gene regulatory networks from single cell and spatial transcriptomics. iScience 2023; 26:107124. [PMID: 37434694 PMCID: PMC10331489 DOI: 10.1016/j.isci.2023.107124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2022] [Revised: 03/31/2023] [Accepted: 06/09/2023] [Indexed: 07/13/2023] Open
Abstract
Gene regulatory network (GRN) inference is an integral part of understanding physiology and disease. Single cell/nuclei RNA-seq (scRNA-seq/snRNA-seq) data has been used to elucidate cell-type GRNs; however, the accuracy and speed of current scRNAseq-based GRN approaches are suboptimal. Here, we present Single Cell INtegrative Gene regulatory network inference (SCING), a gradient boosting and mutual information-based approach for identifying robust GRNs from scRNA-seq, snRNA-seq, and spatial transcriptomics data. Performance evaluation using Perturb-seq datasets, held-out data, and the mouse cell atlas combined with the DisGeNET database demonstrates the improved accuracy and biological interpretability of SCING compared to existing methods. We applied SCING to the entire mouse single cell atlas, human Alzheimer's disease (AD), and mouse AD spatial transcriptomics. SCING GRNs reveal unique disease subnetwork modeling capabilities, have intrinsic capacity to correct for batch effects, retrieve disease relevant genes and pathways, and are informative on spatial specificity of disease pathogenesis.
Affiliation(s)
- Russell Littman
- Department of Integrative Biology & Physiology, UCLA, Los Angeles, CA, USA
- Bioinformatics Interdepartmental Program, UCLA, Los Angeles, CA, USA
| | - Michael Cheng
- Department of Integrative Biology & Physiology, UCLA, Los Angeles, CA, USA
- Bioinformatics Interdepartmental Program, UCLA, Los Angeles, CA, USA
| | - Ning Wang
- Department of Integrative Biology & Physiology, UCLA, Los Angeles, CA, USA
| | - Chao Peng
- Department of Neurology, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
| | - Xia Yang
- Department of Integrative Biology & Physiology, UCLA, Los Angeles, CA, USA
- Bioinformatics Interdepartmental Program, UCLA, Los Angeles, CA, USA
- Institute for Quantitative and Computational Biosciences (QCBio), Los Angeles, CA, USA
- Molecular Biology Institute (MBI), Los Angeles, CA, USA
- Brain Research Institute (BRI), Los Angeles, CA, USA
| |
|
17
|
Burdziak C, Zhao CJ, Haviv D, Alonso-Curbelo D, Lowe SW, Pe’er D. scKINETICS: inference of regulatory velocity with single-cell transcriptomics data. Bioinformatics 2023; 39:i394-i403. [PMID: 37387147 PMCID: PMC10311321 DOI: 10.1093/bioinformatics/btad267] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 07/01/2023] Open
Abstract
MOTIVATION: Transcriptional dynamics are governed by the action of regulatory proteins and are fundamental to systems ranging from normal development to disease. RNA velocity methods for tracking phenotypic dynamics ignore information on the regulatory drivers of gene expression variability through time. RESULTS: We introduce scKINETICS (Key regulatory Interaction NETwork for Inferring Cell Speed), a dynamical model of gene expression change which is fit with the simultaneous learning of per-cell transcriptional velocities and a governing gene regulatory network. Fitting is accomplished through an expectation-maximization approach designed to learn the impact of each regulator on its target genes, leveraging biologically motivated priors from epigenetic data, gene-gene coexpression, and constraints on cells' future states imposed by the phenotypic manifold. Applying this approach to an acute pancreatitis dataset recapitulates a well-studied axis of acinar-to-ductal transdifferentiation whilst proposing novel regulators of this process, including factors with previously appreciated roles in driving pancreatic tumorigenesis. In benchmarking experiments, we show that scKINETICS successfully extends and improves existing velocity approaches to generate interpretable, mechanistic models of gene regulatory dynamics. AVAILABILITY AND IMPLEMENTATION: All Python code and an accompanying Jupyter notebook with demonstrations are available at http://github.com/dpeerlab/scKINETICS.
Affiliation(s)
- Cassandra Burdziak
- Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, Sloan Kettering Institute, 408 E 69th Street, New York, NY 10021, United States
| | - Chujun Julia Zhao
- Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, Sloan Kettering Institute, 408 E 69th Street, New York, NY 10021, United States
- Department of Biomedical Engineering, Columbia University, 1210 Amsterdam Ave, New York, NY 10027, United States
| | - Doron Haviv
- Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, Sloan Kettering Institute, 408 E 69th Street, New York, NY 10021, United States
| | - Direna Alonso-Curbelo
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology (BIST), Carrer de Baldiri Reixac, 10, Barcelona 08028, Spain
- Cancer Biology and Genetics Program, Memorial Sloan Kettering Cancer Center, Sloan Kettering Institute, 408 E 69th Street, New York, NY 10021, United States
| | - Scott W Lowe
- Cancer Biology and Genetics Program, Memorial Sloan Kettering Cancer Center, Sloan Kettering Institute, 408 E 69th Street, New York, NY 10021, United States
- Howard Hughes Medical Institute, 4000 Jones Bridge Road, Chevy Chase, Maryland 20815, United States
| | - Dana Pe’er
- Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, Sloan Kettering Institute, 408 E 69th Street, New York, NY 10021, United States
- Howard Hughes Medical Institute, 4000 Jones Bridge Road, Chevy Chase, Maryland 20815, United States
| |
|
18
|
Zhang S, Pyne S, Pietrzak S, Halberg S, McCalla SG, Siahpirani AF, Sridharan R, Roy S. Inference of cell type-specific gene regulatory networks on cell lineages from single cell omic datasets. Nat Commun 2023; 14:3064. [PMID: 37244909 PMCID: PMC10224950 DOI: 10.1038/s41467-023-38637-9] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 07/25/2022] [Accepted: 05/10/2023] [Indexed: 05/29/2023] Open
Abstract
Cell type-specific gene expression patterns are outputs of transcriptional gene regulatory networks (GRNs) that connect transcription factors and signaling proteins to target genes. Single-cell technologies such as single cell RNA-sequencing (scRNA-seq) and single cell Assay for Transposase-Accessible Chromatin using sequencing (scATAC-seq) can examine cell type-specific gene regulation in unprecedented detail. However, current approaches to infer cell type-specific GRNs are limited in their ability to integrate scRNA-seq and scATAC-seq measurements and to model network dynamics on a cell lineage. To address this challenge, we have developed single-cell Multi-Task Network Inference (scMTNI), a multi-task learning framework to infer the GRN for each cell type on a lineage from scRNA-seq and scATAC-seq data. Using simulated and real datasets, we show that scMTNI is a broadly applicable framework for linear and branching lineages that accurately infers GRN dynamics and identifies key regulators of fate transitions for diverse processes such as cellular reprogramming and differentiation.
Affiliation(s)
- Shilu Zhang
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
| | - Saptarshi Pyne
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
| | - Stefan Pietrzak
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
- Department of Cell and Regenerative Biology, University of Wisconsin-Madison, Madison, WI, USA
| | - Spencer Halberg
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
| | - Sunnie Grace McCalla
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
- Laboratory of Genetics, University of Wisconsin-Madison, Madison, WI, 53706, USA
| | - Alireza Fotuhi Siahpirani
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
- Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
| | - Rupa Sridharan
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
- Department of Cell and Regenerative Biology, University of Wisconsin-Madison, Madison, WI, USA
| | - Sushmita Roy
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA.
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA.
| |
|
19
|
Güldener U, Kessler T, von Scheidt M, Hawe JS, Gerhard B, Maier D, Lachmann M, Laugwitz KL, Cassese S, Schömig AW, Kastrati A, Schunkert H. Machine Learning Identifies New Predictors on Restenosis Risk after Coronary Artery Stenting in 10,004 Patients with Surveillance Angiography. J Clin Med 2023; 12:2941. [PMID: 37109283 PMCID: PMC10142067 DOI: 10.3390/jcm12082941] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/17/2023] [Revised: 03/31/2023] [Accepted: 04/15/2023] [Indexed: 04/29/2023] Open
Abstract
OBJECTIVE: Machine learning (ML) approaches have the potential to uncover regular patterns in multi-layered data. Here we applied self-organizing maps (SOMs) to detect such patterns with the aim of better predicting in-stent restenosis (ISR) at surveillance angiography 6 to 8 months after percutaneous coronary intervention with stenting. METHODS: In prospectively collected data from 10,004 patients receiving percutaneous coronary intervention (PCI) for 15,004 lesions, we applied SOMs to predict ISR angiographically 6-8 months after the index procedure. SOM findings were compared with results of conventional uni- and multivariate analyses. The predictive value of both approaches was assessed after random splitting of patients into training and test sets (50:50). RESULTS: Conventional multivariate analyses revealed 10, mostly known, predictors for restenosis after coronary stenting: balloon-to-vessel ratio, complex lesion morphology, diabetes mellitus, left main stenting, stent type (bare metal vs. first vs. second generation drug eluting stent), stent length, stenosis severity, vessel size reduction, and prior bypass surgery. The SOM approach identified all these and nine further predictors, including chronic vessel occlusion, lesion length, and prior PCI. Moreover, the SOM-based model performed well in predicting ISR (area under the ROC curve, AUC: 0.728); however, there was no meaningful advantage in predicting ISR at surveillance angiography in comparison with the conventional multivariable model (0.726, p = 0.3). CONCLUSIONS: The agnostic SOM-based approach identified, without clinical knowledge, even more contributors to restenosis risk. In fact, SOMs applied to a large prospectively sampled cohort identified several novel predictors of restenosis after PCI. However, as compared with established covariates, ML technologies did not improve identification of patients at high risk for restenosis after PCI in a clinically relevant fashion.
Affiliation(s)
- Ulrich Güldener
- Department of Cardiology, Deutsches Herzzentrum München, Technische Universität München, 80636 Munich, Germany
| | - Thorsten Kessler
- Department of Cardiology, Deutsches Herzzentrum München, Technische Universität München, 80636 Munich, Germany
- DZHK (German Center for Cardiovascular Research), Partner Site Munich Heart Alliance, 80802 Munich, Germany
| | - Moritz von Scheidt
- Department of Cardiology, Deutsches Herzzentrum München, Technische Universität München, 80636 Munich, Germany
- DZHK (German Center for Cardiovascular Research), Partner Site Munich Heart Alliance, 80802 Munich, Germany
| | - Johann S. Hawe
- Department of Cardiology, Deutsches Herzzentrum München, Technische Universität München, 80636 Munich, Germany
| | | | - Dieter Maier
- Biomax, Robert-Koch-Str. 2, 82152 Planegg, Germany
| | - Mark Lachmann
- Department of Cardiology, Klinikum Rechts der Isar, Technische Universität München, 81675 Munich, Germany
| | - Karl-Ludwig Laugwitz
- DZHK (German Center for Cardiovascular Research), Partner Site Munich Heart Alliance, 80802 Munich, Germany
- Department of Cardiology, Klinikum Rechts der Isar, Technische Universität München, 81675 Munich, Germany
| | - Salvatore Cassese
- Department of Cardiology, Deutsches Herzzentrum München, Technische Universität München, 80636 Munich, Germany
| | - Albert W. Schömig
- Department of Cardiology, Deutsches Herzzentrum München, Technische Universität München, 80636 Munich, Germany
| | - Adnan Kastrati
- Department of Cardiology, Deutsches Herzzentrum München, Technische Universität München, 80636 Munich, Germany
- DZHK (German Center for Cardiovascular Research), Partner Site Munich Heart Alliance, 80802 Munich, Germany
| | - Heribert Schunkert
- Department of Cardiology, Deutsches Herzzentrum München, Technische Universität München, 80636 Munich, Germany
- DZHK (German Center for Cardiovascular Research), Partner Site Munich Heart Alliance, 80802 Munich, Germany
| |
|
20
|
Gromova T, Gehred ND, Vondriska TM. Single-cell transcriptomes in the heart: when every epigenome counts. Cardiovasc Res 2023; 119:64-78. [PMID: 35325060 PMCID: PMC10233279 DOI: 10.1093/cvr/cvac040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2021] [Revised: 02/03/2022] [Accepted: 02/14/2022] [Indexed: 11/13/2022] Open
Abstract
The response of an organ to stimuli emerges from the actions of individual cells. Recent cardiac single-cell RNA-sequencing studies of development, injury, and reprogramming have uncovered heterogeneous populations even among previously well-defined cell types, raising questions about what level of experimental resolution corresponds to disease-relevant, tissue-level phenotypes. In this review, we explore the biological meaning behind this cellular heterogeneity by undertaking an exhaustive analysis of single-cell transcriptomics in the heart (including a comprehensive, annotated compendium of studies published to date) and evaluating new models for the cardiac function that have emerged from these studies (including discussion and schematics that depict new hypotheses in the field). We evaluate the evidence to support the biological actions of newly identified cell populations and debate questions related to the role of cell-to-cell variability in development and disease. Finally, we present emerging epigenomic approaches that, when combined with single-cell RNA-sequencing, can resolve basic mechanisms of gene regulation and variability in cell phenotype.
Affiliation(s)
- Tatiana Gromova
- Department of Anesthesiology & Perioperative Medicine, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
- Department of Medicine/Cardiology, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
- Department of Physiology, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
| | - Natalie D Gehred
- Department of Anesthesiology & Perioperative Medicine, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
- Department of Medicine/Cardiology, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
- Department of Physiology, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
| | - Thomas M Vondriska
- Department of Anesthesiology & Perioperative Medicine, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
- Department of Medicine/Cardiology, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
- Department of Physiology, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
| |
|
21
|
Li A, Xiong S, Li J, Mallik S, Liu Y, Fei R, Zhou H, Liu G. AngClust: Angle Feature-Based Clustering for Short Time Series Gene Expression Profiles. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:1574-1580. [PMID: 35853049 DOI: 10.1109/tcbb.2022.3192306] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
When clustering gene expression, it is expected that correlation coefficients of genes in the same clusters are high, and that gene ontology (GO) enrichment analysis of most clusters will be significant. However, existing short-term gene expression clustering algorithms have limitations. To address this problem, we proposed a novel clustering process based on angular features for short-term gene expression. Our method (named AngClust) uses angular features to indicate the change of trend in gene expression levels at two neighboring time points. The changes of angles at multiple time points reflect the change of trend of the overall expression levels. Such changes are used to measure whether the expression trends of different genes are similar. To obtain functionally significant clusters from the clustering results, we evaluated numbers of genes in clusters, average correlation coefficient, fluctuation, and their correlation with GO term enrichment. AngClust outperforms two other measures, Euclidean distance (ED) and dynamic time warping of correlation (DTW), on a dataset of yeast gene expression. The ratios of GO and pathway term-enriched clusters for AngClust are higher than or equal to those of STEM and TMixClust on human, mouse, and yeast time series of gene expression.
|
22
|
van der Sande M, Frölich S, van Heeringen SJ. Computational approaches to understand transcription regulation in development. Biochem Soc Trans 2023; 51:1-12. [PMID: 36695505 PMCID: PMC9988001 DOI: 10.1042/bst20210145] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2022] [Revised: 01/07/2023] [Accepted: 01/13/2023] [Indexed: 01/26/2023]
Abstract
Gene regulatory networks (GRNs) serve as useful abstractions to understand transcriptional dynamics in developmental systems. Computational prediction of GRNs has been successfully applied to genome-wide gene expression measurements with the advent of microarrays and RNA-sequencing. However, these inferred networks are inaccurate and mostly based on correlative rather than causative interactions. In this review, we highlight three approaches that significantly impact GRN inference: (1) moving from one genome-wide functional modality, gene expression, to multi-omics, (2) single cell sequencing, to measure cell type-specific signals and predict context-specific GRNs, and (3) neural networks as flexible models. Together, these experimental and computational developments have the potential to significantly impact the quality of inferred GRNs. Ultimately, accurately modeling the regulatory interactions between transcription factors and their target genes will be essential to understand the role of transcription factors in driving developmental gene expression programs and to derive testable hypotheses for validation.
Affiliation(s)
| | | | - Simon J. van Heeringen
- Radboud University, Department of Molecular Developmental Biology, Faculty of Science, Radboud Institute for Molecular Life Sciences, 6525GA Nijmegen, The Netherlands
| |
|
23
|
Preissl S, Gaulton KJ, Ren B. Characterizing cis-regulatory elements using single-cell epigenomics. Nat Rev Genet 2023; 24:21-43. [PMID: 35840754 PMCID: PMC9771884 DOI: 10.1038/s41576-022-00509-1] [Citation(s) in RCA: 92] [Impact Index Per Article: 46.0] [Accepted: 05/24/2022] [Indexed: 12/24/2022]
Abstract
Cell type-specific gene expression patterns and dynamics during development or in disease are controlled by cis-regulatory elements (CREs), such as promoters and enhancers. Distinct classes of CREs can be characterized by their epigenomic features, including DNA methylation, chromatin accessibility, combinations of histone modifications and conformation of local chromatin. Tremendous progress has been made in cataloguing CREs in the human genome using bulk transcriptomic and epigenomic methods. However, single-cell epigenomic and multi-omic technologies have the potential to provide deeper insight into cell type-specific gene regulatory programmes as well as into how they change during development, in response to environmental cues and through disease pathogenesis. Here, we highlight recent advances in single-cell epigenomic methods and analytical tools and discuss their readiness for human tissue profiling.
Affiliation(s)
- Sebastian Preissl
- Center for Epigenomics, University of California San Diego, La Jolla, CA, USA.
- Institute of Experimental and Clinical Pharmacology and Toxicology, Faculty of Medicine, University of Freiburg, Freiburg, Germany.
| | - Kyle J Gaulton
- Department of Paediatrics, Paediatric Diabetes Research Center, University of California San Diego, La Jolla, CA, USA.
| | - Bing Ren
- Center for Epigenomics, University of California San Diego, La Jolla, CA, USA.
- Department of Cellular and Molecular Medicine, University of California San Diego, School of Medicine, La Jolla, CA, USA.
- Ludwig Institute for Cancer Research, La Jolla, CA, USA.
| |
Collapse
|
24
|
Jiang J, Lyu P, Li J, Huang S, Tao J, Blackshaw S, Qian J, Wang J. IReNA: Integrated regulatory network analysis of single-cell transcriptomes and chromatin accessibility profiles. iScience 2022; 25:105359. [PMID: 36325073; PMCID: PMC9619378; DOI: 10.1016/j.isci.2022.105359] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2022] [Revised: 09/19/2022] [Accepted: 10/12/2022] [Indexed: 11/16/2022] Open
Abstract
Recently, single-cell RNA sequencing (scRNA-seq) and single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) have been developed to separately measure transcriptomes and chromatin accessibility profiles at single-cell resolution. However, few methods can reliably integrate these data to perform regulatory network analysis. Here, we developed integrated regulatory network analysis (IReNA) for network inference through the integrated analysis of scRNA-seq and scATAC-seq data, network modularization, transcription factor enrichment, and construction of simplified intermodular regulatory networks. Using public datasets, we showed that integrated network analysis of scRNA-seq data with scATAC-seq data identifies known regulators more precisely than scRNA-seq data analysis alone. Moreover, IReNA outperformed currently available methods in identifying known regulators. IReNA facilitates the systems-level understanding of biological regulatory mechanisms and is available at https://github.com/jiang-junyao/IReNA.
Affiliation(s)
- Junyao Jiang
  - CAS Key Laboratory of Regenerative Biology, Guangdong Provincial Key Laboratory of Biocomputing, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
- Pin Lyu
  - Department of Ophthalmology, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Jinlian Li
  - CAS Key Laboratory of Regenerative Biology, Guangdong Provincial Key Laboratory of Biocomputing, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
- Sunan Huang
  - CAS Key Laboratory of Regenerative Biology, Guangdong Provincial Key Laboratory of Biocomputing, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
- Jiawang Tao
  - CAS Key Laboratory of Regenerative Biology, Guangdong Provincial Key Laboratory of Biocomputing, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
- Seth Blackshaw
  - Solomon H. Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Jiang Qian
  - Department of Ophthalmology, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Jie Wang
  - CAS Key Laboratory of Regenerative Biology, Guangdong Provincial Key Laboratory of Biocomputing, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
  - State Key Laboratory of Respiratory Disease, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
  - China-New Zealand Joint Laboratory on Biomedicine and Health, Guangzhou 510530, China
  - Corresponding author
25
ElKarami B, Alkhateeb A, Qattous H, Alshomali L, Shahrrava B. Multi-omics Data Integration Model Based on UMAP Embedding and Convolutional Neural Network. Cancer Inform 2022; 21:11769351221124205. [PMID: 36187912; PMCID: PMC9523837; DOI: 10.1177/11769351221124205] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 04/16/2022] [Accepted: 08/14/2022] [Indexed: 11/15/2022] Open
Abstract
Introduction: Multi-omics data integration provides a richer understanding than separate omics data. Various promising integrative approaches have been utilized to analyze multi-omics data for biomedical applications, including disease prediction, disease subtyping, biomarker prediction, and others. Methods: In this paper, we introduce a multi-omics data integration method that combines a gene similarity network (GSN) based on uniform manifold approximation and projection (UMAP) with convolutional neural networks (CNNs). The method utilizes UMAP to embed gene expression, DNA methylation, and copy number alteration (CNA) to a lower dimension, creating two-dimensional RGB images. Gene expression is used as a reference to construct the GSN, and the other omics data are then integrated with the gene expression for better prediction. We used CNNs to predict the Gleason score levels of prostate cancer patients and the tumor stage in breast cancer patients. Results: The proposed model achieved near-perfect performance, with accuracy above 99% and all other performance measurements at the same level. The proposed model outperformed the state-of-the-art iSOM-GSN model, which constructs the GSN map based on the self-organizing map (SOM). Conclusion: The results show that UMAP as an embedding technique can better integrate multi-omics maps into the prediction model than SOM. The proposed model can also be applied to build a multi-omics prediction model for other types of cancer.
Affiliation(s)
- Bashier ElKarami
  - Department of Electrical and Computer Engineering, University of Windsor, Windsor, ON, Canada
- Abedalrhman Alkhateeb
  - Software Engineering Department, King Hussein School of Computing Sciences, Princess Sumaya University for Technology, Al-Jubaiha, Amman, Jordan
  - Abedalrhman Alkhateeb, Software Engineering Department, King Hussein School of Computing Sciences, Princess Sumaya University for Technology, P. O. Box 1438, Al-Jubaiha, Amman 11941, Jordan.
- Hazem Qattous
  - Software Engineering Department, King Hussein School of Computing Sciences, Princess Sumaya University for Technology, Al-Jubaiha, Amman, Jordan
- Lujain Alshomali
  - Software Engineering Department, King Hussein School of Computing Sciences, Princess Sumaya University for Technology, Al-Jubaiha, Amman, Jordan
- Behnam Shahrrava
  - Department of Electrical and Computer Engineering, University of Windsor, Windsor, ON, Canada
26
Song Q, Zhu X, Jin L, Chen M, Zhang W, Su J. SMGR: a joint statistical method for integrative analysis of single-cell multi-omics data. NAR Genom Bioinform 2022; 4:lqac056. [PMID: 35910046; PMCID: PMC9326599; DOI: 10.1093/nargab/lqac056] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 01/04/2022] [Revised: 06/16/2022] [Accepted: 07/20/2022] [Indexed: 12/12/2022] Open
Abstract
Unravelling the regulatory programs from single-cell multi-omics data has long been one of the major challenges in genomics, especially in the emerging single-cell field. Currently there is a huge gap between fast-growing single-cell multi-omics data and effective methods for the integrative analysis of these inherently sparse and heterogeneous data. In this study, we have developed a novel method, Single-cell Multi-omics Gene co-Regulatory algorithm (SMGR), to detect coherent functional regulatory signals and target genes from joint single-cell RNA-sequencing (scRNA-seq) and single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) data obtained from different samples. Given that scRNA-seq and scATAC-seq data can be captured by a zero-inflated negative binomial distribution, we utilize a generalized linear regression model to identify the latent representation of consistently expressed genes and peaks, thus enabling the identification of co-regulatory programs and the elucidation of regulating mechanisms. Results from both simulation and experimental data demonstrate that SMGR outperforms the existing methods with considerably improved accuracy. To illustrate the biological insights of SMGR, we apply SMGR to mixed-phenotype acute leukemia (MPAL) and identify the MPAL-specific regulatory program with significant peak-gene links, which greatly enhances our understanding of the regulatory mechanisms and potential targets of this complex tumor.
Affiliation(s)
- Qianqian Song
  - Center for Cancer Genomics and Precision Oncology, Wake Forest Baptist Comprehensive Cancer Center, Atrium Health Wake Forest Baptist, Winston-Salem, NC 27157, USA
- Xuewei Zhu
  - Department of Internal Medicine, Section on Molecular Medicine, Wake Forest School of Medicine, Winston-Salem, NC 27101, USA
- Lingtao Jin
  - Department of Molecular Medicine, UT Health San Antonio, San Antonio, TX 78229, USA
- Minghan Chen
  - Wake Forest University, Department of Computer Science, Winston-Salem, NC 27109, USA
- Wei Zhang
  - Center for Cancer Genomics and Precision Oncology, Wake Forest Baptist Comprehensive Cancer Center, Atrium Health Wake Forest Baptist, Winston-Salem, NC 27157, USA
- Jing Su
  - Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, IN 46202, USA
27
Zhao X, Lan Y, Chen D. Exploring long non-coding RNA networks from single cell omics data. Comput Struct Biotechnol J 2022; 20:4381-4389. [PMID: 36051880; PMCID: PMC9403499; DOI: 10.1016/j.csbj.2022.08.003] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 04/14/2022] [Revised: 08/01/2022] [Accepted: 08/01/2022] [Indexed: 11/03/2022] Open
28
Yuan Q, Duren Z. Integration of single-cell multi-omics data by regression analysis on unpaired observations. Genome Biol 2022; 23:160. [PMID: 35854350; PMCID: PMC9295346; DOI: 10.1186/s13059-022-02726-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2022] [Accepted: 07/06/2022] [Indexed: 12/04/2022] Open
Abstract
Despite recent developments, it is hard to profile all multi-omics single-cell data modalities on the same cell. Thus, huge amounts of single-cell genomics data of unpaired observations on different cells are generated. We propose a method named UnpairReg for the regression analysis on unpaired observations to integrate single-cell multi-omics data. On real and simulated data, UnpairReg provides an accurate estimation of cell gene expression where only chromatin accessibility data is available. The cis-regulatory network inferred from UnpairReg is highly consistent with eQTL mapping. UnpairReg improves cell type identification accuracy by joint analysis of single-cell gene expression and chromatin accessibility data.
Affiliation(s)
- Qiuyue Yuan
  - Center for Human Genetics and Department of Genetics and Biochemistry, Clemson University, Greenwood, SC, 29646, USA
- Zhana Duren
  - Center for Human Genetics and Department of Genetics and Biochemistry, Clemson University, Greenwood, SC, 29646, USA.
29
Duren Z, Chang F, Naqing F, Xin J, Liu Q, Wong WH. Regulatory analysis of single cell multiome gene expression and chromatin accessibility data with scREG. Genome Biol 2022; 23:114. [PMID: 35578363; PMCID: PMC9109353; DOI: 10.1186/s13059-022-02682-2] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 09/22/2021] [Accepted: 04/29/2022] [Indexed: 12/12/2022] Open
Abstract
Technological development has enabled the profiling of gene expression and chromatin accessibility from the same cell. We develop scREG, a dimension reduction methodology, based on the concept of cis-regulatory potential, for single cell multiome data. This concept is further used for the construction of subpopulation-specific cis-regulatory networks. The capability of inferring useful regulatory networks is demonstrated by a twofold increase in network inference accuracy compared with the Pearson correlation-based method and a 27-fold enrichment of GWAS variants for inflammatory bowel disease in the cis-regulatory elements. The R package scREG provides comprehensive functions for single cell multiome data analysis.
Affiliation(s)
- Zhana Duren
  - Center for Human Genetics and Department of Genetics and Biochemistry, Clemson University, Greenwood, SC, 29646, USA.
- Fengge Chang
  - Center for Human Genetics and Department of Genetics and Biochemistry, Clemson University, Greenwood, SC, 29646, USA
- Fnu Naqing
  - Center for Human Genetics and Department of Genetics and Biochemistry, Clemson University, Greenwood, SC, 29646, USA
- Jingxue Xin
  - Department of Statistics, Department of Biomedical Data Science and Bio-X Program, Stanford University, Stanford, CA, 94305, USA
- Qiao Liu
  - Department of Statistics, Department of Biomedical Data Science and Bio-X Program, Stanford University, Stanford, CA, 94305, USA
- Wing Hung Wong
  - Department of Statistics, Department of Biomedical Data Science and Bio-X Program, Stanford University, Stanford, CA, 94305, USA.
30
Inferring transcription factor regulatory networks from single-cell ATAC-seq data based on graph neural networks. Nat Mach Intell 2022. [DOI: 10.1038/s42256-022-00469-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 02/04/2023]
31
Qiu S, Li B, Xia Y, Xuan Z, Li Z, Xie L, Gu C, Lv J, Lu C, Jiang T, Fang L, Xu P, Yang J, Li Y, Chen Z, Zhang L, Wang L, Zhang D, Xu H, Wang W, Xu Z. CircTHBS1 drives gastric cancer progression by increasing INHBA mRNA expression and stability in a ceRNA- and RBP-dependent manner. Cell Death Dis 2022; 13:266. [PMID: 35338119; PMCID: PMC8949653; DOI: 10.1038/s41419-022-04720-0] [Citation(s) in RCA: 40] [Impact Index Per Article: 13.3] [Received: 11/25/2021] [Revised: 02/16/2022] [Accepted: 03/09/2022] [Indexed: 12/15/2022]
Abstract
Circular RNAs (circRNAs) play vital regulatory roles in the progression of multiple cancers. In our study, transcriptome analysis and self-organizing maps (SOM) were applied to screen backbone circRNAs in gastric cancer (GC). Upon validation of the expression patterns of screened circRNAs, gain- and loss-of-function assays were performed in vitro and in vivo. Underlying mechanisms were investigated using RNA pull-down, luciferase reporter assay and RNA immunoprecipitation. The expression of circTHBS1 was significantly increased in GC and associated with poor prognosis. CircTHBS1 facilitated the malignant behavior and epithelial-to-mesenchymal transition of GC cells. Mechanistically, circTHBS1 sponged miR-204-5p to promote the expression of Inhibin Subunit Beta A (INHBA). Moreover, circTHBS1 could enhance the HuR-mediated mRNA stability of INHBA, which subsequently activated the TGF-β pathway. Our research identified circTHBS1 as an oncogenic circRNA that enhances GC malignancy by elevating INHBA expression, providing new insight and a feasible target for the diagnosis and treatment of GC.
Affiliation(s)
- Shengkui Qiu
  - Department of General Surgery, The First Affiliated Hospital of Nanjing Medical University, Nanjing, 210029, Jiangsu Province, China.
  - Department of General Surgery, The Second Affiliated Hospital of Nantong University, Nantong, 226001, Jiangsu Province, China
- Bowen Li
  - Department of General Surgery, The First Affiliated Hospital of Nanjing Medical University, Nanjing, 210029, Jiangsu Province, China
- Yiwen Xia
  - Department of General Surgery, The First Affiliated Hospital of Nanjing Medical University, Nanjing, 210029, Jiangsu Province, China
- Zhe Xuan
  - Department of General Surgery, The First Affiliated Hospital of Nanjing Medical University, Nanjing, 210029, Jiangsu Province, China
- Zheng Li
  - Department of General Surgery, The First Affiliated Hospital of Nanjing Medical University, Nanjing, 210029, Jiangsu Province, China
- Li Xie
  - Department of General Surgery, The First Affiliated Hospital of Nanjing Medical University, Nanjing, 210029, Jiangsu Province, China
- Chao Gu
  - Department of General Surgery, The First Affiliated Hospital of Nanjing Medical University, Nanjing, 210029, Jiangsu Province, China
- Jialun Lv
  - Department of General Surgery, The First Affiliated Hospital of Nanjing Medical University, Nanjing, 210029, Jiangsu Province, China
- Chen Lu
  - Department of General Surgery, The First Affiliated Hospital of Nanjing Medical University, Nanjing, 210029, Jiangsu Province, China
- Tianlu Jiang
  - Department of General Surgery, The First Affiliated Hospital of Nanjing Medical University, Nanjing, 210029, Jiangsu Province, China
- Lang Fang
  - Department of General Surgery, The First Affiliated Hospital of Nanjing Medical University, Nanjing, 210029, Jiangsu Province, China
- Penghui Xu
  - Department of General Surgery, The First Affiliated Hospital of Nanjing Medical University, Nanjing, 210029, Jiangsu Province, China
- Jing Yang
  - Department of General Surgery, The First Affiliated Hospital of Nanjing Medical University, Nanjing, 210029, Jiangsu Province, China
- Ying Li
  - Department of General Surgery, The First Affiliated Hospital of Nanjing Medical University, Nanjing, 210029, Jiangsu Province, China
- Zetian Chen
  - Department of General Surgery, The First Affiliated Hospital of Nanjing Medical University, Nanjing, 210029, Jiangsu Province, China
- Lu Zhang
  - Department of General Surgery, The First Affiliated Hospital of Nanjing Medical University, Nanjing, 210029, Jiangsu Province, China
- Linjun Wang
  - Department of General Surgery, The First Affiliated Hospital of Nanjing Medical University, Nanjing, 210029, Jiangsu Province, China
- Diancai Zhang
  - Department of General Surgery, The First Affiliated Hospital of Nanjing Medical University, Nanjing, 210029, Jiangsu Province, China
- Hao Xu
  - Department of General Surgery, The First Affiliated Hospital of Nanjing Medical University, Nanjing, 210029, Jiangsu Province, China
- Weizhi Wang
  - Department of General Surgery, The First Affiliated Hospital of Nanjing Medical University, Nanjing, 210029, Jiangsu Province, China.
- Zekuan Xu
  - Department of General Surgery, The First Affiliated Hospital of Nanjing Medical University, Nanjing, 210029, Jiangsu Province, China.
  - Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, Nanjing Medical University, Nanjing, 211166, Jiangsu Province, China.
32
Hegenbarth JC, Lezzoche G, De Windt LJ, Stoll M. Perspectives on Bulk-Tissue RNA Sequencing and Single-Cell RNA Sequencing for Cardiac Transcriptomics. Frontiers in Molecular Medicine 2022; 2:839338. [PMID: 39086967; PMCID: PMC11285642; DOI: 10.3389/fmmed.2022.839338] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 12/19/2021] [Accepted: 01/31/2022] [Indexed: 08/02/2024]
Abstract
The heart has been the center of numerous transcriptomic studies in the past decade. Even though our knowledge of the key organ in our cardiovascular system has significantly increased over the last years, the heart is still not fully understood. In recent years, extensive efforts were made to understand the genetic and transcriptomic contribution to cardiac function and failure in more detail. The advent of Next Generation Sequencing (NGS) technologies has brought many discoveries, but it cannot capture the finely orchestrated interactions between and within the various cell types of the heart. With the emergence of single-cell sequencing more than 10 years ago, researchers gained a valuable new tool to enable the exploration of new subpopulations of cells, cell-cell interactions, and integration of multi-omic approaches at a single-cell resolution. Despite this innovation, it is essential to make an informed choice regarding the appropriate technique for transcriptomic studies, especially when working with myocardial tissue. Here, we provide a primer for researchers interested in transcriptomics using NGS technologies.
Affiliation(s)
- Jana-Charlotte Hegenbarth
  - Department of Molecular Genetics, Faculty of Science and Engineering, Faculty of Health, Medicine and Life Sciences, Maastricht University, Maastricht, Netherlands
- Giuliana Lezzoche
  - Department of Molecular Genetics, Faculty of Science and Engineering, Faculty of Health, Medicine and Life Sciences, Maastricht University, Maastricht, Netherlands
- Leon J. De Windt
  - Department of Molecular Genetics, Faculty of Science and Engineering, Faculty of Health, Medicine and Life Sciences, Maastricht University, Maastricht, Netherlands
- Monika Stoll
  - Department of Biochemistry, CARIM School for Cardiovascular Diseases, Maastricht University, Maastricht, Netherlands
  - Department of Genetic Epidemiology, Institute of Human Genetics, University Hospital Münster, Münster, Germany
33
Jansen C, Paraiso KD, Zhou JJ, Blitz IL, Fish MB, Charney RM, Cho JS, Yasuoka Y, Sudou N, Bright AR, Wlizla M, Veenstra GJC, Taira M, Zorn AM, Mortazavi A, Cho KWY. Uncovering the mesendoderm gene regulatory network through multi-omic data integration. Cell Rep 2022; 38:110364. [PMID: 35172134; PMCID: PMC8917868; DOI: 10.1016/j.celrep.2022.110364] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/18/2020] [Revised: 10/30/2021] [Accepted: 01/19/2022] [Indexed: 01/01/2023] Open
Abstract
Mesendodermal specification is one of the earliest events in embryogenesis, where cells first acquire distinct identities. Cell differentiation is a highly regulated process that involves the function of numerous transcription factors (TFs) and signaling molecules, which can be described with gene regulatory networks (GRNs). Cell differentiation GRNs are difficult to build because existing mechanistic methods are low throughput, and high-throughput methods tend to be non-mechanistic. Additionally, integrating highly dimensional data composed of more than two data types is challenging. Here, we use linked self-organizing maps to combine chromatin immunoprecipitation sequencing (ChIP-seq)/ATAC-seq with temporal, spatial, and perturbation RNA sequencing (RNA-seq) data from Xenopus tropicalis mesendoderm development to build a high-resolution genome scale mechanistic GRN. We recover both known and previously unsuspected TF-DNA/TF-TF interactions validated through reporter assays. Our analysis provides insights into transcriptional regulation of early cell fate decisions and provides a general approach to building GRNs using highly dimensional multi-omic datasets.
Affiliation(s)
- Camden Jansen
  - Department of Developmental and Cell Biology, University of California, Irvine, CA, USA; Center for Complex Biological Systems, University of California, Irvine, CA, USA
- Kitt D Paraiso
  - Department of Developmental and Cell Biology, University of California, Irvine, CA, USA; Center for Complex Biological Systems, University of California, Irvine, CA, USA
- Jeff J Zhou
  - Department of Developmental and Cell Biology, University of California, Irvine, CA, USA
- Ira L Blitz
  - Department of Developmental and Cell Biology, University of California, Irvine, CA, USA
- Margaret B Fish
  - Department of Developmental and Cell Biology, University of California, Irvine, CA, USA
- Rebekah M Charney
  - Department of Developmental and Cell Biology, University of California, Irvine, CA, USA
- Jin Sun Cho
  - Department of Developmental and Cell Biology, University of California, Irvine, CA, USA
- Yuuri Yasuoka
  - Laboratory for Comprehensive Genomic Analysis, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Norihiro Sudou
  - Department of Anatomy, School of Medicine, Toho University, Tokyo, Japan
- Ann Rose Bright
  - Department of Molecular Developmental Biology, Radboud University, Nijmegen, the Netherlands
- Marcin Wlizla
  - Division of Developmental Biology, Department of Pediatrics, Cincinnati Children's Hospital Medical Center, University of Cincinnati College of Medicine, Cincinnati, OH, USA
- Gert Jan C Veenstra
  - Department of Molecular Developmental Biology, Radboud University, Nijmegen, the Netherlands
- Masanori Taira
  - Department of Biological Sciences, Chuo University, Tokyo, Japan
- Aaron M Zorn
  - Division of Developmental Biology, Department of Pediatrics, Cincinnati Children's Hospital Medical Center, University of Cincinnati College of Medicine, Cincinnati, OH, USA
- Ali Mortazavi
  - Department of Developmental and Cell Biology, University of California, Irvine, CA, USA; Center for Complex Biological Systems, University of California, Irvine, CA, USA.
- Ken W Y Cho
  - Department of Developmental and Cell Biology, University of California, Irvine, CA, USA; Center for Complex Biological Systems, University of California, Irvine, CA, USA.
34
Deshpande A, Chu LF, Stewart R, Gitter A. Network inference with Granger causality ensembles on single-cell transcriptomics. Cell Rep 2022; 38:110333. [PMID: 35139376; PMCID: PMC9093087; DOI: 10.1016/j.celrep.2022.110333] [Citation(s) in RCA: 50] [Impact Index Per Article: 16.7] [Received: 02/05/2019] [Revised: 02/19/2021] [Accepted: 01/12/2022] [Indexed: 12/20/2022] Open
Abstract
Cellular gene expression changes throughout a dynamic biological process, such as differentiation. Pseudotimes estimate cells' progress along a dynamic process based on their individual gene expression states. Ordering the expression data by pseudotime provides information about the underlying regulator-gene interactions. Because the pseudotime distribution is not uniform, many standard mathematical methods are inapplicable for analyzing the ordered gene expression states. Here we present single-cell inference of networks using Granger ensembles (SINGE), an algorithm for gene regulatory network inference from ordered single-cell gene expression data. SINGE uses kernel-based Granger causality regression to smooth irregular pseudotimes and missing expression values. It aggregates predictions from an ensemble of regression analyses to compile a ranked list of candidate interactions between transcriptional regulators and target genes. In two mouse embryonic stem cell differentiation datasets, SINGE outperforms other contemporary algorithms. However, a more detailed examination reveals caveats about poor performance for individual regulators and uninformative pseudotimes.
Affiliation(s)
- Atul Deshpande
  - Department of Electrical and Computer Engineering, University of Wisconsin - Madison, Madison, WI 53706, USA; Morgridge Institute for Research, Madison, WI 53715, USA
- Li-Fang Chu
  - Morgridge Institute for Research, Madison, WI 53715, USA
- Ron Stewart
  - Morgridge Institute for Research, Madison, WI 53715, USA
- Anthony Gitter
  - Morgridge Institute for Research, Madison, WI 53715, USA; Department of Biostatistics and Medical Informatics, University of Wisconsin - Madison, Madison, WI 53792, USA.
35
Ma X, Somasundaram A, Qi Z, Hartman D, Singh H, Osmanbeyoglu H. SPaRTAN, a computational framework for linking cell-surface receptors to transcriptional regulators. Nucleic Acids Res 2021; 49:9633-9647. [PMID: 34500467; PMCID: PMC8464045; DOI: 10.1093/nar/gkab745] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/17/2021] [Revised: 08/09/2021] [Accepted: 09/06/2021] [Indexed: 12/22/2022] Open
Abstract
The identity and functions of specialized cell types are dependent on the complex interplay between signaling and transcriptional networks. Recently single-cell technologies have been developed that enable simultaneous quantitative analysis of cell-surface receptor expression with transcriptional states. To date, these datasets have not been used to systematically develop cell-context-specific maps of the interface between signaling and transcriptional regulators orchestrating cellular identity and function. We present SPaRTAN (Single-cell Proteomic and RNA based Transcription factor Activity Network), a computational method to link cell-surface receptors to transcription factors (TFs) by exploiting cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq) datasets with cis-regulatory information. SPaRTAN is applied to immune cell types in the blood to predict the coupling of signaling receptors with cell context-specific TFs. Selected predictions are validated by prior knowledge and flow cytometry analyses. SPaRTAN is then used to predict the signaling coupled TF states of tumor infiltrating CD8+ T cells in malignant peritoneal and pleural mesotheliomas. SPaRTAN enhances the utility of CITE-seq datasets to uncover TF and cell-surface receptor relationships in diverse cellular states.
Affiliation(s)
- Xiaojun Ma
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15206, USA
  - UPMC Hillman Cancer Center, Pittsburgh, PA 15213, USA
- Ashwin Somasundaram
  - Department of Medicine, Division of Hematology/Oncology, University of Pittsburgh, Pittsburgh, PA 15213, USA
  - UPMC Hillman Cancer Center, Pittsburgh, PA 15213, USA
- Zengbiao Qi
  - UPMC Hillman Cancer Center, Pittsburgh, PA 15213, USA
- Douglas J Hartman
  - Department of Pathology, University of Pittsburgh Medical Center, Pittsburgh, PA 15213, USA
  - UPMC Hillman Cancer Center, Pittsburgh, PA 15213, USA
- Harinder Singh
  - Center for Systems Immunology and Department of Immunology, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Hatice Ulku Osmanbeyoglu
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15206, USA
  - Department of Bioengineering, University of Pittsburgh, Pittsburgh, PA 15261, USA
  - UPMC Hillman Cancer Center, Pittsburgh, PA 15213, USA
36
Rautenstrauch P, Vlot AHC, Saran S, Ohler U. Intricacies of single-cell multi-omics data integration. Trends Genet 2021; 38:128-139. [PMID: 34561102; DOI: 10.1016/j.tig.2021.08.012] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 04/06/2021] [Revised: 08/20/2021] [Accepted: 08/23/2021] [Indexed: 02/06/2023]
Abstract
A wealth of single-cell protocols makes it possible to characterize different molecular layers at unprecedented resolution. Integrating the resulting multimodal single-cell data to find cell-to-cell correspondences remains a challenge. We argue that data integration needs to happen at a meaningful biological level of abstraction and that it is necessary to consider the inherent discrepancies between modalities to strike a balance between biological discovery and noise removal. A survey of current methods reveals that a distinction between technical and biological origins of presumed unwanted variation between datasets is not yet commonly considered. The increasing availability of paired multimodal data will aid the development of improved methods by providing a ground truth on cell-to-cell matches.
Affiliation(s)
- Pia Rautenstrauch
- The Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine, 10115 Berlin, Germany; Department of Computer Science, Humboldt Universität zu Berlin, 10117 Berlin, Germany
| | - Anna Hendrika Cornelia Vlot
- The Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine, 10115 Berlin, Germany; Department of Computer Science, Humboldt Universität zu Berlin, 10117 Berlin, Germany
| | - Sepideh Saran
- The Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine, 10115 Berlin, Germany
| | - Uwe Ohler
- The Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine, 10115 Berlin, Germany; Department of Computer Science, Humboldt Universität zu Berlin, 10117 Berlin, Germany; Department of Biology, Humboldt Universität zu Berlin, 10117 Berlin, Germany.
|
37
|
Sc-compReg enables the comparison of gene regulatory networks between conditions using single-cell data. Nat Commun 2021; 12:4763. [PMID: 34362918 PMCID: PMC8346476 DOI: 10.1038/s41467-021-25089-2] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 10/30/2020] [Accepted: 07/20/2021] [Indexed: 12/22/2022] Open
Abstract
The comparison of gene regulatory networks between diseased versus healthy individuals or between two different treatments is an important scientific problem. Here, we propose sc-compReg as a method for the comparative analysis of gene expression regulatory networks between two conditions using single cell gene expression (scRNA-seq) and single cell chromatin accessibility data (scATAC-seq). Our software, sc-compReg, can be used as a stand-alone package that provides joint clustering and embedding of the cells from both scRNA-seq and scATAC-seq, and the construction of differential regulatory networks across two conditions. We apply the method to compare the gene regulatory networks of an individual with chronic lymphocytic leukemia (CLL) versus a healthy control. The analysis reveals a tumor-specific B cell subpopulation in the CLL patient and identifies TOX2 as a potential regulator of this subpopulation. Changes in cell state underlie the difference between health and disease. Here, the authors propose a computational framework for the integration of gene expression and chromatin-accessibility data from single cells to identify differences in gene regulation in cell types across two conditions.
|
38
|
Assessing single-cell transcriptomic variability through density-preserving data visualization. Nat Biotechnol 2021; 39:765-774. [PMID: 33462509 PMCID: PMC8195812 DOI: 10.1038/s41587-020-00801-7] [Citation(s) in RCA: 52] [Impact Index Per Article: 13.0] [Received: 05/12/2020] [Accepted: 12/14/2020] [Indexed: 01/29/2023]
Abstract
Nonlinear data visualization methods, such as t-distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP), summarize the complex transcriptomic landscape of single cells in two dimensions or three dimensions, but they neglect the local density of data points in the original space, often resulting in misleading visualizations where densely populated subsets of cells are given more visual space than warranted by their transcriptional diversity in the dataset. Here we present den-SNE and densMAP, which are density-preserving visualization tools based on t-SNE and UMAP, respectively, and demonstrate their ability to accurately incorporate information about transcriptomic variability into the visual interpretation of single-cell RNA sequencing data. Applied to recently published datasets, our methods reveal significant changes in transcriptomic variability in a range of biological processes, including heterogeneity in transcriptomic variability of immune cells in blood and tumor, human immune cell specialization and the developmental trajectory of Caenorhabditis elegans. Our methods are readily applicable to visualizing high-dimensional data in other scientific domains.
|
39
|
Marín-Sedeño E, de Morentin XM, Pérez-Pomares JM, Gómez-Cabrero D, Ruiz-Villalba A. Understanding the Adult Mammalian Heart at Single-Cell RNA-Seq Resolution. Front Cell Dev Biol 2021; 9:645276. [PMID: 34055776 PMCID: PMC8149764 DOI: 10.3389/fcell.2021.645276] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 12/22/2020] [Accepted: 04/09/2021] [Indexed: 12/24/2022] Open
Abstract
During the last decade, extensive efforts have been made to comprehend cardiac cell genetic and functional diversity. Such knowledge allows for the definition of the cardiac cellular interactome as a reasonable strategy to increase our understanding of the normal and pathologic heart. Previous experimental approaches, including cell lineage tracing, flow cytometry, and bulk RNA-Seq, have often tackled the analysis of cardiac cell diversity based on the assumption that cell types can be identified by the expression of a single gene. More recently, however, the emergence of single-cell RNA-Seq technology has led us to explore the diversity of individual cells, enabling the cardiovascular research community to redefine cardiac cell subpopulations and identify relevant ones, and even novel cell types, through their cell-specific transcriptomic signatures in an unbiased manner. These findings are changing our understanding of cell composition and, in consequence, the identification of potential therapeutic targets for different cardiac diseases. In this review, we provide an overview of the continuously changing cardiac cellular landscape, traveling from the pre-single-cell RNA-Seq times to the single-cell RNA-Seq revolution, and discuss the utilities and limitations of this technology.
Affiliation(s)
- Ernesto Marín-Sedeño
- Department of Animal Biology, Faculty of Sciences, Instituto Malagueño de Biomedicina, University of Málaga, Málaga, Spain
- BIONAND, Centro Andaluz de Nanomedicina y Biotecnología, Junta de Andalucía, Universidad de Málaga, Málaga, Spain
| | - Xabier Martínez de Morentin
- Translational Bioinformatics Unit, Navarrabiomed, Complejo Hospitalario de Navarra, Instituto de Investigación Sanitaria de Navarra (IdiSNA), Universidad Pública de Navarra, Pamplona, Spain
| | - Jose M. Pérez-Pomares
- Department of Animal Biology, Faculty of Sciences, Instituto Malagueño de Biomedicina, University of Málaga, Málaga, Spain
- BIONAND, Centro Andaluz de Nanomedicina y Biotecnología, Junta de Andalucía, Universidad de Málaga, Málaga, Spain
| | - David Gómez-Cabrero
- Translational Bioinformatics Unit, Navarrabiomed, Complejo Hospitalario de Navarra, Instituto de Investigación Sanitaria de Navarra (IdiSNA), Universidad Pública de Navarra, Pamplona, Spain
- Centre of Host-Microbiome Interactions, Faculty of Dentistry, Oral & Craniofacial Sciences, King’s College London, London, United Kingdom
- Biological and Environmental Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
| | - Adrián Ruiz-Villalba
- Department of Animal Biology, Faculty of Sciences, Instituto Malagueño de Biomedicina, University of Málaga, Málaga, Spain
- BIONAND, Centro Andaluz de Nanomedicina y Biotecnología, Junta de Andalucía, Universidad de Málaga, Málaga, Spain
|
40
|
Li Y, Ma L, Wu D, Chen G. Advances in bulk and single-cell multi-omics approaches for systems biology and precision medicine. Brief Bioinform 2021; 22:6189773. [PMID: 33778867 DOI: 10.1093/bib/bbab024] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 07/11/2020] [Revised: 12/31/2020] [Accepted: 01/20/2021] [Indexed: 12/13/2022] Open
Abstract
Multi-omics allows the systematic understanding of the information flow across different omics layers, while single omics can mainly reflect one aspect of the biological system. The advancement of bulk and single-cell sequencing technologies and related computational methods for multi-omics has largely facilitated the development of systems biology and precision medicine. Single-cell approaches have the advantage of dissecting cellular dynamics and heterogeneity, whereas traditional bulk technologies are limited to individual/population-level investigation. In this review, we first summarize the technologies for producing bulk and single-cell multi-omics data. Then, we survey the computational approaches for integrative analysis of bulk and single-cell multimodal data, respectively. Moreover, the databases and data storage for multi-omics, as well as the tools for visualizing multimodal data, are summarized. We also outline the integration between bulk and single-cell data, and discuss the applications of multi-omics in precision medicine. Finally, we present the challenges and perspectives for multi-omics development.
Affiliation(s)
| | - Lu Ma
- China Normal University, China
| | | | | |
|
41
|
Scherer M, Schmidt F, Lazareva O, Walter J, Baumbach J, Schulz MH, List M. Machine learning for deciphering cell heterogeneity and gene regulation. Nat Comput Sci 2021; 1:183-191. [PMID: 38183187 DOI: 10.1038/s43588-021-00038-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 10/15/2020] [Accepted: 02/08/2021] [Indexed: 12/14/2022]
Abstract
Epigenetics studies inheritable and reversible modifications of DNA that allow cells to control gene expression throughout their development and in response to environmental conditions. In computational epigenomics, machine learning is applied to study various epigenetic mechanisms genome wide. Its aim is to expand our understanding of cell differentiation, that is their specialization, in health and disease. Thus far, most efforts focus on understanding the functional encoding of the genome and on unraveling cell-type heterogeneity. Here, we provide an overview of state-of-the-art computational methods and their underlying statistical concepts, which range from matrix factorization and regularized linear regression to deep learning methods. We further show how the rise of single-cell technology leads to new computational challenges and creates opportunities to further our understanding of epigenetic regulation.
Affiliation(s)
- Michael Scherer
- Department of Genetics/Epigenetics, Saarland University, Saarbrücken, Germany
- Computational Biology Group, Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany
- Graduate School of Computer Science, Saarland Informatics Campus, Saarbrücken, Germany
| | | | - Olga Lazareva
- Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
| | - Jörn Walter
- Computational Biology Group, Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany
| | - Jan Baumbach
- Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
- Computational BioMedicine Lab, Institute of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
- Chair of Computational Systems Biology, University of Hamburg, Hamburg, Germany
| | - Marcel H Schulz
- Institute of Cardiovascular Regeneration, University Hospital and Goethe University Frankfurt, Frankfurt, Germany
| | - Markus List
- Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany.
|
42
|
Sinha S, Satpathy AT, Zhou W, Ji H, Stratton JA, Jaffer A, Bahlis N, Morrissy S, Biernaskie JA. Profiling Chromatin Accessibility at Single-cell Resolution. Genomics Proteomics Bioinformatics 2021; 19:172-190. [PMID: 33581341 PMCID: PMC8602754 DOI: 10.1016/j.gpb.2020.06.010] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 05/06/2019] [Revised: 03/04/2020] [Accepted: 08/15/2020] [Indexed: 01/22/2023]
Abstract
How distinct transcriptional programs are enacted to generate cellular heterogeneity and plasticity, and enable complex fate decisions are important open questions. One key regulator is the cell’s epigenome state that drives distinct transcriptional programs by regulating chromatin accessibility. Genome-wide chromatin accessibility measurements can impart insights into regulatory sequences (in)accessible to DNA-binding proteins at a single-cell resolution. This review outlines molecular methods and bioinformatic tools for capturing cell-to-cell chromatin variation using single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) in a scalable fashion. It also covers joint profiling of chromatin with transcriptome/proteome measurements, computational strategies to integrate multi-omic measurements, and predictive bioinformatic tools to infer chromatin accessibility from single-cell transcriptomic datasets. Methodological refinements that increase power for cell discovery through robust chromatin coverage and integrate measurements from multiple modalities will further expand our understanding of gene regulation during homeostasis and disease.
Affiliation(s)
- Sarthak Sinha
- Department of Comparative Biology & Experimental Medicine, Faculty of Veterinary Medicine, University of Calgary, Calgary, AB T2N 4N1, Canada.
| | - Ansuman T Satpathy
- Department of Pathology, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Weiqiang Zhou
- Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD 21205, USA
| | - Hongkai Ji
- Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD 21205, USA
| | - Jo A Stratton
- Department of Comparative Biology & Experimental Medicine, Faculty of Veterinary Medicine, University of Calgary, Calgary, AB T2N 4N1, Canada; Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB T2N 4N1, Canada; Hotchkiss Brain Institute, University of Calgary, Calgary, AB T2N 4N1, Canada
| | - Arzina Jaffer
- Department of Comparative Biology & Experimental Medicine, Faculty of Veterinary Medicine, University of Calgary, Calgary, AB T2N 4N1, Canada
| | - Nizar Bahlis
- Arnie Charbonneau Cancer Institute, University of Calgary, Calgary, AB T2N 4Z6, Canada
| | - Sorana Morrissy
- Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB T2N 4N1, Canada; Arnie Charbonneau Cancer Institute, University of Calgary, Calgary, AB T2N 4Z6, Canada; Department of Biochemistry and Molecular Biology, Cumming School of Medicine, University of Calgary, Calgary, AB T2N 4N1, Canada
| | - Jeff A Biernaskie
- Department of Comparative Biology & Experimental Medicine, Faculty of Veterinary Medicine, University of Calgary, Calgary, AB T2N 4N1, Canada; Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB T2N 4N1, Canada; Hotchkiss Brain Institute, University of Calgary, Calgary, AB T2N 4N1, Canada.
|
43
|
Forcato M, Romano O, Bicciato S. Computational methods for the integrative analysis of single-cell data. Brief Bioinform 2021; 22:20-29. [PMID: 32363378 PMCID: PMC7820847 DOI: 10.1093/bib/bbaa042] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 12/12/2019] [Revised: 03/05/2020] [Accepted: 01/03/2020] [Indexed: 01/05/2023] Open
Abstract
Recent advances in single-cell technologies are providing exciting opportunities for dissecting tissue heterogeneity and investigating cell identity, fate and function. This is a pristine, exploding field that is flooding biologists with a new wave of data, each with its own specificities in terms of complexity and information content. The integrative analysis of genomic data, collected at different molecular layers from diverse cell populations, holds promise to address the full-scale complexity of biological systems. However, the combination of different single-cell genomic signals is computationally challenging, as these data are intrinsically heterogeneous for experimental, technical and biological reasons. Here, we describe the computational methods for the integrative analysis of single-cell genomic data, with a focus on the integration of single-cell RNA sequencing datasets and on the joint analysis of multimodal signals from individual cells.
Affiliation(s)
- Mattia Forcato
- Molecular Biology and Bioinformatics at the University of Modena and Reggio Emilia. His research interests include the development and application of bioinformatics methods for the analysis of next-generation sequencing data
| | - Oriana Romano
- Molecular Biology and Bioinformatics at the University of Modena and Reggio Emilia. Her research activities are mainly focused on the integrative analysis of transcriptional and epigenomic bulk and single-cell data
| | - Silvio Bicciato
- Industrial Bioengineering at the University of Modena and Reggio Emilia. His research activity is the development and application of computational approaches for the analysis of multi-omics data
|
44
|
Amirmahani F, Ebrahimi N, Molaei F, Faghihkhorasani F, Jamshidi Goharrizi K, Mirtaghi SM, Borjian‐Boroujeni M, Hamblin MR. Approaches for the integration of big data in translational medicine: single‐cell and computational methods. Ann N Y Acad Sci 2021; 1493:3-28. [PMID: 33410160 DOI: 10.1111/nyas.14544] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 09/23/2020] [Revised: 10/31/2020] [Accepted: 11/12/2020] [Indexed: 12/11/2022]
Affiliation(s)
- Farzane Amirmahani
- Genetics Division, Department of Cell and Molecular Biology and Microbiology, Faculty of Science and Technology, University of Isfahan, Isfahan, Iran
| | - Nasim Ebrahimi
- Genetics Division, Department of Cell and Molecular Biology and Microbiology, Faculty of Science and Technology, University of Isfahan, Isfahan, Iran
| | - Fatemeh Molaei
- Department of Anesthesiology, Faculty of Paramedical, Jahrom University of Medical Sciences, Jahrom, Iran
| | | | | | | | | | - Michael R. Hamblin
- Laser Research Centre, Faculty of Health Science, University of Johannesburg, South Africa
|
45
|
Li Y, Xu Q, Wu D, Chen G. Exploring Additional Valuable Information From Single-Cell RNA-Seq Data. Front Cell Dev Biol 2020; 8:593007. [PMID: 33335900 PMCID: PMC7736616 DOI: 10.3389/fcell.2020.593007] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 09/04/2020] [Accepted: 10/26/2020] [Indexed: 12/28/2022] Open
Abstract
Single-cell RNA-seq (scRNA-seq) technologies are broadly applied to dissect the cellular heterogeneity and expression dynamics, providing unprecedented insights into single-cell biology. Most of the scRNA-seq studies mainly focused on the dissection of cell types/states, developmental trajectory, gene regulatory network, and alternative splicing. However, besides these routine analyses, many other valuable scRNA-seq investigations can be conducted. Here, we first review cell-to-cell communication exploration, RNA velocity inference, identification of large-scale copy number variations and single nucleotide changes, and chromatin accessibility prediction based on single-cell transcriptomics data. Next, we discuss the identification of novel genes/transcripts through transcriptome reconstruction approaches, as well as the profiling of long non-coding RNAs and circular RNAs. Additionally, we survey the integration of single-cell and bulk RNA-seq datasets for deconvoluting the cell composition of large-scale bulk samples and linking single-cell signatures to patient outcomes. These additional analyses could largely facilitate corresponding basic science and clinical applications.
Affiliation(s)
- Yunjin Li
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences, School of Life Sciences, East China Normal University, Shanghai, China
| | - Qiyue Xu
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences, School of Life Sciences, East China Normal University, Shanghai, China
| | - Duojiao Wu
- Institute of Clinical Science, Zhongshan Hospital, Fudan University, Shanghai, China
| | - Geng Chen
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences, School of Life Sciences, East China Normal University, Shanghai, China
|
46
|
Baur B, Shin J, Zhang S, Roy S. Data integration for inferring context-specific gene regulatory networks. Curr Opin Syst Biol 2020; 23:38-46. [PMID: 33225112 PMCID: PMC7676633 DOI: 10.1016/j.coisb.2020.09.005] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 11/28/2022]
Abstract
Transcriptional regulatory networks control context-specific gene expression patterns and play important roles in normal and disease processes. Advances in genomics are rapidly increasing our ability to measure different components of the regulation machinery at the single-cell and bulk population level. An important challenge is to combine different types of regulatory genomic measurements to construct a more complete picture of gene regulatory networks across different disease, environmental, and developmental contexts. In this review, we focus on recent computational methods that integrate regulatory genomic data sets to infer context specificity and dynamics in regulatory networks.
Affiliation(s)
- Brittany Baur
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, 53715, USA
| | - Junha Shin
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, 53715, USA
| | - Shilu Zhang
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, 53715, USA
| | - Sushmita Roy
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, 53715, USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, 53715, USA
|
47
|
Shi H, Yan KK, Ding L, Qian C, Chi H, Yu J. Network Approaches for Dissecting the Immune System. iScience 2020; 23:101354. [PMID: 32717640 PMCID: PMC7390880 DOI: 10.1016/j.isci.2020.101354] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 05/09/2020] [Revised: 06/21/2020] [Accepted: 07/08/2020] [Indexed: 02/06/2023] Open
Abstract
The immune system is a complex biological network composed of hierarchically organized genes, proteins, and cellular components that combat external pathogens and monitor the onset of internal disease. To meet and ultimately defeat these challenges, the immune system orchestrates an exquisitely complex interplay of numerous cells, often with highly specialized functions, in a tissue-specific manner. One of the major methodologies of systems immunology is to measure quantitatively the components and interaction levels in the immunologic networks to construct a computational network and predict the response of the components to perturbations. The recent advances in high-throughput sequencing techniques have provided us with a powerful approach to dissecting the complexity of the immune system. Here we summarize the latest progress in integrating omics data and network approaches to construct networks and to infer the underlying signaling and transcriptional landscape, as well as cell-cell communication, in the immune system, with a focus on hematopoiesis, adaptive immunity, and tumor immunology. Understanding the network regulation of immune cells has provided new insights into immune homeostasis and disease, with important therapeutic implications for inflammation, cancer, and other immune-mediated disorders.
Affiliation(s)
- Hao Shi
- Departments of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN 38105, USA; Department of Immunology, St. Jude Children's Research Hospital, Memphis, TN 38105, USA
| | - Koon-Kiu Yan
- Departments of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN 38105, USA
| | - Liang Ding
- Departments of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN 38105, USA
| | - Chenxi Qian
- Departments of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN 38105, USA; Department of Immunology, St. Jude Children's Research Hospital, Memphis, TN 38105, USA
| | - Hongbo Chi
- Department of Immunology, St. Jude Children's Research Hospital, Memphis, TN 38105, USA
| | - Jiyang Yu
- Departments of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN 38105, USA.
|
48
|
Philpott M, Cribbs AP, Brown T, Brown T, Oppermann U. Advances and challenges in epigenomic single-cell sequencing applications. Curr Opin Chem Biol 2020; 57:17-26. [PMID: 32304986 DOI: 10.1016/j.cbpa.2020.01.013] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 12/30/2019] [Accepted: 01/22/2020] [Indexed: 12/15/2022]
Abstract
Understanding multicellular physiology and pathobiology requires analysis of the relationship between genotype, chromatin organisation and phenotype. In the multi-omics era, many methods exist to investigate biological processes across the genome, transcriptome, epigenome, proteome and metabolome. Until recently, this was only possible for populations of cells or complex tissues, creating an averaging effect that may obscure direct correlations between multiple layers of data. Single-cell sequencing methods have removed this averaging effect, but computational integration after profiling distinct modalities separately may still not completely reflect underlying biology. Multiplexed assays resolving multiple modalities in the same cell are required to overcome these shortcomings and have the potential to deliver unprecedented understanding of biology and disease.
Affiliation(s)
- Martin Philpott
- Botnar Research Centre, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, NIHR Oxford BRU, University of Oxford, OX3 7LD, UK
| | - Adam P Cribbs
- Botnar Research Centre, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, NIHR Oxford BRU, University of Oxford, OX3 7LD, UK
| | - Tom Brown
- ATDBio, Oxford Science Park, Robert Robinson Ave, Oxford, OX4 4GA, UK
| | - Tom Brown
- Department of Chemistry, University of Oxford, Oxford, OX1 3TF, UK
| | - Udo Oppermann
- Botnar Research Centre, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, NIHR Oxford BRU, University of Oxford, OX3 7LD, UK.
|
49
|
Erbe R, Kessler MD, Favorov AV, Easwaran H, Gaykalova D, Fertig EJ. Matrix factorization and transfer learning uncover regulatory biology across multiple single-cell ATAC-seq data sets. Nucleic Acids Res 2020; 48:e68. [PMID: 32392348 PMCID: PMC7337516 DOI: 10.1093/nar/gkaa349] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 02/06/2020] [Revised: 03/20/2020] [Accepted: 04/25/2020] [Indexed: 02/07/2023] Open
Abstract
While the methods available for single-cell ATAC-seq analysis are well optimized for clustering cell types, the question of how to integrate multiple scATAC-seq data sets and/or sequencing modalities is still open. We present an analysis framework that enables such integration across scATAC-seq data sets by applying the CoGAPS Matrix Factorization algorithm and the projectR transfer learning program to identify common regulatory patterns across scATAC-seq data sets. We additionally integrate our analysis with scRNA-seq data to identify orthogonal evidence for transcriptional regulators predicted by scATAC-seq analysis. Using publicly available scATAC-seq data, we find patterns that accurately characterize cell types both within and across data sets. Furthermore, we demonstrate that these patterns are both consistent with current biological understanding and reflective of novel regulatory biology.
Affiliation(s)
- Rossin Erbe
- Johns Hopkins University, Baltimore, MD, USA
| | | | - Alexander V Favorov
- Johns Hopkins University, Baltimore, MD, USA
- Vavilov Institute of General Genetics, Moscow, Russia
| | | | | | | |
|
50
|
Hu X, Hu Y, Wu F, Leung RWT, Qin J. Integration of single-cell multi-omics for gene regulatory network inference. Comput Struct Biotechnol J 2020; 18:1925-1938. [PMID: 32774787 PMCID: PMC7385034 DOI: 10.1016/j.csbj.2020.06.033] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 03/01/2020] [Revised: 06/17/2020] [Accepted: 06/20/2020] [Indexed: 12/20/2022] Open
Abstract
The advancement of single-cell sequencing technology in recent years has provided an opportunity to reconstruct gene regulatory networks (GRNs) with the data from thousands of single cells in one sample. This uncovers regulatory interactions in cells and speeds up the discovery of regulatory mechanisms in diseases and biological processes. Therefore, more methods have been proposed to reconstruct GRNs using single-cell sequencing data. In this review, we introduce technologies for sequencing the single-cell genome, transcriptome, and epigenome. At the same time, we present an overview of current GRN reconstruction strategies utilizing different single-cell sequencing data. Bioinformatics tools are grouped by their input data type and mathematical principles for the reader's convenience, and the fundamental mathematics inherent in each group is discussed. Furthermore, the adaptabilities and limitations of these different methods are summarized and compared, in the hope of helping researchers identify the most suitable tools for their work.
Affiliation(s)
- Xinlin Hu
- Shenzhen Key Laboratory of Advanced Machine Learning and Applications, College of Mathematics and Statistics, Shenzhen University, Shenzhen 518060, China
| | - Yaohua Hu
- Shenzhen Key Laboratory of Advanced Machine Learning and Applications, College of Mathematics and Statistics, Shenzhen University, Shenzhen 518060, China
| | - Fanjie Wu
- School of Pharmaceutical Sciences (Shenzhen), Sun Yat-sen University, Shenzhen 518107, China
| | - Ricky Wai Tak Leung
- School of Pharmaceutical Sciences (Shenzhen), Sun Yat-sen University, Shenzhen 518107, China
| | - Jing Qin
- School of Pharmaceutical Sciences (Shenzhen), Sun Yat-sen University, Shenzhen 518107, China
|