1
Liu T, Li K, Wang Y, Li H, Zhao H. Evaluating the Utilities of Foundation Models in Single-cell Data Analysis. bioRxiv: The Preprint Server for Biology 2024:2023.09.08.555192. [PMID: 38464157 PMCID: PMC10925156 DOI: 10.1101/2023.09.08.555192] [Indexed: 03/12/2024]
Abstract
Foundation Models (FMs) have made significant strides in both industrial and scientific domains. In this paper, we evaluate the performance of FMs for single-cell sequencing data analysis through comprehensive experiments across eight downstream tasks pertinent to single-cell data. Considering both model performance and user accessibility, the top FMs among the ten single-cell FMs evaluated are scGPT, Geneformer, and CellPLM. However, comparing these FMs with task-specific methods, we found that single-cell FMs may not consistently outperform task-specific methods in all tasks, which challenges the necessity of developing foundation models for single-cell analysis. In addition, using a proposed scEval framework, we evaluated the effects of hyper-parameters, initial settings, and stability on training single-cell FMs, and we provide guidelines for pre-training and fine-tuning to enhance the performance of single-cell FMs. Our work summarizes the current state of single-cell FMs, points to their constraints and avenues for future development, and offers a freely available evaluation pipeline to benchmark new models and improve method development.
2
Nandi M. Emergence of temporal noise hierarchy in co-regulated genes of multi-output feed-forward loop. Phys Biol 2024; 22:016006. [PMID: 39591750 DOI: 10.1088/1478-3975/ad9792] [Received: 08/27/2024] [Accepted: 11/26/2024] [Indexed: 11/28/2024]
Abstract
Natural variations in gene expression, called noise, are fundamental to biological systems. The expression noise can be beneficial or detrimental to cellular functions. While the impact of noise on individual genes is well-established, our understanding of how noise behaves when multiple genes are co-expressed by shared regulatory elements within transcription networks remains elusive. This lack of understanding extends to how the architecture and regulatory features of these networks influence noise. To address this gap, we study the multi-output feed-forward loop motif. The motif is prevalent in bacteria and yeast and influences co-expression of multiple genes by shared transcription factors (TFs). Focusing on a two-output variant of the motif, the present study explores the interplay between its architecture, co-expression (symmetric and asymmetric) patterns of the two genes, and the associated noise dynamics. We employ a stochastic modeling approach to investigate how the binding affinities of the TFs influence symmetric and asymmetric expression patterns and the resulting noise dynamics in the co-expressed genes. This knowledge could guide the development of strategies for manipulating gene expression patterns through targeted modulation of TF binding affinities.
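To make the idea of noise in co-regulated genes concrete, here is a minimal Gillespie (stochastic simulation algorithm) sketch of a two-output feed-forward loop. It is not the model from the paper: we assume that TF X activates TF Y, that X and Y jointly activate two target genes Z1 and Z2 through a product of saturating terms, and that the two targets differ only in hypothetical binding constants `k1` and `k2`. All rate values and the names `hill` and `simulate_ffl` are illustrative assumptions.

```python
import math
import random

# Hypothetical two-output feed-forward loop (X -> Y; X and Y -> Z1, Z2).
# Illustrative toy model only, not the model used in the paper;
# all rate laws and parameter values are assumptions.


def hill(s, k):
    """Saturating activation s / (k + s); k is a hypothetical dissociation constant."""
    return s / (k + s)


# State changes (dX, dY, dZ1, dZ2) for the eight reactions, in the order used below.
CHANGES = [
    (1, 0, 0, 0), (-1, 0, 0, 0),   # X production / decay
    (0, 1, 0, 0), (0, -1, 0, 0),   # Y production (activated by X) / decay
    (0, 0, 1, 0), (0, 0, -1, 0),   # Z1 production (needs X and Y) / decay
    (0, 0, 0, 1), (0, 0, 0, -1),   # Z2 production (needs X and Y) / decay
]


def simulate_ffl(k1=5.0, k2=20.0, t_end=2000.0, burn_in=50.0, seed=1):
    """Run a Gillespie simulation and return time-averaged statistics of Z1 and Z2.

    k1 and k2 are the hypothetical binding constants of X and Y at the Z1 and Z2
    promoters; k1 != k2 gives asymmetric co-expression, k1 == k2 symmetric.
    """
    rng = random.Random(seed)
    kx, gx = 10.0, 1.0              # X: constitutive production, first-order decay
    ky, gy, kxy = 20.0, 1.0, 10.0   # Y: production activated by X, decay
    kz, gz = 50.0, 1.0              # Z1, Z2: maximal production, decay
    x, y, z1, z2 = 10, 10, 0, 0
    t = 0.0
    w = s1 = s2 = s11 = s22 = s12 = 0.0  # time-weighted sums after burn-in
    while t < t_end:
        rates = [
            kx, gx * x,
            ky * hill(x, kxy), gy * y,
            kz * hill(x, k1) * hill(y, k1), gz * z1,
            kz * hill(x, k2) * hill(y, k2), gz * z2,
        ]
        total = sum(rates)  # always > 0 because kx > 0
        dt = min(rng.expovariate(total), t_end - t)
        if t >= burn_in:  # weight the current state by how long it persists
            w += dt
            s1 += z1 * dt
            s2 += z2 * dt
            s11 += z1 * z1 * dt
            s22 += z2 * z2 * dt
            s12 += z1 * z2 * dt
        t += dt
        # Choose the next reaction with probability proportional to its rate.
        idx = rng.choices(range(len(rates)), weights=rates)[0]
        dx, dy, d1, d2 = CHANGES[idx]
        x, y, z1, z2 = x + dx, y + dy, z1 + d1, z2 + d2
    m1, m2 = s1 / w, s2 / w
    v1, v2 = s11 / w - m1 ** 2, s22 / w - m2 ** 2
    cov = s12 / w - m1 * m2
    return {
        "mean_z1": m1,
        "mean_z2": m2,
        "cv2_z1": v1 / m1 ** 2,  # squared coefficient of variation (noise)
        "cv2_z2": v2 / m2 ** 2,
        "corr": cov / math.sqrt(v1 * v2),  # correlation between the two targets
    }


result = simulate_ffl()
print(result)
```

Lowering `k1` relative to `k2` (tighter binding at the Z1 promoter in this toy model) raises the mean of Z1 relative to Z2; comparing `cv2_z1`, `cv2_z2`, and `corr` across binding constants is one way to explore how TF binding affinities shape noise in co-expressed genes. Setting `k1 == k2` gives the symmetric case.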
Affiliation(s)
- Mintu Nandi
- Department of Chemistry, Indian Institute of Engineering Science and Technology, Shibpur, Howrah 711103, India
3
Prater KE, Lin KZ. All the single cells: Single-cell transcriptomics/epigenomics experimental design and analysis considerations for glial biologists. Glia 2024. [PMID: 39558887 DOI: 10.1002/glia.24633] [Received: 07/05/2024] [Revised: 09/18/2024] [Accepted: 10/10/2024] [Indexed: 11/20/2024]
Abstract
Single-cell transcriptomics, epigenomics, and other 'omics applied at single-cell resolution can significantly advance hypotheses and understanding of glial biology. Omics technologies are revealing a large and growing number of new glial cell subtypes, defined by their gene expression profile. These subtypes have significant implications for understanding glial cell function, cell-cell communications, and glia-specific changes between homeostasis and conditions such as neurological disease. For many, the training in how to analyze, interpret, and understand these large datasets has been through reading and understanding literature from other fields like biostatistics. Here, we provide a primer for glial biologists on experimental design and analysis of single-cell RNA-seq datasets. Our goal is to further the understanding of why decisions are made about datasets and to enhance biologists' ability to interpret and critique their work and the work of others. We review the steps involved in single-cell analysis with a focus on decision points and particular notes for glia. The goal of this primer is to ensure that single-cell 'omics experiments continue to advance glial biology in a rigorous and replicable way.
Affiliation(s)
- Katherine E Prater
- Department of Neurology, School of Medicine, University of Washington, Seattle, Washington, USA
- Kevin Z Lin
- Department of Biostatistics, University of Washington, Seattle, Washington, USA
4
Shan X, Zhao H. Inferring Cell-Type-Specific Co-Expressed Genes from Single Cell Data. bioRxiv: The Preprint Server for Biology 2024:2024.11.08.622700. [PMID: 39605403 PMCID: PMC11601408 DOI: 10.1101/2024.11.08.622700] [Indexed: 11/29/2024]
Abstract
Background: Cell-type-specific gene co-expression networks are widely used to characterize gene relationships. Although many methods have been developed to infer such co-expression networks from single-cell data, the lack of consideration of false positive control in many evaluations may lead to incorrect conclusions, because higher reproducibility, higher functional coherence, and a larger overlap with known biological networks may not imply better performance if the false positives are not well controlled.
Results: In this study, we have developed an efficient and effective simulation tool to derive empirical p-values in co-expression inference to appropriately control false positives in assessing method performance. We studied the power of the p-value-based approach in inferring cell-type-specific co-expressions from single-cell data using both simulated and real data. We also highlight the need to adjust for random overlaps between the inferred and known networks when the number of selected correlated gene pairs varies substantially across different methods. We further illustrate the expression level bias in known biological networks and the impact of such bias in method assessment.
Conclusion: Our study indicates the importance of controlling false positives in the inference of co-expressed genes to achieve more reliable results and proposes a simulation-based p-value method to achieve this.
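The paper derives empirical p-values with its own simulation tool. As a simpler stand-in for the same idea, the sketch below builds a permutation null by shuffling one gene's values across cells; this is a swapped-in technique, not the authors' method, and `pearson` and `empirical_pvalue` are hypothetical helper names.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    if saa == 0 or sbb == 0:
        return 0.0
    return sab / math.sqrt(saa * sbb)


def empirical_pvalue(a, b, n_null=999, seed=0):
    """Two-sided empirical p-value for |corr(a, b)| under a permutation null.

    Shuffling b across cells breaks any gene-gene association while keeping
    each gene's marginal distribution (including its zeros) intact.
    """
    rng = random.Random(seed)
    observed = abs(pearson(a, b))
    shuffled = list(b)
    exceed = 0
    for _ in range(n_null):
        rng.shuffle(shuffled)
        if abs(pearson(a, shuffled)) >= observed:
            exceed += 1
    # The +1 terms count the observed statistic as one draw from the null,
    # so the p-value is never exactly zero.
    return (exceed + 1) / (n_null + 1)


# Synthetic example: gene_b is gene_a plus noise, so the pair is truly correlated.
data_rng = random.Random(42)
gene_a = [data_rng.gauss(0, 1) for _ in range(200)]
gene_b = [g + 0.5 * data_rng.gauss(0, 1) for g in gene_a]
p_value = empirical_pvalue(gene_a, gene_b)
print(p_value)
```

With 999 permutations the smallest attainable p-value is 0.001, so the number of null draws has to grow when many gene pairs are tested and a multiple-testing correction is applied. Shuffling also removes correlation induced by shared technical factors such as sequencing depth, so this simple null can be anti-conservative on raw counts, which is one motivation for more careful, simulation-based nulls.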
5
Su C, Lee D, Jin P, Zhang J. Cell-type-specific mapping of enhancers and target genes from single-cell multimodal data. bioRxiv: The Preprint Server for Biology 2024:2024.09.24.614814. [PMID: 39386519 PMCID: PMC11463474 DOI: 10.1101/2024.09.24.614814] [Indexed: 10/12/2024]
Abstract
Mapping enhancers and target genes in disease-related cell types has provided critical insights into the functional mechanisms of genetic variants identified by genome-wide association studies (GWAS). However, most existing analyses rely on bulk data or cultured cell lines, which may fail to identify cell-type-specific enhancers and target genes. Recently, single-cell multimodal data measuring both gene expression and chromatin accessibility within the same cells have enabled the inference of enhancer-gene pairs in a cell-type-specific and context-specific manner. However, this task is challenged by the data's high sparsity, sequencing depth variation, and the computational burden of analyzing a large number of enhancer-gene pairs. To address these challenges, we propose scMultiMap, a statistical method that infers enhancer-gene association from sparse multimodal counts using a joint latent-variable model. It adjusts for technical confounding, permits fast moment-based estimation and provides analytically derived p-values. In systematic analyses of blood and brain data, scMultiMap shows appropriate type I error control, high statistical power with greater reproducibility across independent datasets and stronger consistency with orthogonal data modalities. Meanwhile, its computational cost is less than 1% of existing methods. When applied to single-cell multimodal data from postmortem brain samples from Alzheimer's disease (AD) patients and controls, scMultiMap gave the highest heritability enrichment in microglia and revealed new insights into the regulatory mechanisms of AD GWAS variants in microglia.
Affiliation(s)
- Chang Su
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA, USA
- Dongsoo Lee
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA, USA
- Peng Jin
- Department of Human Genetics, School of Medicine, Emory University, Atlanta, GA, USA
- Jingfei Zhang
- Information Systems and Operations Management, Emory University, Atlanta, GA, USA
6
Baltsavia I, Oulas A, Theodosiou T, Lavigne MD, Andreakos E, Mavrothalassitis G, Iliopoulos I. scRNA-Explorer: An End-user Online Tool for Single Cell RNA-seq Data Analysis Featuring Gene Correlation and Data Filtering. J Mol Biol 2024; 436:168654. [PMID: 39237193 DOI: 10.1016/j.jmb.2024.168654] [Received: 02/12/2024] [Revised: 05/20/2024] [Accepted: 06/07/2024] [Indexed: 09/07/2024]
Abstract
In most downstream analysis pipelines for single-cell RNA sequencing (scRNA-seq), techniques such as dimensionality reduction and feature selection are used to address the high dimensionality of the data. These approaches map the data onto a lower-dimensional space, eliminate less informative genes, and pinpoint the most pertinent features. This reduces the number of dimensions used for downstream analysis, which in turn speeds up computation on large-scale scRNA-seq data. Most approaches aim to separate, from the biological background, the genes that characterize different cells and/or the condition under study by establishing lists of differentially expressed or co-expressed genes. Herein, we present scRNA-Explorer, an open-source online tool for simplified and rapid scRNA-seq analysis designed with the end user in mind. scRNA-Explorer provides: (i) interactive filtering of uninformative cells via a web interface, (ii) gene correlation analysis coupled with an extra step that evaluates the biological importance of these correlations, and (iii) gene enrichment analysis of correlated genes to identify the functions in which a gene is implicated. We developed the scRNA-Explorer pipeline to address this problem; it allows users to interrogate scRNA-seq datasets interactively and to explore, via gene expression correlations, the possible function(s) of a gene of interest. scRNA-Explorer can be accessed at https://bioinformatics.med.uoc.gr/shinyapps/app/scrnaexplorer.
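The core correlation step described above (filter cells, then rank genes by their correlation with a gene of interest) can be sketched in a few lines. This is not scRNA-Explorer's code: the interactive filter is replaced by a hypothetical minimum-total-count threshold, Pearson correlation is an assumed choice of measure, the biological-importance and enrichment steps are omitted, and `filter_cells`, `top_correlated`, and the toy gene names are illustrative.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    if saa == 0 or sbb == 0:
        return 0.0
    return sab / math.sqrt(saa * sbb)


def filter_cells(matrix, min_total):
    """Keep cells whose total count across genes is at least min_total.

    matrix maps gene name -> list of per-cell counts (same cell order for every gene).
    """
    n_cells = len(next(iter(matrix.values())))
    totals = [sum(counts[j] for counts in matrix.values()) for j in range(n_cells)]
    keep = [j for j, total in enumerate(totals) if total >= min_total]
    return {gene: [counts[j] for j in keep] for gene, counts in matrix.items()}


def top_correlated(matrix, gene_of_interest, k=10):
    """Return up to k (gene, r) pairs ranked by correlation with gene_of_interest."""
    target = matrix[gene_of_interest]
    scores = [(gene, pearson(target, counts))
              for gene, counts in matrix.items() if gene != gene_of_interest]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:k]


# Toy data: 6 cells; the last cell has very few counts and is filtered out.
counts = {
    "GENE_A": [5, 3, 0, 8, 1, 0],
    "GENE_B": [4, 3, 1, 7, 1, 0],
    "GENE_C": [0, 2, 5, 1, 4, 1],
}
filtered = filter_cells(counts, min_total=5)
ranking = top_correlated(filtered, "GENE_A", k=2)
print(ranking)
```

In a real dataset the ranked list would then feed an enrichment analysis; rank-based measures such as Spearman correlation are a common alternative when counts are sparse and skewed.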
Affiliation(s)
- Ismini Baltsavia
- Division of Basic Sciences, University of Crete Medical School, Heraklion 71110, Greece
- Anastasis Oulas
- Cyprus Institute of Neurology and Genetics, Bioinformatics Department, P.O. Box 23462, 1683 Nicosia, Cyprus
- Theodosios Theodosiou
- Division of Basic Sciences, University of Crete Medical School, Heraklion 71110, Greece
- Evangelos Andreakos
- Center for Immunology and Transplantation, Biomedical Research Foundation Academy of Athens, Athens, Greece
- George Mavrothalassitis
- Division of Basic Sciences, University of Crete Medical School, Heraklion 71110, Greece; IMBB, FORTH, 71003 Heraklion, Crete, Greece
- Ioannis Iliopoulos
- Division of Basic Sciences, University of Crete Medical School, Heraklion 71110, Greece
7
Nemsick S, Hansen AS. Molecular models of bidirectional promoter regulation. Curr Opin Struct Biol 2024; 87:102865. [PMID: 38905929 PMCID: PMC11550790 DOI: 10.1016/j.sbi.2024.102865] [Received: 11/29/2023] [Revised: 03/30/2024] [Accepted: 05/27/2024] [Indexed: 06/23/2024]
Abstract
Approximately 11% of human genes are transcribed by a bidirectional promoter (BDP), defined as two genes with <1 kb between their transcription start sites. Despite their evolutionary conservation and enrichment for housekeeping genes and oncogenes, the regulatory role of BDPs remains unclear. BDPs have been suggested to facilitate gene coregulation and/or decrease expression noise. This review discusses these potential regulatory functions through the context of six prospective underlying mechanistic models: a single nucleosome free region, shared transcription factor/regulator binding, cooperative negative supercoiling, bimodal histone marks, joint activation by enhancer(s), and RNA-mediated recruitment of regulators. These molecular mechanisms may act independently and/or cooperatively to facilitate the coregulation and/or decreased expression noise predicted of BDPs.
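The operational definition above can be turned into a simple scan over annotated transcription start sites. The sketch assumes that a BDP pair is a minus-strand gene and a plus-strand gene on the same chromosome in head-to-head (divergent) orientation, with the plus-strand TSS at or downstream of the minus-strand TSS and fewer than 1,000 bp between them; published definitions differ on details such as overlapping TSSs, and the `Gene` record, `bidirectional_pairs` function, and toy genes are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Minimal gene record; tss is the transcription start site coordinate (bp)."""
    name: str
    chrom: str
    strand: str  # "+" or "-"
    tss: int


def bidirectional_pairs(genes, max_distance=1000):
    """Return (minus_gene, plus_gene, distance) for divergent pairs with TSS gap < max_distance."""
    by_chrom = {}
    for gene in genes:
        by_chrom.setdefault(gene.chrom, []).append(gene)
    pairs = []
    for chrom_genes in by_chrom.values():
        minus = [g for g in chrom_genes if g.strand == "-"]
        plus = [g for g in chrom_genes if g.strand == "+"]
        for m in minus:
            for p in plus:
                # Head-to-head: the minus-strand gene is transcribed leftward from
                # its TSS and the plus-strand gene rightward, so p.tss >= m.tss.
                distance = p.tss - m.tss
                if 0 <= distance < max_distance:
                    pairs.append((m.name, p.name, distance))
    return pairs


genes = [
    Gene("A", "chr1", "-", 10_000),
    Gene("B", "chr1", "+", 10_400),  # 400 bp downstream of A's TSS: divergent pair
    Gene("C", "chr1", "+", 9_500),   # plus-strand TSS upstream of A: not head-to-head
    Gene("D", "chr2", "+", 10_300),  # different chromosome
    Gene("E", "chr1", "-", 20_000),
    Gene("F", "chr1", "+", 21_500),  # 1,500 bp from E: beyond the 1 kb cutoff
]
print(bidirectional_pairs(genes))
```

The nested loop is quadratic per chromosome, which is fine for illustration; at genome scale, sorting TSSs by coordinate and sweeping a 1 kb window would avoid comparing distant genes.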
Affiliation(s)
- Sarah Nemsick
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; The Gene Regulation Observatory, Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA; Koch Institute for Integrative Cancer Research, Cambridge, MA 02139, USA
- Anders S Hansen
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; The Gene Regulation Observatory, Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA; Koch Institute for Integrative Cancer Research, Cambridge, MA 02139, USA
8
Burdett NL, Willis MO, Pandey A, Twomey L, Alaei S, Bowtell DDL, Christie EL. Timing of whole genome duplication is associated with tumor-specific MHC-II depletion in serous ovarian cancer. Nat Commun 2024; 15:6069. [PMID: 39025846 PMCID: PMC11258338 DOI: 10.1038/s41467-024-50137-y] [Received: 11/08/2023] [Accepted: 07/02/2024] [Indexed: 07/20/2024]
Abstract
Whole genome duplication is frequently observed in cancer, and its prevalence in our prior analysis of end-stage, homologous recombination deficient high grade serous ovarian cancer (almost 80% of samples) supports the notion that whole genome duplication provides a fitness advantage under the selection pressure of therapy. Here, we therefore aim to identify potential therapeutic vulnerabilities in primary high grade serous ovarian cancer with whole genome duplication by assessing differentially expressed genes and pathways in 79 samples. We observe that MHC-II expression is lowest in tumors which have acquired whole genome duplication early in tumor evolution, and further demonstrate that reduced MHC-II expression occurs in subsets of tumor cells rather than in canonical antigen-presenting cells. Early whole genome duplication is also associated with worse patient survival outcomes. Our results suggest an association between the timing of whole genome duplication, MHC-II expression and clinical outcome in high grade serous ovarian cancer that warrants further investigation for therapeutic targeting.
Affiliation(s)
- Nikki L Burdett
- Peter MacCallum Cancer Centre, Melbourne, VIC, 3000, Australia
- Sir Peter MacCallum Department of Oncology, The University of Melbourne, Melbourne, VIC, 3010, Australia
- Box Hill Hospital, Eastern Health, Box Hill, VIC, 3128, Australia
- Ahwan Pandey
- Peter MacCallum Cancer Centre, Melbourne, VIC, 3000, Australia
- Laura Twomey
- Peter MacCallum Cancer Centre, Melbourne, VIC, 3000, Australia
- Sara Alaei
- Peter MacCallum Cancer Centre, Melbourne, VIC, 3000, Australia
- Australian Regenerative Medicine Institute, Monash University, Clayton, VIC, 3168, Australia
- David D L Bowtell
- Peter MacCallum Cancer Centre, Melbourne, VIC, 3000, Australia
- Sir Peter MacCallum Department of Oncology, The University of Melbourne, Melbourne, VIC, 3010, Australia
- Elizabeth L Christie
- Peter MacCallum Cancer Centre, Melbourne, VIC, 3000, Australia
- Sir Peter MacCallum Department of Oncology, The University of Melbourne, Melbourne, VIC, 3010, Australia
9
Dai R, Zhang M, Chu T, Kopp R, Zhang C, Liu K, Wang Y, Wang X, Chen C, Liu C. Precision and Accuracy of Single-Cell/Nuclei RNA Sequencing Data. bioRxiv: The Preprint Server for Biology 2024:2024.04.12.589216. [PMID: 38659857 PMCID: PMC11042208 DOI: 10.1101/2024.04.12.589216] [Indexed: 04/26/2024]
Abstract
Single-cell/nuclei RNA sequencing (sc/snRNA-Seq) is widely used for profiling cell-type-specific gene expression in biomedical research. An important but underappreciated issue is the quality of sc/snRNA-Seq data, which can affect the reliability of downstream analyses. Here we evaluated the precision and accuracy of 18 sc/snRNA-Seq datasets. Precision was assessed using technical replicates in data from human brain studies, totaling 3,483,905 cells from 297 individuals. Accuracy was evaluated with sample-matched scRNA-Seq and pooled-cell RNA-Seq data of cultured mononuclear phagocytes from four species. The results revealed low precision and accuracy at the single-cell level across all evaluated data. Cell number and RNA quality were highlighted as two key factors determining expression precision, accuracy, and the reproducibility of differential expression analysis in sc/snRNA-Seq. This study underscores the necessity of sequencing enough high-quality cells per cell type per individual, preferably in the hundreds, to mitigate noise in expression quantification.
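A back-of-the-envelope simulation shows why cell number matters for expression precision. Assuming, purely for illustration, that a gene's counts in each cell are independent Poisson draws with mean λ (technical sampling noise only, no biological variation), the coefficient of variation (CV) of the average expression over n cells is 1/√(nλ). The function names and the value λ = 0.5 are hypothetical choices, not taken from the study.

```python
import math
import random
import statistics

# Assumption for illustration only: per-cell counts of one gene are independent
# Poisson(lam) draws, i.e. pure technical sampling noise with no biological variation.


def poisson(rng, lam):
    """Draw from Poisson(lam) with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def cv_of_mean_expression(n_cells, lam=0.5, n_reps=1000, seed=0):
    """CV of the mean count over n_cells, estimated across n_reps simulated samples."""
    rng = random.Random(seed)
    means = [sum(poisson(rng, lam) for _ in range(n_cells)) / n_cells
             for _ in range(n_reps)]
    return statistics.stdev(means) / statistics.mean(means)


results = {n: cv_of_mean_expression(n) for n in (10, 100, 500)}
for n, simulated in results.items():
    theory = 1 / math.sqrt(n * 0.5)  # CV of the mean = 1 / sqrt(n * lam)
    print(f"{n:>4} cells: simulated CV {simulated:.3f}, theory {theory:.3f}")
```

Under this idealized model, going from 10 to 500 cells shrinks the CV by a factor of √50 ≈ 7. Real data typically show more variability than this Poisson idealization, so the sketch illustrates the scaling rather than predicting actual precision.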
Affiliation(s)
- Rujia Dai
- Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY, USA
- Ming Zhang
- MOE Key Laboratory of Rare Pediatric Diseases & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, and Department of Psychiatry, The Second Xiangya Hospital, Central South University, Changsha, Hunan, China
- Tianyao Chu
- MOE Key Laboratory of Rare Pediatric Diseases & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, and Department of Psychiatry, The Second Xiangya Hospital, Central South University, Changsha, Hunan, China
- Richard Kopp
- Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY, USA
- Chunling Zhang
- Department of Neuroscience & Physiology, SUNY Upstate Medical University, Syracuse, NY, USA
- Kefu Liu
- MOE Key Laboratory of Rare Pediatric Diseases & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, and Department of Psychiatry, The Second Xiangya Hospital, Central South University, Changsha, Hunan, China
- Yue Wang
- Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, VA, USA
- Xusheng Wang
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Center for Proteomics and Metabolomics, St. Jude Children's Research Hospital, Memphis, TN, USA
- Chao Chen
- MOE Key Laboratory of Rare Pediatric Diseases & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, and Department of Psychiatry, The Second Xiangya Hospital, Central South University, Changsha, Hunan, China
- Furong Laboratory, Changsha, Hunan, China
- Hunan Key Laboratory of Animal Models for Human Diseases, Central South University, Changsha, China
- Chunyu Liu
- Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY, USA
- MOE Key Laboratory of Rare Pediatric Diseases & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, and Department of Psychiatry, The Second Xiangya Hospital, Central South University, Changsha, Hunan, China
- Department of Neuroscience & Physiology, SUNY Upstate Medical University, Syracuse, NY, USA
10
Wu Y, Gu Q, Wang Z, Tian Z, Wang Z, Liu W, Han J, Liu S. Electrochemiluminescence Analysis of Multiple Glycans on Single Living Cell with a Closed Bipolar Electrode Array Chip. Anal Chem 2024; 96:2165-2172. [PMID: 38284353 DOI: 10.1021/acs.analchem.3c05127] [Indexed: 01/30/2024]
Abstract
The profiling of multiple glycans on a single cell is important for elucidating glycosylation mechanisms and accurately identifying disease states. Herein, we developed a closed bipolar electrode (BPE) array chip for live single-cell trapping and in situ detection of galactose and sialic acid by electrochemiluminescence (ECL). Gold nanoparticles (AuNPs) codecorated with methylene blue-DNA (MB-DNA) and biotin-DNA (Bio-DNA) were prepared as nanoprobes and selectively labeled on the cell surface through chemoselective labeling techniques. Each cell was captured and labeled in a microtrap of the cathodic chamber; under an appropriate potential, MB molecules on the cell membrane underwent oxidation, triggering the reduction of [Ru(bpy)₃]²⁺/TPA and consequently generating ECL signals in the anodic chamber. The abundance of MB groups on the single cell enabled selective, highly sensitive ECL monitoring of both sialic acid and galactosyl groups. The sialic acid and galactosyl contents per HepG2 cell were determined to be 0.66 and 0.82 fmol, respectively. Comprehensive evaluation of these two types of glycans on a single cell allowed tumor cells and normal cells to be effectively discriminated and improved the accuracy of single-cell heterogeneity analysis. Additionally, dynamic monitoring of changes in galactosyl groups on the surface of a single cell was achieved. This work introduces a straightforward and convenient approach for analyzing heterogeneity among single cells.
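For a sense of scale, the per-cell amounts reported above can be converted to molecule counts with Avogadro's constant (about 6.022 × 10^23 mol⁻¹). This conversion is our own arithmetic, not a value stated in the abstract:

```latex
\begin{aligned}
N_{\text{sialic acid}} &= 0.66 \times 10^{-15}\,\text{mol} \times 6.022 \times 10^{23}\,\text{mol}^{-1}
  \approx 3.97 \times 10^{8} \text{ residues per cell},\\
N_{\text{galactosyl}} &= 0.82 \times 10^{-15}\,\text{mol} \times 6.022 \times 10^{23}\,\text{mol}^{-1}
  \approx 4.94 \times 10^{8} \text{ groups per cell}.
\end{aligned}
```

That is, roughly 4 × 10^8 sialic acid residues and 5 × 10^8 galactosyl groups per HepG2 cell.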
Affiliation(s)
- Yafeng Wu
- Jiangsu Engineering Laboratory of Smart Carbon-Rich Materials and Device, State Key Laboratory of Digital Medical Engineering, School of Chemistry and Chemical Engineering, Southeast University, Nanjing 211189, China
- Qinglin Gu
- Jiangsu Engineering Laboratory of Smart Carbon-Rich Materials and Device, State Key Laboratory of Digital Medical Engineering, School of Chemistry and Chemical Engineering, Southeast University, Nanjing 211189, China
- Zhi Wang
- Wuxi Institute of Inspection, Testing and Certification, Wuxi 214125, China
- Zhaoyan Tian
- School of Pharmaceutical Sciences, Liaocheng University, Liaocheng 252059, China
- Zhaohan Wang
- Jiangsu Engineering Laboratory of Smart Carbon-Rich Materials and Device, State Key Laboratory of Digital Medical Engineering, School of Chemistry and Chemical Engineering, Southeast University, Nanjing 211189, China
- Weiwei Liu
- Jiangsu Engineering Laboratory of Smart Carbon-Rich Materials and Device, State Key Laboratory of Digital Medical Engineering, School of Chemistry and Chemical Engineering, Southeast University, Nanjing 211189, China
- Jianyu Han
- School of Energy and Environment, Southeast University, Nanjing 211189, China
- Songqin Liu
- Jiangsu Engineering Laboratory of Smart Carbon-Rich Materials and Device, State Key Laboratory of Digital Medical Engineering, School of Chemistry and Chemical Engineering, Southeast University, Nanjing 211189, China
11
Su C, Zhang J, Zhao H. Estimating cell-type-specific gene co-expression networks from bulk gene expression data with an application to Alzheimer's disease. J Am Stat Assoc 2024; 119:811-824. [PMID: 39280354 PMCID: PMC11394578 DOI: 10.1080/01621459.2023.2297467] [Received: 01/17/2022] [Accepted: 12/13/2023] [Indexed: 09/18/2024]
Abstract
Inferring and characterizing gene co-expression networks has led to important insights on the molecular mechanisms of complex diseases. Most co-expression analyses to date have been performed on gene expression data collected from bulk tissues with different cell type compositions across samples. As a result, the co-expression estimates only offer an aggregated view of the underlying gene regulations and can be confounded by heterogeneity in cell type compositions, failing to reveal gene coordination that may be distinct across different cell types. In this paper, we introduce a flexible framework for estimating cell-type-specific gene co-expression networks from bulk sample data, without making specific assumptions on the distributions of gene expression profiles in different cell types. We develop a novel sparse least squares estimator, referred to as CSNet, that is efficient to implement and has good theoretical properties. Using CSNet, we analyzed the bulk gene expression data from a cohort study on Alzheimer's disease and identified previously unknown cell-type-specific co-expressions among Alzheimer's disease risk genes, suggesting cell-type-specific disease mechanisms.
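Why bulk data can carry cell-type-specific co-expression information is visible in a simple moment calculation. The derivation below is a simplified illustration under assumptions added for exposition: bulk expression of sample i is a proportion-weighted sum of cell-type-specific expression vectors z_ik, these vectors are independent across cell types, and the proportions π_ik are known. The exact CSNet estimator and its sparsity step may differ.

```latex
\begin{aligned}
\mathbf{x}_i &= \sum_{k=1}^{K} \pi_{ik}\,\mathbf{z}_{ik}, \qquad
  \mathbb{E}[\mathbf{z}_{ik}] = \boldsymbol{\mu}_k, \quad
  \operatorname{Cov}(\mathbf{z}_{ik}) = \boldsymbol{\Sigma}_k,\\
\mathbb{E}[\mathbf{x}_i] &= \sum_{k=1}^{K} \pi_{ik}\,\boldsymbol{\mu}_k, \qquad
  \operatorname{Cov}(\mathbf{x}_i) = \sum_{k=1}^{K} \pi_{ik}^{2}\,\boldsymbol{\Sigma}_k,\\
\{\widehat{\boldsymbol{\Sigma}}_k\} &= \arg\min_{\{\boldsymbol{\Sigma}_k\}} \sum_{i=1}^{n}
  \Bigl\| (\mathbf{x}_i - \widehat{\mathbf{m}}_i)(\mathbf{x}_i - \widehat{\mathbf{m}}_i)^{\top}
  - \sum_{k=1}^{K} \pi_{ik}^{2}\,\boldsymbol{\Sigma}_k \Bigr\|_F^{2},
  \qquad \widehat{\mathbf{m}}_i = \sum_{k=1}^{K} \pi_{ik}\,\widehat{\boldsymbol{\mu}}_k .
\end{aligned}
```

Because the squared Frobenius norm separates across matrix entries, each gene pair reduces to an ordinary least-squares regression of the centered cross-products on the K regressors π_ik², with the means μ_k obtained from a first-stage regression of x_i on π_ik. A sparsity step, such as thresholding the resulting estimates, then yields a network for each cell type.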
Affiliation(s)
- Chang Su
- Department of Biostatistics and Bioinformatics, Emory University
- Department of Biostatistics, Yale University
- Jingfei Zhang
- Information Systems and Operations Management, Emory University
- Hongyu Zhao
- Department of Biostatistics, Yale University