1
Kissel LT, Pochareddy S, An JY, Sestan N, Sanders SJ, Wang X, Werling DM. Sex-Differential Gene Expression in Developing Human Cortex and Its Intersection With Autism Risk Pathways. Biological Psychiatry Global Open Science 2024; 4:100321. [PMID: 38957312 PMCID: PMC11217612 DOI: 10.1016/j.bpsgos.2024.100321] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2023] [Revised: 04/09/2024] [Accepted: 04/12/2024] [Indexed: 07/04/2024]
Abstract
Background: Sex-differential biology may contribute to the consistently male-biased prevalence of autism spectrum disorder (ASD). Gene expression differences between males and females in the brain can indicate possible molecular and cellular mechanisms involved, although transcriptomic sex differences during human prenatal cortical development have been incompletely characterized, primarily due to small sample sizes.
Methods: We performed a meta-analysis of sex-differential expression and co-expression network analysis in 2 independent bulk RNA sequencing datasets generated from cortex of 273 prenatal donors without known neuropsychiatric disorders. To assess the intersection between neurotypical sex differences and neuropsychiatric disorder biology, we tested for enrichment of ASD-associated risk genes and expression changes, neuropsychiatric disorder risk genes, and cell type markers within identified sex-differentially expressed genes (sex-DEGs) and sex-differential co-expression modules.
Results: We identified 101 significant sex-DEGs, including Y-chromosome genes, genes impacted by X-chromosome inactivation, and autosomal genes. Known ASD risk genes, implicated by either common or rare variants, did not preferentially overlap with sex-DEGs. We identified 1 male-specific co-expression module enriched for immune signaling that is unique to 1 input dataset.
Conclusions: Sex-differential gene expression is limited in prenatal human cortex tissue, although meta-analysis of large datasets allows for the identification of sex-DEGs, including autosomal genes that encode proteins involved in neural development. Lack of sex-DEG overlap with ASD risk genes in the prenatal cortex suggests that sex-differential modulation of ASD symptoms may occur in other brain regions, at other developmental stages, or in specific cell types, or may involve mechanisms that act downstream from mutation-carrying genes.
Affiliation(s)
- Lee T. Kissel
  - Neuroscience Training Program, University of Wisconsin-Madison, Madison, Wisconsin
- Sirisha Pochareddy
  - Department of Neuroscience and Kavli Institute for Neuroscience, Yale School of Medicine, New Haven, Connecticut
- Joon-Yong An
  - Department of Integrated Biomedical and Life Science, Korea University, Seoul, Republic of Korea
  - Transdisciplinary Major in Learning Health Systems, Department of Healthcare Sciences, Graduate School, Korea University, Seoul, Republic of Korea
  - BK21FOUR R&E Center for Learning Health Systems, Korea University, Seoul, Republic of Korea
- Nenad Sestan
  - Department of Neuroscience and Kavli Institute for Neuroscience, Yale School of Medicine, New Haven, Connecticut
- Stephan J. Sanders
  - Institute of Developmental and Regenerative Medicine, Department of Paediatrics, University of Oxford, Oxford, United Kingdom
  - Department of Psychiatry and Behavioral Sciences, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, California
- Xuran Wang
  - Seaver Autism Center for Research and Treatment, New York, New York
  - Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, New York
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York
- Donna M. Werling
  - Laboratory of Genetics, University of Wisconsin-Madison, Madison, Wisconsin
2
Tiong KL, Luzhbin D, Yeang CH. Assessing transcriptomic heterogeneity of single-cell RNASeq data by bulk-level gene expression data. BMC Bioinformatics 2024; 25:209. [PMID: 38867193 PMCID: PMC11167951 DOI: 10.1186/s12859-024-05825-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2024] [Accepted: 06/03/2024] [Indexed: 06/14/2024]
Abstract
BACKGROUND: Single-cell RNA sequencing (sc-RNASeq) data illuminate transcriptomic heterogeneity but also possess a high level of noise, abundant missing entries and sometimes inadequate or no cell type annotations at all. Bulk-level gene expression data lack direct information of cell population composition but are more robust and complete and often better annotated. We propose a modeling framework to integrate bulk-level and single-cell RNASeq data to address the deficiencies and leverage the mutual strengths of each type of data and enable a more comprehensive inference of their transcriptomic heterogeneity. Contrary to the standard approaches of factorizing the bulk-level data with one algorithm and (for some methods) treating single-cell RNASeq data as references to decompose bulk-level data, we employed multiple deconvolution algorithms to factorize the bulk-level data, constructed the probabilistic graphical models of cell-level gene expressions from the decomposition outcomes, and compared the log-likelihood scores of these models in single-cell data. We term this framework backward deconvolution as inference operates from coarse-grained bulk-level data to fine-grained single-cell data. As the abundant missing entries in sc-RNASeq data have a significant effect on log-likelihood scores, we also developed a criterion for inclusion or exclusion of zero entries in log-likelihood score computation.
RESULTS: We selected nine deconvolution algorithms and validated backward deconvolution in five datasets. In the in-silico mixtures of mouse sc-RNASeq data, the log-likelihood scores of the deconvolution algorithms were strongly anticorrelated with their errors of mixture coefficients and cell type specific gene expression signatures. In the true bulk-level mouse data, the sample mixture coefficients were unknown but the log-likelihood scores were strongly correlated with accuracy rates of inferred cell types. In the data of autism spectrum disorder (ASD) and normal controls, we found that ASD brains possessed higher fractions of astrocytes and lower fractions of NRGN-expressing neurons than normal controls. In datasets of breast cancer and low-grade gliomas (LGG), we compared the log-likelihood scores of three simple hypotheses about the gene expression patterns of the cell types underlying the tumor subtypes. The model that tumors of each subtype were dominated by one cell type persistently outperformed an alternative model that each cell type had elevated expression in one gene group and tumors were mixtures of those cell types. Superiority of the former model is also supported by comparing the real breast cancer sc-RNASeq clusters with those generated by simulated sc-RNASeq data.
CONCLUSIONS: The results indicate that backward deconvolution serves as a sensible model selection tool for deconvolution algorithms and facilitates discerning hypotheses about cell type compositions underlying heterogeneous specimens such as tumors.
Affiliation(s)
- Khong-Loon Tiong
  - Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
- Dmytro Luzhbin
  - Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
3
Emani PS, Liu JJ, Clarke D, Jensen M, Warrell J, Gupta C, Meng R, Lee CY, Xu S, Dursun C, Lou S, Chen Y, Chu Z, Galeev T, Hwang A, Li Y, Ni P, Zhou X, Bakken TE, Bendl J, Bicks L, Chatterjee T, Cheng L, Cheng Y, Dai Y, Duan Z, Flaherty M, Fullard JF, Gancz M, Garrido-Martín D, Gaynor-Gillett S, Grundman J, Hawken N, Henry E, Hoffman GE, Huang A, Jiang Y, Jin T, Jorstad NL, Kawaguchi R, Khullar S, Liu J, Liu J, Liu S, Ma S, Margolis M, Mazariegos S, Moore J, Moran JR, Nguyen E, Phalke N, Pjanic M, Pratt H, Quintero D, Rajagopalan AS, Riesenmy TR, Shedd N, Shi M, Spector M, Terwilliger R, Travaglini KJ, Wamsley B, Wang G, Xia Y, Xiao S, Yang AC, Zheng S, Gandal MJ, Lee D, Lein ES, Roussos P, Sestan N, Weng Z, White KP, Won H, Girgenti MJ, Zhang J, Wang D, Geschwind D, Gerstein M. Single-cell genomics and regulatory networks for 388 human brains. Science 2024; 384:eadi5199. [PMID: 38781369 DOI: 10.1126/science.adi5199] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/06/2023] [Accepted: 04/05/2024] [Indexed: 05/25/2024]
Abstract
Single-cell genomics is a powerful tool for studying heterogeneous tissues such as the brain. Yet little is understood about how genetic variants influence cell-level gene expression. Addressing this, we uniformly processed single-nuclei, multiomics datasets into a resource comprising >2.8 million nuclei from the prefrontal cortex across 388 individuals. For 28 cell types, we assessed population-level variation in expression and chromatin across gene families and drug targets. We identified >550,000 cell type-specific regulatory elements and >1.4 million single-cell expression quantitative trait loci, which we used to build cell-type regulatory and cell-to-cell communication networks. These networks manifest cellular changes in aging and neuropsychiatric disorders. We further constructed an integrative model accurately imputing single-cell expression and simulating perturbations; the model prioritized ~250 disease-risk genes and drug targets with associated cell types.
Affiliation(s)
- Prashant S Emani
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Jason J Liu
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Declan Clarke
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Matthew Jensen
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Jonathan Warrell
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Chirag Gupta
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53706, USA
  - Waisman Center, University of Wisconsin-Madison, Madison, WI 53705, USA
- Ran Meng
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Che Yu Lee
  - Department of Computer Science, University of California, Irvine, CA 92697, USA
- Siwei Xu
  - Department of Computer Science, University of California, Irvine, CA 92697, USA
- Cagatay Dursun
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Shaoke Lou
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Yuhang Chen
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Zhiyuan Chu
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Timur Galeev
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Ahyeon Hwang
  - Department of Computer Science, University of California, Irvine, CA 92697, USA
  - Mathematical, Computational and Systems Biology, University of California, Irvine, CA 92697, USA
- Yunyang Li
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
  - Department of Computer Science, Yale University, New Haven, CT 06520, USA
- Pengyu Ni
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Xiao Zhou
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Jaroslav Bendl
  - Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
  - Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
  - Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
  - Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Lucy Bicks
  - Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA 90095, USA
- Tanima Chatterjee
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Yuyan Cheng
  - Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA 90095, USA
  - Department of Ophthalmology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Yi Dai
  - Department of Computer Science, University of California, Irvine, CA 92697, USA
- Ziheng Duan
  - Department of Computer Science, University of California, Irvine, CA 92697, USA
- John F Fullard
  - Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
  - Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
  - Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
  - Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Michael Gancz
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Diego Garrido-Martín
  - Department of Genetics, Microbiology and Statistics, Universitat de Barcelona, Barcelona 08028, Spain
- Sophia Gaynor-Gillett
  - Tempus Labs, Chicago, IL 60654, USA
  - Department of Biology, Cornell College, Mount Vernon, IA 52314, USA
- Jennifer Grundman
  - Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA 90095, USA
- Natalie Hawken
  - Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA 90095, USA
- Ella Henry
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Gabriel E Hoffman
  - Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
  - Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
  - Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
  - Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
  - Mental Illness Research Education and Clinical Center, James J. Peters VA Medical Center, Bronx, NY 10468, USA
  - Center for Precision Medicine and Translational Therapeutics, James J. Peters VA Medical Center, Bronx, NY 10468, USA
- Ao Huang
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Yunzhe Jiang
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Ting Jin
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53706, USA
  - Waisman Center, University of Wisconsin-Madison, Madison, WI 53705, USA
- Riki Kawaguchi
  - Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA 90095, USA
  - Center for Autism Research and Treatment, Semel Institute, University of California, Los Angeles, CA 90095, USA
- Saniya Khullar
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53706, USA
  - Waisman Center, University of Wisconsin-Madison, Madison, WI 53705, USA
- Jianyin Liu
  - Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA 90095, USA
- Junhao Liu
  - Department of Computer Science, University of California, Irvine, CA 92697, USA
- Shuang Liu
  - Waisman Center, University of Wisconsin-Madison, Madison, WI 53705, USA
- Shaojie Ma
  - Department of Neuroscience, Yale University, New Haven, CT 06510, USA
  - Institute of Neuroscience, CAS Center for Excellence in Brain Science and Intelligence Technology, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Samantha Mazariegos
  - Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA 90095, USA
- Jill Moore
  - Department of Genomics and Computational Biology, UMass Chan Medical School, Worcester, MA 01605, USA
- Eric Nguyen
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Nishigandha Phalke
  - Department of Genomics and Computational Biology, UMass Chan Medical School, Worcester, MA 01605, USA
- Milos Pjanic
  - Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
  - Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
  - Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
  - Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Henry Pratt
  - Department of Genomics and Computational Biology, UMass Chan Medical School, Worcester, MA 01605, USA
- Diana Quintero
  - Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA 90095, USA
- Tiernon R Riesenmy
  - Department of Statistics and Data Science, Yale University, New Haven, CT 06520, USA
- Nicole Shedd
  - Department of Genomics and Computational Biology, UMass Chan Medical School, Worcester, MA 01605, USA
- Rosemarie Terwilliger
  - Department of Psychiatry, Yale University School of Medicine, New Haven, CT 06520, USA
- Brie Wamsley
  - Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA 90095, USA
- Gaoyuan Wang
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Yan Xia
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Shaohua Xiao
  - Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA 90095, USA
- Andrew C Yang
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Suchen Zheng
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Michael J Gandal
  - Interdepartmental Program in Bioinformatics, University of California, Los Angeles, Los Angeles, CA 90095, USA
  - Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
  - Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
  - Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Lifespan Brain Institute, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Donghoon Lee
  - Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
  - Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
  - Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
  - Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Ed S Lein
  - Allen Institute for Brain Science, Seattle, WA 98109, USA
  - Department of Neurological Surgery, University of Washington, Seattle, WA 98195, USA
  - Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98195, USA
- Panos Roussos
  - Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
  - Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
  - Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
  - Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
  - Mental Illness Research Education and Clinical Center, James J. Peters VA Medical Center, Bronx, NY 10468, USA
  - Center for Precision Medicine and Translational Therapeutics, James J. Peters VA Medical Center, Bronx, NY 10468, USA
- Nenad Sestan
  - Department of Neuroscience, Yale University, New Haven, CT 06510, USA
- Zhiping Weng
  - Department of Genomics and Computational Biology, UMass Chan Medical School, Worcester, MA 01605, USA
- Kevin P White
  - Yong Loo Lin School of Medicine, National University of Singapore, 117597 Singapore
- Hyejung Won
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Matthew J Girgenti
  - Department of Psychiatry, Yale University School of Medicine, New Haven, CT 06520, USA
  - Wu Tsai Institute, Yale University, New Haven, CT 06520, USA
  - Clinical Neuroscience Division, National Center for Posttraumatic Stress Disorder, Veterans Affairs Connecticut Healthcare System, West Haven, CT 06516, USA
- Jing Zhang
  - Department of Computer Science, University of California, Irvine, CA 92697, USA
- Daifeng Wang
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53706, USA
  - Waisman Center, University of Wisconsin-Madison, Madison, WI 53705, USA
  - Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI 53706, USA
- Daniel Geschwind
  - Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA 90095, USA
  - Center for Autism Research and Treatment, Semel Institute, University of California, Los Angeles, CA 90095, USA
  - Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
  - Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
  - Institute for Precision Health, David Geffen School of Medicine, University of California, Los Angeles, CA 90095, USA
- Mark Gerstein
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
  - Department of Computer Science, Yale University, New Haven, CT 06520, USA
  - Department of Statistics and Data Science, Yale University, New Haven, CT 06520, USA
  - Department of Biomedical Informatics & Data Science, Yale University, New Haven, CT 06520, USA
4
Dai R, Chu T, Zhang M, Wang X, Jourdon A, Wu F, Mariani J, Vaccarino FM, Lee D, Fullard JF, Hoffman GE, Roussos P, Wang Y, Wang X, Pinto D, Wang SH, Zhang C, Chen C, Liu C. Evaluating performance and applications of sample-wise cell deconvolution methods on human brain transcriptomic data. Science Advances 2024; 10:eadh2588. [PMID: 38781336 PMCID: PMC11114236 DOI: 10.1126/sciadv.adh2588] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2023] [Accepted: 01/05/2024] [Indexed: 05/25/2024]
Abstract
Sample-wise deconvolution methods estimate cell-type proportions and gene expressions in bulk tissue samples, yet their performance and biological applications remain unexplored, particularly in human brain transcriptomic data. Here, nine deconvolution methods were evaluated with sample-matched data from bulk tissue RNA sequencing (RNA-seq), single-cell/nuclei (sc/sn) RNA-seq, and immunohistochemistry. A total of 1,130,767 nuclei/cells from 149 adult postmortem brains and 72 organoid samples were used. The results showed the best performance of dtangle for estimating cell proportions and bMIND for estimating sample-wise cell-type gene expressions. For eight brain cell types, 25,273 cell-type eQTLs were identified with deconvoluted expressions (decon-eQTLs). The results showed that decon-eQTLs explained more schizophrenia GWAS heritability than bulk tissue or single-cell eQTLs did alone. Differential gene expressions associated with Alzheimer's disease, schizophrenia, and brain development were also examined using the deconvoluted data. Our findings, which were replicated in bulk tissue and single-cell data, provided insights into the biological applications of deconvoluted data in multiple brain disorders.
Affiliation(s)
- Rujia Dai
  - Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY, USA
- Tianyao Chu
  - MOE Key Laboratory of Rare Pediatric Diseases & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, and Department of Psychiatry, The Second Xiangya Hospital, Central South University, Changsha, China
- Ming Zhang
  - MOE Key Laboratory of Rare Pediatric Diseases & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, and Department of Psychiatry, The Second Xiangya Hospital, Central South University, Changsha, China
- Xuan Wang
  - MOE Key Laboratory of Rare Pediatric Diseases & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, and Department of Psychiatry, The Second Xiangya Hospital, Central South University, Changsha, China
- Feinan Wu
  - Child Study Center, Yale University, New Haven, CT, USA
- Flora M. Vaccarino
  - Child Study Center, Yale University, New Haven, CT, USA
  - Department of Neuroscience, Yale University, New Haven, CT, USA
- Donghoon Lee
  - Center for Disease Neurogenomics, Departments of Psychiatry and Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- John F. Fullard
  - Center for Disease Neurogenomics, Departments of Psychiatry and Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Gabriel E. Hoffman
  - Center for Disease Neurogenomics, Departments of Psychiatry and Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Panos Roussos
  - Center for Disease Neurogenomics, Departments of Psychiatry and Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Yue Wang
  - Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA
- Xusheng Wang
  - Department of Biology, University of North Dakota, Grand Forks, ND, USA
- Dalila Pinto
  - Departments of Psychiatry and Genetics and Genomic Sciences, Mindich Child Health and Development Institute, and Icahn Genomics Institute for Data Science and Genomic Technology, Seaver Autism Center, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Sidney H. Wang
  - Center for Human Genetics, The Brown Foundation Institute of Molecular Medicine, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Chunling Zhang
  - Department of Neuroscience & Physiology, SUNY Upstate Medical University, Syracuse, NY, USA
- Chao Chen
  - MOE Key Laboratory of Rare Pediatric Diseases & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, and Department of Psychiatry, The Second Xiangya Hospital, Central South University, Changsha, China
- Chunyu Liu
  - Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY, USA
  - MOE Key Laboratory of Rare Pediatric Diseases & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, and Department of Psychiatry, The Second Xiangya Hospital, Central South University, Changsha, China
  - Department of Neuroscience & Physiology, SUNY Upstate Medical University, Syracuse, NY, USA
5
Tang C, Sun Q, Zeng X, Yang X, Liu F, Zhao J, Shen Y, Liu B, Wen J, Li Y. Cell-type specific inference from bulk RNA-sequencing data by integrating single cell reference profiles via EPIC-unmix. bioRxiv: The Preprint Server for Biology 2024:2024.05.23.595514. [PMID: 38826297 PMCID: PMC11142188 DOI: 10.1101/2024.05.23.595514] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2024]
Abstract
Cell type specific (CTS) analysis is essential to reveal biological insights obscured in bulk tissue data. However, single-cell (sc) or single-nuclei (sn) resolution data are still cost-prohibitive for large-scale samples. Thus, computational methods to perform deconvolution from bulk tissue data are highly valuable. We here present EPIC-unmix, a novel two-step empirical Bayesian method integrating reference sc/sn RNA-seq data and bulk RNA-seq data from target samples to enhance the accuracy of CTS inference. We demonstrate through comprehensive simulations across three tissues that EPIC-unmix achieved 4.6%-109.8% higher accuracy compared to alternative methods. By applying EPIC-unmix to human bulk brain RNA-seq data from the ROSMAP and MSBB cohorts, we identified multiple genes differentially expressed between Alzheimer's disease (AD) cases versus controls in a CTS manner, including 57.4% novel genes not identified using similar sample size sc/snRNA-seq data, indicating the power of our in-silico approach. Among the 6%-69% overlapping, 83%-100% are in consistent direction with those from sc/snRNA-seq data, supporting the reliability of our findings. EPIC-unmix-inferred CTS expression profiles similarly empower CTS eQTL analysis. Among the novel eQTLs, we highlight a microglia eQTL for AD risk gene AP3B2, obscured in bulk and missed by sc/snRNA-seq based eQTL analysis. The variant resides in a microglia-specific cCRE, forming a chromatin loop with the AP3B2 promoter region in microglia. Taken together, we believe EPIC-unmix will be a valuable tool to enable more powerful CTS analysis.
Affiliation(s)
- Chenwei Tang
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Quan Sun
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Xinyue Zeng
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Xiaoyu Yang
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
| | - Fei Liu
- Department of Pharmacy and Pharmaceutical Sciences, Faculty of Science, National University of Singapore, Singapore
| | - Jinying Zhao
- Department of Epidemiology, College of Public Health & Health Professions and College of Medicine, University of Florida, Gainesville, FL, USA; Center for Genetic Epidemiology and Bioinformatics, University of Florida, Gainesville, FL, USA
| | - Yin Shen
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
| | - Bixiang Liu
- Department of Pharmacy and Pharmaceutical Sciences, Faculty of Science, National University of Singapore, Singapore
- Department of Biomedical Informatics, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
| | - Jia Wen
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC
| | - Yun Li
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC
|
6
|
Collier L, Seah C, Hicks EM, Holtzheimer PE, Krystal JH, Girgenti MJ, Huckins LM, Johnston KJA. The impact of chronic pain on brain gene expression. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.05.20.24307630. [PMID: 38826319 PMCID: PMC11142271 DOI: 10.1101/2024.05.20.24307630] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2024]
Abstract
Background Chronic pain affects one fifth of American adults, contributing a significant public health burden. Chronic pain mechanisms can be further understood through investigating brain gene expression. Methods We tested for differentially expressed genes (DEGs) in chronic pain, migraine, lifetime fentanyl and oxymorphone use, and with chronic pain genetic risk in four brain regions (dACC, DLPFC, MeA, BLA) and in imputed cell type expression data from 304 postmortem donors. We compared findings across traits and with independent transcriptomics resources, and performed gene-set enrichment. Results We identified two chronic pain DEGs: B4GALT and VEGFB in bulk dACC. We found over 2000 (primarily BLA microglia) chronic pain cell type DEGs. Findings were enriched for mouse microglia pain genes, and for hypoxia and immune response. Cross-trait DEG overlap was minimal. Conclusions Chronic pain-associated gene expression is heterogeneous across cell types, is largely distinct from that in pain-related traits, and points to BLA microglia as a key cell type.
Affiliation(s)
- Lily Collier
- Department of Biological Sciences, Columbia University, New York City, NY
- Department of Psychiatry, Division of Molecular Psychiatry, Yale University, New Haven, CT
| | - Carina Seah
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York City, NY
| | - Emily M Hicks
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York City, NY
| | - Paul E Holtzheimer
- National Center for PTSD, U.S. Department of Veterans Affairs
- Department of Psychiatry, Geisel School of Medicine at Dartmouth, Lebanon, NH 03756, USA
| | - John H Krystal
- Department of Psychiatry, Division of Molecular Psychiatry, Yale University, New Haven, CT
- Clinical Neuroscience Division, National Center for PTSD, VA Connecticut Healthcare System, West Haven, CT
| | - Matthew J Girgenti
- Department of Psychiatry, Division of Molecular Psychiatry, Yale University, New Haven, CT
- Clinical Neuroscience Division, National Center for PTSD, VA Connecticut Healthcare System, West Haven, CT
| | - Laura M Huckins
- Department of Psychiatry, Division of Molecular Psychiatry, Yale University, New Haven, CT
| | - Keira J A Johnston
- Department of Psychiatry, Division of Molecular Psychiatry, Yale University, New Haven, CT
|
7
|
Khatri R, Machart P, Bonn S. DISSECT: deep semi-supervised consistency regularization for accurate cell type fraction and gene expression estimation. Genome Biol 2024; 25:112. [PMID: 38689377 PMCID: PMC11061925 DOI: 10.1186/s13059-024-03251-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Accepted: 04/17/2024] [Indexed: 05/02/2024] Open
Abstract
Cell deconvolution is the estimation of cell type fractions and cell type-specific gene expression from mixed data. An unmet challenge in cell deconvolution is the scarcity of realistic training data and the domain shift often observed in synthetic training data. Here, we show that two novel deep neural networks with simultaneous consistency regularization of the target and training domains significantly improve deconvolution performance. Our algorithm, DISSECT, outperforms competing algorithms in cell fraction and gene expression estimation by up to 14 percentage points. DISSECT can be easily adapted to other biomedical data types, as exemplified by our proteomic deconvolution experiments.
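The definition of cell deconvolution in this abstract can be illustrated with a small, hedged sketch. It is not DISSECT's neural-network method: it swaps in plain non-negative least squares solved by projected gradient descent, and the function name `estimate_fractions`, the toy `signature` matrix, and the `true_fractions` values are all hypothetical, chosen only for illustration.

```python
def estimate_fractions(signature, bulk, steps=5000):
    """Estimate non-negative cell type fractions p such that signature @ p ~ bulk.

    signature: list of rows, one per gene, each holding one expression value
               per cell type (an assumed, pre-built reference profile).
    bulk:      list of mixed (bulk) expression values, one per gene.
    """
    n_genes = len(signature)
    n_types = len(signature[0])
    # Step size 1 / ||S||_F^2 never exceeds 1 / L, where L is the largest
    # eigenvalue of S^T S, so the gradient steps cannot diverge.
    lr = 1.0 / sum(v * v for row in signature for v in row)
    p = [1.0 / n_types] * n_types  # start from a uniform mixture
    for _ in range(steps):
        # Residual of the linear mixing model for each gene.
        residual = [
            sum(signature[g][k] * p[k] for k in range(n_types)) - bulk[g]
            for g in range(n_genes)
        ]
        # Gradient of 0.5 * ||S p - bulk||^2 with respect to p.
        grad = [
            sum(signature[g][k] * residual[g] for g in range(n_genes))
            for k in range(n_types)
        ]
        # Gradient step followed by projection onto p >= 0.
        p = [max(0.0, p[k] - lr * grad[k]) for k in range(n_types)]
    total = sum(p)
    # Rescale so the estimates can be read as fractions.
    return [v / total for v in p] if total > 0 else p


# Toy example: 4 genes, 3 cell types (all values are made up).
signature = [
    [10.0, 1.0, 0.0],
    [0.0, 8.0, 1.0],
    [1.0, 0.0, 9.0],
    [5.0, 5.0, 5.0],
]
true_fractions = [0.6, 0.3, 0.1]
# Synthetic bulk profile: a noise-free mixture of the reference profiles.
bulk = [sum(s * f for s, f in zip(row, true_fractions)) for row in signature]

fractions = estimate_fractions(signature, bulk)
print([round(v, 4) for v in fractions])
```

Because the toy bulk profile is an exact, noise-free mixture, the estimate should closely recover the fractions used to build it. Real bulk data violate this linear model in ways the sketch ignores, including the domain shift between reference and target data that the abstract names as an open challenge.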
Affiliation(s)
- Robin Khatri
- Institute of Medical Systems Biology, Center for Molecular Neurobiology, Center for Biomedical AI, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| | - Pierre Machart
- Institute of Medical Systems Biology, Center for Molecular Neurobiology, Center for Biomedical AI, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| | - Stefan Bonn
- Institute of Medical Systems Biology, Center for Molecular Neurobiology, Center for Biomedical AI, University Medical Center Hamburg-Eppendorf, Hamburg, Germany.
|
8
|
Huuki-Myers LA, Montgomery KD, Kwon SH, Cinquemani S, Eagles NJ, Gonzalez-Padilla D, Maden SK, Kleinman JE, Hyde TM, Hicks SC, Maynard KR, Collado-Torres L. Benchmark of cellular deconvolution methods using a multi-assay reference dataset from postmortem human prefrontal cortex. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.02.09.579665. [PMID: 38405805 PMCID: PMC10888823 DOI: 10.1101/2024.02.09.579665] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/27/2024]
Abstract
Background Cellular deconvolution of bulk RNA-sequencing (RNA-seq) data using single cell or nuclei RNA-seq (sc/snRNA-seq) reference data is an important strategy for estimating cell type composition in heterogeneous tissues, such as human brain. Computational methods for deconvolution have been developed and benchmarked against simulated data, pseudobulked sc/snRNA-seq data, or immunohistochemistry reference data. A major limitation in developing improved deconvolution algorithms has been the lack of integrated datasets with orthogonal measurements of gene expression and estimates of cell type proportions on the same tissue sample. Deconvolution algorithm performance has not yet been evaluated across different RNA extraction methods (cytosolic, nuclear, or whole cell RNA), different library preparation types (mRNA enrichment vs. ribosomal RNA depletion), or with matched single cell reference datasets. Results A rich multi-assay dataset was generated in postmortem human dorsolateral prefrontal cortex (DLPFC) from 22 tissue blocks. Assays included spatially-resolved transcriptomics, snRNA-seq, bulk RNA-seq (across six library/extraction RNA-seq combinations), and RNAScope/Immunofluorescence (RNAScope/IF) for six broad cell types. The Mean Ratio method, implemented in the DeconvoBuddies R package, was developed for selecting cell type marker genes. Six computational deconvolution algorithms were evaluated in DLPFC and predicted cell type proportions were compared to orthogonal RNAScope/IF measurements. Conclusions Bisque and hspe were the most accurate methods and were robust to differences in RNA library types and extractions. This multi-assay dataset showed that cell size differences, marker genes differentially quantified across RNA libraries, and cell composition variability in reference snRNA-seq impact the accuracy of current deconvolution methods.
Affiliation(s)
- Louise A. Huuki-Myers
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, 21205, USA
| | - Kelsey D. Montgomery
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, 21205, USA
| | - Sang Ho Kwon
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, 21205, USA
- The Solomon H. Snyder Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD, 21205, USA
| | - Sophia Cinquemani
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, 21205, USA
| | - Nicholas J. Eagles
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, 21205, USA
| | | | - Sean K. Maden
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, 21205, USA
| | - Joel E. Kleinman
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, 21205, USA
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins School of Medicine, Baltimore, MD, 21205, USA
| | - Thomas M. Hyde
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, 21205, USA
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins School of Medicine, Baltimore, MD, 21205, USA
- Department of Neurology, Johns Hopkins School of Medicine, Baltimore, MD, 21205, USA
| | - Stephanie C. Hicks
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, 21205, USA
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins School of Medicine, Baltimore, MD, 21205, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, 21205, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, 21205, USA
- Malone Center for Engineering in Healthcare, Johns Hopkins University, Baltimore, MD, 21218, USA
| | - Kristen R. Maynard
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, 21205, USA
- The Solomon H. Snyder Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD, 21205, USA
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins School of Medicine, Baltimore, MD, 21205, USA
| | - Leonardo Collado-Torres
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, 21205, USA
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, 21205, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, 21205, USA
|
9
|
Emani PS, Liu JJ, Clarke D, Jensen M, Warrell J, Gupta C, Meng R, Lee CY, Xu S, Dursun C, Lou S, Chen Y, Chu Z, Galeev T, Hwang A, Li Y, Ni P, Zhou X, Bakken TE, Bendl J, Bicks L, Chatterjee T, Cheng L, Cheng Y, Dai Y, Duan Z, Flaherty M, Fullard JF, Gancz M, Garrido-Martín D, Gaynor-Gillett S, Grundman J, Hawken N, Henry E, Hoffman GE, Huang A, Jiang Y, Jin T, Jorstad NL, Kawaguchi R, Khullar S, Liu J, Liu J, Liu S, Ma S, Margolis M, Mazariegos S, Moore J, Moran JR, Nguyen E, Phalke N, Pjanic M, Pratt H, Quintero D, Rajagopalan AS, Riesenmy TR, Shedd N, Shi M, Spector M, Terwilliger R, Travaglini KJ, Wamsley B, Wang G, Xia Y, Xiao S, Yang AC, Zheng S, Gandal MJ, Lee D, Lein ES, Roussos P, Sestan N, Weng Z, White KP, Won H, Girgenti MJ, Zhang J, Wang D, Geschwind D, Gerstein M. Single-cell genomics and regulatory networks for 388 human brains. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.03.18.585576. [PMID: 38562822 PMCID: PMC10983939 DOI: 10.1101/2024.03.18.585576] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Single-cell genomics is a powerful tool for studying heterogeneous tissues such as the brain. Yet, little is understood about how genetic variants influence cell-level gene expression. Addressing this, we uniformly processed single-nuclei, multi-omics datasets into a resource comprising >2.8M nuclei from the prefrontal cortex across 388 individuals. For 28 cell types, we assessed population-level variation in expression and chromatin across gene families and drug targets. We identified >550K cell-type-specific regulatory elements and >1.4M single-cell expression-quantitative-trait loci, which we used to build cell-type regulatory and cell-to-cell communication networks. These networks manifest cellular changes in aging and neuropsychiatric disorders. We further constructed an integrative model accurately imputing single-cell expression and simulating perturbations; the model prioritized ~250 disease-risk genes and drug targets with associated cell types.
Affiliation(s)
- Prashant S Emani
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
| | - Jason J Liu
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
| | - Declan Clarke
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
| | - Matthew Jensen
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
| | - Jonathan Warrell
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
| | - Chirag Gupta
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Waisman Center, University of Wisconsin-Madison, Madison, WI, 53705, USA
| | - Ran Meng
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
| | - Che Yu Lee
- Department of Computer Science, University of California, Irvine, CA, 92697, USA
| | - Siwei Xu
- Department of Computer Science, University of California, Irvine, CA, 92697, USA
| | - Cagatay Dursun
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
| | - Shaoke Lou
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
| | - Yuhang Chen
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
| | - Zhiyuan Chu
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
| | - Timur Galeev
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
| | - Ahyeon Hwang
- Department of Computer Science, University of California, Irvine, CA, 92697, USA
- Mathematical, Computational and Systems Biology, University of California, Irvine, CA, 92697, USA
| | - Yunyang Li
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
- Department of Computer Science, Yale University, New Haven, CT, 06520, USA
| | - Pengyu Ni
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
| | - Xiao Zhou
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
| | | | - Jaroslav Bendl
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Lucy Bicks
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA, 90095, USA
| | - Tanima Chatterjee
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
| | | | - Yuyan Cheng
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA, 90095, USA
- Department of Ophthalmology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
| | - Yi Dai
- Department of Computer Science, University of California, Irvine, CA, 92697, USA
| | - Ziheng Duan
- Department of Computer Science, University of California, Irvine, CA, 92697, USA
| | | | - John F Fullard
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Michael Gancz
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
| | - Diego Garrido-Martín
- Department of Genetics, Microbiology and Statistics, Universitat de Barcelona, Barcelona, 08028, Spain
| | - Sophia Gaynor-Gillett
- Tempus Labs, Inc., Chicago, IL, 60654, USA
- Department of Biology, Cornell College, Mount Vernon, IA, 52314, USA
| | - Jennifer Grundman
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA, 90095, USA
| | - Natalie Hawken
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA, 90095, USA
| | - Ella Henry
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
| | - Gabriel E Hoffman
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Mental Illness Research Education and Clinical Center, James J. Peters VA Medical Center, Bronx, NY, 10468, USA
- Center for Precision Medicine and Translational Therapeutics, James J. Peters VA Medical Center, Bronx, NY, 10468, USA
| | - Ao Huang
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
| | - Yunzhe Jiang
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
| | - Ting Jin
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Waisman Center, University of Wisconsin-Madison, Madison, WI, 53705, USA
| | | | - Riki Kawaguchi
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA, 90095, USA
- Center for Autism Research and Treatment, Semel Institute, University of California, Los Angeles, CA, 90095, USA
| | - Saniya Khullar
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Waisman Center, University of Wisconsin-Madison, Madison, WI, 53705, USA
| | - Jianyin Liu
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA, 90095, USA
| | - Junhao Liu
- Department of Computer Science, University of California, Irvine, CA, 92697, USA
| | - Shuang Liu
- Waisman Center, University of Wisconsin-Madison, Madison, WI, 53705, USA
| | - Shaojie Ma
- Department of Neuroscience, Yale University, New Haven, CT, 06510, USA
- Institute of Neuroscience, CAS Center for Excellence in Brain Science and Intelligence Technology, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
| | - Michael Margolis
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA, 90095, USA
| | - Samantha Mazariegos
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA, 90095, USA
| | - Jill Moore
- Department of Genomics and Computational Biology, UMass Chan Medical School, Worcester, MA, 01605, USA
| | | | - Eric Nguyen
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
| | - Nishigandha Phalke
- Department of Genomics and Computational Biology, UMass Chan Medical School, Worcester, MA, 01605, USA
| | - Milos Pjanic
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Henry Pratt
- Department of Genomics and Computational Biology, UMass Chan Medical School, Worcester, MA, 01605, USA
| | - Diana Quintero
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA, 90095, USA
| | | | - Tiernon R Riesenmy
- Department of Statistics & Data Science, Yale University, New Haven, CT, 06520, USA
| | - Nicole Shedd
- Department of Genomics and Computational Biology, UMass Chan Medical School, Worcester, MA, 01605, USA
| | - Manman Shi
- Tempus Labs, Inc., Chicago, IL, 60654, USA
| | | | - Rosemarie Terwilliger
- Department of Psychiatry, Yale University School of Medicine, New Haven, CT, 06520, USA
| | | | - Brie Wamsley
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA, 90095, USA
| | - Gaoyuan Wang
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
| | - Yan Xia
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
| | - Shaohua Xiao
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA, 90095, USA
| | - Andrew C Yang
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
| | - Suchen Zheng
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
| | - Michael J Gandal
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Lifespan Brain Institute, The Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
| | - Donghoon Lee
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Ed S Lein
- Allen Institute for Brain Science, Seattle, WA, 98109, USA
- Department of Neurological Surgery, University of Washington, Seattle, WA, 98195, USA
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, 98195, USA
| | - Panos Roussos
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Mental Illness Research Education and Clinical Center, James J. Peters VA Medical Center, Bronx, NY, 10468, USA
- Center for Precision Medicine and Translational Therapeutics, James J. Peters VA Medical Center, Bronx, NY, 10468, USA
| | - Nenad Sestan
- Department of Neuroscience, Yale University, New Haven, CT, 06510, USA
| | - Zhiping Weng
- Department of Genomics and Computational Biology, UMass Chan Medical School, Worcester, MA, 01605, USA
| | - Kevin P White
- Yong Loo Lin School of Medicine, National University of Singapore, 117597, Singapore
| | - Hyejung Won
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
| | - Matthew J Girgenti
- Department of Psychiatry, Yale University School of Medicine, New Haven, CT, 06520, USA
- Wu Tsai Institute, Yale University, New Haven, CT, 06520, USA
- Clinical Neuroscience Division, National Center for Posttraumatic Stress Disorder, Veterans Affairs Connecticut Healthcare System, West Haven, CT, 06516, USA
| | - Jing Zhang
- Department of Computer Science, University of California, Irvine, CA, 92697, USA
| | - Daifeng Wang
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Waisman Center, University of Wisconsin-Madison, Madison, WI, 53705, USA
- Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI, 53706, USA
| | - Daniel Geschwind
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA, 90095, USA
- Center for Autism Research and Treatment, Semel Institute, University of California, Los Angeles, CA, 90095, USA
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Institute for Precision Health, David Geffen School of Medicine, University of California, Los Angeles, CA, 90095, USA
| | - Mark Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
- Department of Computer Science, Yale University, New Haven, CT, 06520, USA
- Department of Statistics & Data Science, Yale University, New Haven, CT, 06520, USA
- Department of Biomedical Informatics & Data Science, Yale University, New Haven, CT, 06520, USA
|
10
|
Youssef A, Paul I, Crovella M, Emili A. DESP demixes cell-state profiles from dynamic bulk molecular measurements. CELL REPORTS METHODS 2024; 4:100729. [PMID: 38490205 PMCID: PMC10985230 DOI: 10.1016/j.crmeth.2024.100729] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2023] [Revised: 12/22/2023] [Accepted: 02/16/2024] [Indexed: 03/17/2024]
Abstract
Understanding the dynamic expression of proteins and other key molecules driving phenotypic remodeling in development and pathobiology has garnered widespread interest, yet the exploration of these systems at the foundational resolution of the underlying cell states has been significantly limited by technical constraints. Here, we present DESP, an algorithm designed to leverage independent estimates of cell-state proportions, such as from single-cell RNA sequencing, to resolve the relative contributions of cell states to bulk molecular measurements, most notably quantitative proteomics, recorded in parallel. We applied DESP to an in vitro model of the epithelial-to-mesenchymal transition and demonstrated its ability to accurately reconstruct cell-state signatures from bulk-level measurements of both the proteome and transcriptome, providing insights into transient regulatory mechanisms. DESP provides a generalizable computational framework for modeling the relationship between bulk and single-cell molecular measurements, enabling the study of proteomes and other molecular profiles at the cell-state level using established bulk-level workflows.
Affiliation(s)
- Ahmed Youssef
- Graduate Program in Bioinformatics, Boston University, Boston, MA, USA; Center for Network Systems Biology, Boston University, Boston, MA, USA
| | - Indranil Paul
- Center for Network Systems Biology, Boston University, Boston, MA, USA
| | - Mark Crovella
- Graduate Program in Bioinformatics, Boston University, Boston, MA, USA; Computer Science Department, Boston University, Boston, MA, USA; Faculty of Computing and Data Sciences, Boston University, Boston, MA, USA.
| | - Andrew Emili
- Graduate Program in Bioinformatics, Boston University, Boston, MA, USA; Center for Network Systems Biology, Boston University, Boston, MA, USA; Faculty of Computing and Data Sciences, Boston University, Boston, MA, USA; Knight Cancer Institute, Oregon Health and Science University, Portland, OR, USA.
|
11
|
Boltz T, Schwarz T, Bot M, Hou K, Caggiano C, Lapinska S, Duan C, Boks MP, Kahn RS, Zaitlen N, Pasaniuc B, Ophoff R. Cell-type deconvolution of bulk-blood RNA-seq reveals biological insights into neuropsychiatric disorders. Am J Hum Genet 2024; 111:323-337. [PMID: 38306997 PMCID: PMC10870131 DOI: 10.1016/j.ajhg.2023.12.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2023] [Revised: 12/12/2023] [Accepted: 12/13/2023] [Indexed: 02/04/2024] Open
Abstract
Genome-wide association studies (GWASs) have uncovered susceptibility loci associated with psychiatric disorders such as bipolar disorder (BP) and schizophrenia (SCZ). However, most of these loci are in non-coding regions of the genome, and the causal mechanisms linking genetic variation to disease risk are unknown. Expression quantitative trait locus (eQTL) analysis of bulk tissue is a common approach used for deciphering underlying mechanisms, although this can obscure cell-type-specific signals and thus mask trait-relevant mechanisms. Although single-cell sequencing can be prohibitively expensive in large cohorts, computationally inferred cell-type proportions and cell-type gene expression estimates have the potential to overcome these problems and advance mechanistic studies. Using bulk RNA-seq from 1,730 samples derived from whole blood in a cohort ascertained from individuals with BP and SCZ, this study estimated cell-type proportions and their relation with disease status and medication. For each cell type, we found between 2,875 and 4,629 eGenes (genes with an associated eQTL), including 1,211 that are not found on the basis of bulk expression alone. We performed a colocalization test between cell-type eQTLs and various traits and identified hundreds of associations that occur between cell-type eQTLs and GWASs but that are not detected in bulk eQTLs. Finally, we investigated the effects of lithium use on the regulation of cell-type expression loci and found examples of genes that are differentially regulated according to lithium use. Our study suggests that applying computational methods to large bulk RNA-seq datasets of non-brain tissue can identify disease-relevant, cell-type-specific biology of psychiatric disorders and psychiatric medication.
Affiliation(s)
- Toni Boltz
- Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA.
| | - Tommer Schwarz
- Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA
| | - Merel Bot
- Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, University of California, Los Angeles, Los Angeles, CA, USA
| | - Kangcheng Hou
- Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA
| | - Christa Caggiano
- Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA
| | - Sandra Lapinska
- Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA
| | - Chenda Duan
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA
| | - Marco P Boks
- Department of Psychiatry, Brain Center, University Medical Center Utrecht, University Utrecht, Utrecht, the Netherlands
| | - Rene S Kahn
- Department of Psychiatry, Brain Center, University Medical Center Utrecht, University Utrecht, Utrecht, the Netherlands; Department of Psychiatry, Icahn School of Medicine, Mount Sinai, NY, USA
| | - Noah Zaitlen
- Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA; Department of Neurology, University of California Los Angeles, Los Angeles, Los Angeles, CA, USA
| | - Bogdan Pasaniuc
- Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA; Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA; Department of Computational Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA; Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
| | - Roel Ophoff
- Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA; Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA; Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, University of California, Los Angeles, Los Angeles, CA, USA; Department of Psychiatry, Erasmus University Medical Center, Rotterdam, the Netherlands.
| |
|
12
|
Zhang L, Cascio S, Mellors JW, Buckanovich RJ, Osmanbeyoglu HU. Single-cell analysis reveals the stromal dynamics and tumor-specific characteristics in the microenvironment of ovarian cancer. Commun Biol 2024; 7:20. [PMID: 38182756 PMCID: PMC10770164 DOI: 10.1038/s42003-023-05733-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2023] [Accepted: 12/20/2023] [Indexed: 01/07/2024] Open
Abstract
High-grade serous ovarian carcinoma (HGSOC) is a heterogeneous disease, and a high-stromal/desmoplastic tumor microenvironment (TME) is associated with a poor outcome. Stromal cell subtypes, including fibroblasts, myofibroblasts, and cancer-associated mesenchymal stem cells, establish a complex network of paracrine signaling pathways with tumor-infiltrating immune cells that drive effector cell tumor immune exclusion and inhibit the antitumor immune response. In this work, we integrate single-cell transcriptomics of the HGSOC TME from public and in-house datasets (n = 20) and stratify tumors based upon high vs. low stromal cell content. Although our cohort size is small, our analyses suggest a distinct transcriptomic landscape for immune and non-immune cells in high-stromal vs. low-stromal tumors. High-stromal tumors have a lower fraction of certain T cells, natural killer (NK) cells, and macrophages, and increased expression of CXCL12 in epithelial cancer cells and cancer-associated mesenchymal stem cells (CA-MSCs). Analysis of cell-cell communication indicates that epithelial cancer cells and CA-MSCs secrete CXCL12 that interacts with the CXCR4 receptor, which is overexpressed on NK and CD8+ T cells. Dual IHC staining shows that tumor-infiltrating CD8 T cells localize in proximity to CXCL12+ tumor areas. Moreover, CXCL12 and/or CXCR4 antibodies confirm the immunosuppressive role of CXCL12-CXCR4 in high-stromal tumors.
Affiliation(s)
- Linan Zhang
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15206, USA
- UPMC Hillman Cancer Center, University of Pittsburgh, Pittsburgh, PA, 15232, USA
- Department of Applied Mathematics, School of Mathematics and Statistics, Ningbo University, Ningbo, Zhejiang, 315211, China
| | - Sandra Cascio
- UPMC Hillman Cancer Center, University of Pittsburgh, Pittsburgh, PA, 15232, USA
- Magee-Womens Research Institute, Pittsburgh, PA, 15213, USA
- Division of Gynecologic Oncology, Department of Obstetrics, Gynecology, and Reproductive Sciences, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15213, USA
| | - John W Mellors
- Division of Infectious Diseases, Department of Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15213, USA
| | - Ronald J Buckanovich
- UPMC Hillman Cancer Center, University of Pittsburgh, Pittsburgh, PA, 15232, USA
- Magee-Womens Research Institute, Pittsburgh, PA, 15213, USA
- Division of Hematology/Oncology, Department of Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15232, USA
| | - Hatice Ulku Osmanbeyoglu
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15206, USA.
- UPMC Hillman Cancer Center, University of Pittsburgh, Pittsburgh, PA, 15232, USA.
- Department of Bioengineering, University of Pittsburgh School of Engineering, Pittsburgh, PA, 15219, USA.
- Department of Biostatistics, University of Pittsburgh School of Public Health, Pittsburgh, PA, 15261, USA.
| |
|
13
|
Pyun J, Park YH, Wang J, Bennett DA, Bice PJ, Kim JP, Kim S, Saykin AJ, Nho K. Transcriptional risk scores in Alzheimer's disease: From pathology to cognition. Alzheimers Dement 2024; 20:243-252. [PMID: 37563770 PMCID: PMC10840812 DOI: 10.1002/alz.13406] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2023] [Revised: 06/28/2023] [Accepted: 07/03/2023] [Indexed: 08/12/2023]
Abstract
INTRODUCTION Our previously developed blood-based transcriptional risk scores (TRS) showed associations with diagnosis and neuroimaging biomarkers for Alzheimer's disease (AD). Here, we developed brain-based TRS. METHODS We integrated AD genome-wide association study summary and expression quantitative trait locus data to prioritize target genes using Mendelian randomization. We calculated TRS using brain transcriptome data of two independent cohorts (N = 878) and performed association analysis of TRS with diagnosis, amyloidopathy, tauopathy, and cognition. We compared AD classification performance of TRS with polygenic risk scores (PRS). RESULTS Higher TRS values were significantly associated with AD, amyloidopathy, tauopathy, worse cognition, and faster cognitive decline, which were replicated in an independent cohort. The AD classification performance of PRS was increased with the inclusion of TRS up to 16% with the area under the curve value of 0.850. DISCUSSION Our results suggest brain-based TRS improves the AD classification of PRS and may be a potential AD biomarker. HIGHLIGHTS Transcriptional risk score (TRS) is developed using brain RNA-Seq data. Higher TRS values are shown in Alzheimer's disease (AD). TRS improves the AD classification power of PRS up to 16%. TRS is associated with AD pathology presence. TRS is associated with worse cognitive performance and faster cognitive decline.
Affiliation(s)
- Jung‐Min Pyun
- Department of Neurology, Soonchunhyang University Seoul Hospital, Soonchunhyang University College of Medicine, Yongsan‐gu, Seoul, Republic of Korea
| | - Young Ho Park
- Department of Neurology, Seoul National University College of Medicine, Seoul National University Bundang Hospital, Seongnam‐si, Gyeonggi‐do, Republic of Korea
| | - Jiebiao Wang
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
| | - David A. Bennett
- Department of Neurological Science, Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, Illinois, USA
| | - Paula J. Bice
- Department of Radiology and Imaging Sciences and the Indiana Alzheimer Disease Center, Indiana University School of Medicine, Indianapolis, Indiana, USA
| | - Jun Pyo Kim
- Department of Radiology and Imaging Sciences and the Indiana Alzheimer Disease Center, Indiana University School of Medicine, Indianapolis, Indiana, USA
| | - SangYun Kim
- Department of Neurology, Seoul National University College of Medicine, Seoul National University Bundang Hospital, Seongnam‐si, Gyeonggi‐do, Republic of Korea
| | - Andrew J. Saykin
- Department of Radiology and Imaging Sciences and the Indiana Alzheimer Disease Center, Indiana University School of Medicine, Indianapolis, Indiana, USA
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, Indiana, USA
| | - Kwangsik Nho
- Department of Radiology and Imaging Sciences and the Indiana Alzheimer Disease Center, Indiana University School of Medicine, Indianapolis, Indiana, USA
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Health Information and Translational Science Building, Indianapolis, Indiana, USA
| |
|
14
|
Chandrashekar PB, Alatkar S, Wang J, Hoffman GE, He C, Jin T, Khullar S, Bendl J, Fullard JF, Roussos P, Wang D. DeepGAMI: deep biologically guided auxiliary learning for multimodal integration and imputation to improve genotype-phenotype prediction. Genome Med 2023; 15:88. [PMID: 37904203 PMCID: PMC10617196 DOI: 10.1186/s13073-023-01248-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2022] [Accepted: 10/16/2023] [Indexed: 11/01/2023] Open
Abstract
BACKGROUND Genotypes are strongly associated with disease phenotypes, particularly in brain disorders. However, the molecular and cellular mechanisms behind this association remain elusive. With emerging multimodal data for these mechanisms, machine learning methods can be applied for phenotype prediction at different scales, but due to the black-box nature of machine learning, integrating these modalities and interpreting biological mechanisms can be challenging. Additionally, the partial availability of these multimodal data presents a challenge in developing these predictive models. METHOD To address these challenges, we developed DeepGAMI, an interpretable neural network model to improve genotype-phenotype prediction from multimodal data. DeepGAMI leverages functional genomic information, such as eQTLs and gene regulation, to guide neural network connections. Additionally, it includes an auxiliary learning layer for cross-modal imputation allowing the imputation of latent features of missing modalities and thus predicting phenotypes from a single modality. Finally, DeepGAMI uses integrated gradient to prioritize multimodal features for various phenotypes. RESULTS We applied DeepGAMI to several multimodal datasets including genotype and bulk and cell-type gene expression data in brain diseases, and gene expression and electrophysiology data of mouse neuronal cells. Using cross-validation and independent validation, DeepGAMI outperformed existing methods for classifying disease types, and cellular and clinical phenotypes, even using single modalities (e.g., AUC score of 0.79 for Schizophrenia and 0.73 for cognitive impairment in Alzheimer's disease). CONCLUSION We demonstrated that DeepGAMI improves phenotype prediction and prioritizes phenotypic features and networks in multiple multimodal datasets in complex brains and brain diseases. Also, it prioritized disease-associated variants, genes, and regulatory networks linked to different phenotypes, providing novel insights into the interpretation of gene regulatory mechanisms. DeepGAMI is open-source and available for general use.
Affiliation(s)
- Pramod Bharadwaj Chandrashekar
- Waisman Center, University of Wisconsin-Madison, Madison, WI, 53705, USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, 53076, USA
| | - Sayali Alatkar
- Waisman Center, University of Wisconsin-Madison, Madison, WI, 53705, USA
- Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI, 53076, USA
| | - Jiebiao Wang
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, 15261, USA
| | - Gabriel E Hoffman
- Center for Disease Neurogenomics, Department of Psychiatry and Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Chenfeng He
- Waisman Center, University of Wisconsin-Madison, Madison, WI, 53705, USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, 53076, USA
| | - Ting Jin
- Waisman Center, University of Wisconsin-Madison, Madison, WI, 53705, USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, 53076, USA
| | - Saniya Khullar
- Waisman Center, University of Wisconsin-Madison, Madison, WI, 53705, USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, 53076, USA
| | - Jaroslav Bendl
- Center for Disease Neurogenomics, Department of Psychiatry and Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - John F Fullard
- Center for Disease Neurogenomics, Department of Psychiatry and Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Panos Roussos
- Center for Disease Neurogenomics, Department of Psychiatry and Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Mental Illness Research, Education and Clinical Centers, James J. Peters VA Medical Center, Bronx, NY, 10468, USA
- Center for Dementia Research, Nathan Kline Institute for Psychiatric Research, Orangeburg, NY, 10962, USA
| | - Daifeng Wang
- Waisman Center, University of Wisconsin-Madison, Madison, WI, 53705, USA.
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, 53076, USA.
- Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI, 53076, USA.
| |
|
15
|
Chen W, Li C, Chen Y, Bin J, Chen Y. Cardiac cellular diversity and functionality in cardiac repair by single-cell transcriptomics. Front Cardiovasc Med 2023; 10:1237208. [PMID: 37920179 PMCID: PMC10619858 DOI: 10.3389/fcvm.2023.1237208] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Accepted: 10/02/2023] [Indexed: 11/04/2023] Open
Abstract
Cardiac repair after myocardial infarction (MI) is orchestrated by multiple intrinsic mechanisms in the heart. Identifying cardiac cell heterogeneity and its effect on processes that mediate the ischemic myocardium repair may be key to developing novel therapeutics for preventing heart failure. With the rapid advancement of single-cell transcriptomics, recent studies have uncovered novel cardiac cell populations, dynamics of cell type composition, and molecular signatures of MI-associated cells at the single-cell level. In this review, we summarized the main findings during cardiac repair by applying single-cell transcriptomics, including endogenous myocardial regeneration, myocardial fibrosis, angiogenesis, and the immune microenvironment. Finally, we also discussed the integrative analysis of spatial multi-omics transcriptomics and single-cell transcriptomics. This review provided a basis for future studies to further advance the mechanism and development of therapeutic approaches for cardiac repair.
Affiliation(s)
- Wei Chen
- Department of Cardiology, State Key Laboratory of Organ Failure Research, Nanfang Hospital, Southern Medical University, Guangzhou, China
- Guangdong Provincial Key Laboratory of Cardiac Function and Microcirculation, Guangzhou, China
| | - Chuling Li
- Department of Cardiology, State Key Laboratory of Organ Failure Research, Nanfang Hospital, Southern Medical University, Guangzhou, China
- Guangdong Provincial Key Laboratory of Cardiac Function and Microcirculation, Guangzhou, China
| | - Yijin Chen
- Department of Cardiology, State Key Laboratory of Organ Failure Research, Nanfang Hospital, Southern Medical University, Guangzhou, China
- Guangdong Provincial Key Laboratory of Cardiac Function and Microcirculation, Guangzhou, China
| | - Jianping Bin
- Department of Cardiology, State Key Laboratory of Organ Failure Research, Nanfang Hospital, Southern Medical University, Guangzhou, China
- Guangdong Provincial Key Laboratory of Cardiac Function and Microcirculation, Guangzhou, China
| | - Yanmei Chen
- Department of Cardiology, State Key Laboratory of Organ Failure Research, Nanfang Hospital, Southern Medical University, Guangzhou, China
- Guangdong Provincial Key Laboratory of Cardiac Function and Microcirculation, Guangzhou, China
- Department of Cardiology, Ganzhou People’s Hospital, Ganzhou, China
| |
|
16
|
Wang W, Zhou X, Wang J, Yao J, Wen H, Wang Y, Sun M, Zhang C, Tao W, Zou J, Ni T. Approximate estimation of cell-type resolution transcriptome in bulk tissue through matrix completion. Brief Bioinform 2023; 24:bbad273. [PMID: 37529921 DOI: 10.1093/bib/bbad273] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2023] [Revised: 06/20/2023] [Accepted: 07/10/2023] [Indexed: 08/03/2023] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) has emerged as a powerful tool for uncovering cellular heterogeneity. However, the high costs associated with this technique have rendered it impractical for studying large patient cohorts. We introduce ENIGMA (Deconvolution based on Regularized Matrix Completion), a method that addresses this limitation by accurately deconvoluting bulk tissue RNA-seq data into a readout with cell-type resolution, leveraging information from scRNA-seq data. By employing a matrix completion strategy, ENIGMA minimizes the distance between the mixture transcriptome obtained with bulk sequencing and a weighted combination of cell-type-specific expression. This allows the quantification of cell-type proportions and reconstruction of cell-type-specific transcriptomes. To validate its performance, ENIGMA was tested on both simulated and real datasets, including disease-related tissues, demonstrating its ability to uncover novel biological insights.
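The idea of fitting a bulk transcriptome as a weighted combination of cell-type-specific expression can be sketched as follows. The example assumes a fixed reference signature matrix and estimates only the proportions of a single bulk sample by least squares, followed by clipping and renormalization. It does not implement ENIGMA's regularized matrix completion, and the function name and simulated data are hypothetical.

```python
import numpy as np


def estimate_proportions(bulk_sample, signatures):
    """Estimate cell-type proportions for one bulk sample.

    bulk_sample: vector of length n_genes (hypothetical bulk expression)
    signatures:  n_genes x n_cell_types reference matrix (assumed known)
    Minimizes ||bulk_sample - signatures @ weights||, then projects the
    weights onto non-negative values that sum to one.
    """
    weights, *_ = np.linalg.lstsq(signatures, bulk_sample, rcond=None)
    weights = np.clip(weights, 0.0, None)
    total = weights.sum()
    if total == 0.0:
        raise ValueError("all estimated weights are zero; check the signature matrix")
    return weights / total


# Simulated, noise-free example: 20 genes, 3 cell types.
rng = np.random.default_rng(1)
signatures = rng.uniform(0.0, 100.0, size=(20, 3))
true_proportions = np.array([0.6, 0.3, 0.1])
bulk_sample = signatures @ true_proportions

estimated_proportions = estimate_proportions(bulk_sample, signatures)
```

Clipping after an unconstrained fit is a shortcut chosen to keep the sketch dependency-free; a constrained solver such as non-negative least squares would be the more principled choice on noisy data.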
Affiliation(s)
- Weixu Wang
- State Key Laboratory of Genetic Engineering, National Clinical Research Center for Aging and Medicine, Huashan Hospital, Collaborative Innovation Center of Genetics and Development, Human Phenome Institute, Center for Evolutionary Biology, Shanghai Engineering Research Center of Industrial Microorganisms, School of Life Sciences, Fudan University, Shanghai 200438, P.R. China
| | - Xiaolan Zhou
- State Key Laboratory of Genetic Engineering, National Clinical Research Center for Aging and Medicine, Huashan Hospital, Collaborative Innovation Center of Genetics and Development, Human Phenome Institute, Center for Evolutionary Biology, Shanghai Engineering Research Center of Industrial Microorganisms, School of Life Sciences, Fudan University, Shanghai 200438, P.R. China
| | - Jing Wang
- State Key Laboratory of Genetic Engineering, National Clinical Research Center for Aging and Medicine, Huashan Hospital, Collaborative Innovation Center of Genetics and Development, Human Phenome Institute, Center for Evolutionary Biology, Shanghai Engineering Research Center of Industrial Microorganisms, School of Life Sciences, Fudan University, Shanghai 200438, P.R. China
| | - Jun Yao
- State Key Laboratory of Genetic Engineering, National Clinical Research Center for Aging and Medicine, Huashan Hospital, Collaborative Innovation Center of Genetics and Development, Human Phenome Institute, Center for Evolutionary Biology, Shanghai Engineering Research Center of Industrial Microorganisms, School of Life Sciences, Fudan University, Shanghai 200438, P.R. China
| | - Haimei Wen
- State Key Laboratory of Genetic Engineering, National Clinical Research Center for Aging and Medicine, Huashan Hospital, Collaborative Innovation Center of Genetics and Development, Human Phenome Institute, Center for Evolutionary Biology, Shanghai Engineering Research Center of Industrial Microorganisms, School of Life Sciences, Fudan University, Shanghai 200438, P.R. China
| | - Yi Wang
- Ministry of Education (MOE) Key Laboratory of Contemporary Anthropology, Human Phenome Institute, School of Life Sciences, Fudan University, Shanghai 200438, P.R. China
| | - Mingwan Sun
- Key Laboratory of Gene Engineering of the Ministry of Education and State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou 510006, P.R. China
| | - Chao Zhang
- MOE Key Laboratory of Cell Proliferation and Differentiation, School of Life Sciences, Peking University, Beijing 100871, P.R. China
| | - Wei Tao
- MOE Key Laboratory of Cell Proliferation and Differentiation, School of Life Sciences, Peking University, Beijing 100871, P.R. China
| | - Jiahua Zou
- Guangdong Provincial Key Laboratory of Bioengineering Medicine, National Engineering Research Center of Genetic Medicine, Institute of Biomedicine, College of Life Science and Technology, Jinan University, Guangzhou 510632, P.R. China
| | - Ting Ni
- State Key Laboratory of Genetic Engineering, National Clinical Research Center for Aging and Medicine, Huashan Hospital, Collaborative Innovation Center of Genetics and Development, Human Phenome Institute, Center for Evolutionary Biology, Shanghai Engineering Research Center of Industrial Microorganisms, School of Life Sciences, Fudan University, Shanghai 200438, P.R. China
- State key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, School of Life Sciences, Inner Mongolia University, Hohhot 010070, P.R. China
| |
|
17
|
Swapna LS, Huang M, Li Y. GTM-decon: guided-topic modeling of single-cell transcriptomes enables sub-cell-type and disease-subtype deconvolution of bulk transcriptomes. Genome Biol 2023; 24:190. [PMID: 37596691 PMCID: PMC10436670 DOI: 10.1186/s13059-023-03034-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2022] [Accepted: 08/09/2023] [Indexed: 08/20/2023] Open
Abstract
Cell-type composition is an important indicator of health. We present Guided Topic Model for deconvolution (GTM-decon) to automatically infer cell-type-specific gene topic distributions from single-cell RNA-seq data for deconvolving bulk transcriptomes. GTM-decon performs competitively on deconvolving simulated and real bulk data compared with the state-of-the-art methods. Moreover, as demonstrated in deconvolving disease transcriptomes, GTM-decon can infer multiple cell-type-specific gene topic distributions per cell type, which captures sub-cell-type variations. GTM-decon can also use phenotype labels from single-cell or bulk data to infer phenotype-specific gene distributions. In a nested-guided design, GTM-decon identified cell-type-specific differentially expressed genes from bulk breast cancer transcriptomes.
Affiliation(s)
| | - Michael Huang
- School of Computer Science, McGill University, Montreal, QC, Canada
| | - Yue Li
- School of Computer Science, McGill University, Montreal, QC, Canada.
| |
|
18
|
Su C, Xu Z, Shan X, Cai B, Zhao H, Zhang J. Cell-type-specific co-expression inference from single cell RNA-sequencing data. Nat Commun 2023; 14:4846. [PMID: 37563115 PMCID: PMC10415381 DOI: 10.1038/s41467-023-40503-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 01/05/2023] [Accepted: 07/28/2023] [Indexed: 08/12/2023] Open
Abstract
The advancement of single cell RNA-sequencing (scRNA-seq) technology has enabled the direct inference of co-expressions in specific cell types, facilitating our understanding of cell-type-specific biological functions. For this task, the high sequencing depth variations and measurement errors in scRNA-seq data present two significant challenges, and they have not been adequately addressed by existing methods. We propose a statistical approach, CS-CORE, for estimating and testing cell-type-specific co-expressions, that explicitly models sequencing depth variations and measurement errors in scRNA-seq data. Systematic evaluations show that most existing methods suffered from inflated false positives as well as biased co-expression estimates and clustering analysis, whereas CS-CORE gave accurate estimates in these experiments. When applied to scRNA-seq data from postmortem brain samples from Alzheimer's disease patients/controls and blood samples from COVID-19 patients/controls, CS-CORE identified cell-type-specific co-expressions and differential co-expressions that were more reproducible and/or more enriched for relevant biological pathways than those inferred from existing methods.
Affiliation(s)
- Chang Su
- Department of Biostatistics, Yale University, New Haven, CT, USA
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA, USA
| | - Zichun Xu
- Department of Biostatistics, Yale University, New Haven, CT, USA
- Department of Biostatistics, University of Washington, Seattle, WA, USA
| | - Xinning Shan
- Department of Biostatistics, Yale University, New Haven, CT, USA
| | - Biao Cai
- Department of Biostatistics, Yale University, New Haven, CT, USA
- Department of Mathematical Sciences, University of Cincinnati, Cincinnati, OH, USA
| | - Hongyu Zhao
- Department of Biostatistics, Yale University, New Haven, CT, USA.
| | - Jingfei Zhang
- Information Systems and Operations Management, Emory University, Atlanta, GA, USA.
| |
|
19
|
Zhang L, Cascio S, Mellors JW, Buckanovich RJ, Osmanbeyoglu HU. Single-cell analysis reveals the stromal dynamics and tumor-specific characteristics in the microenvironment of ovarian cancer. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.06.07.544095. [PMID: 37333262 PMCID: PMC10274812 DOI: 10.1101/2023.06.07.544095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/20/2023]
Abstract
High-grade serous ovarian carcinoma (HGSOC) is a heterogeneous disease, and a high-stromal/desmoplastic tumor microenvironment (TME) is associated with a poor outcome. Stromal cell subtypes, including fibroblasts, myofibroblasts, and cancer-associated mesenchymal stem cells, establish a complex network of paracrine signaling pathways with tumor-infiltrating immune cells that drive effector cell tumor immune exclusion and inhibit the antitumor immune response. Single-cell transcriptomics of the HGSOC TME from public and in-house datasets revealed a distinct transcriptomic landscape for immune and non-immune cells in high-stromal vs. low-stromal tumors. High-stromal tumors had a lower fraction of certain T cells, natural killer (NK) cells, and macrophages and increased expression of CXCL12 in epithelial cancer cells and cancer-associated mesenchymal stem cells (CA-MSCs). Analysis of cell-cell communication indicated that epithelial cancer cells and CA-MSCs secreted CXCL12 that interacted with the CXCR4 receptor, which was overexpressed on NK and CD8+ T cells. CXCL12 and/or CXCR4 antibodies confirmed the immunosuppressive role of CXCL12-CXCR4 in high-stromal tumors.
|
20
|
Van de Sande B, Lee JS, Mutasa-Gottgens E, Naughton B, Bacon W, Manning J, Wang Y, Pollard J, Mendez M, Hill J, Kumar N, Cao X, Chen X, Khaladkar M, Wen J, Leach A, Ferran E. Applications of single-cell RNA sequencing in drug discovery and development. Nat Rev Drug Discov 2023; 22:496-520. [PMID: 37117846 PMCID: PMC10141847 DOI: 10.1038/s41573-023-00688-4] [Citation(s) in RCA: 52] [Impact Index Per Article: 52.0] [Accepted: 03/10/2023] [Indexed: 04/30/2023]
Abstract
Single-cell technologies, particularly single-cell RNA sequencing (scRNA-seq) methods, together with associated computational tools and the growing availability of public data resources, are transforming drug discovery and development. New opportunities are emerging in target identification owing to improved disease understanding through cell subtyping, and highly multiplexed functional genomics screens incorporating scRNA-seq are enhancing target credentialling and prioritization. ScRNA-seq is also aiding the selection of relevant preclinical disease models and providing new insights into drug mechanisms of action. In clinical development, scRNA-seq can inform decision-making via improved biomarker identification for patient stratification and more precise monitoring of drug response and disease progression. Here, we illustrate how scRNA-seq methods are being applied in key steps in drug discovery and development, and discuss ongoing challenges for their implementation in the pharmaceutical industry.
Affiliation(s)
- Bart Naughton
  - Computational Neurobiology, Eisai, Cambridge, MA, USA
- Wendi Bacon
  - EMBL-EBI, Wellcome Genome Campus, Hinxton, UK
  - The Open University, Milton Keynes, UK
- Yong Wang
  - Precision Bioinformatics, Prometheus Biosciences, San Diego, CA, USA
- Melissa Mendez
  - Genomic Sciences, GlaxoSmithKline, Collegeville, PA, USA
- Jon Hill
  - Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharmaceuticals Inc., Ridgefield, CT, USA
- Namit Kumar
  - Informatics & Predictive Sciences, Bristol Myers Squibb, San Diego, CA, USA
- Xiaohong Cao
  - Genomic Research Center, AbbVie Inc., Cambridge, MA, USA
- Xiao Chen
  - Magnet Biomedicine, Cambridge, MA, USA
- Mugdha Khaladkar
  - Human Genetics and Computational Biology, GlaxoSmithKline, Collegeville, PA, USA
- Ji Wen
  - Oncology Research and Development Unit, Pfizer, La Jolla, CA, USA
|
21
|
Boltz T, Schwarz T, Bot M, Hou K, Caggiano C, Lapinska S, Duan C, Boks MP, Kahn RS, Zaitlen N, Pasaniuc B, Ophoff R. Cell type deconvolution of bulk blood RNA-Seq to reveal biological insights of neuropsychiatric disorders. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.24.542156. [PMID: 37293101 PMCID: PMC10245943 DOI: 10.1101/2023.05.24.542156] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
Genome-wide association studies (GWAS) have uncovered susceptibility loci associated with psychiatric disorders like bipolar disorder (BP) and schizophrenia (SCZ). However, most of these loci are in non-coding regions of the genome with unknown causal mechanisms of the link between genetic variation and disease risk. Expression quantitative trait loci (eQTL) analysis of bulk tissue is a common approach to decipher underlying mechanisms, though this can obscure cell-type-specific signals, thus masking trait-relevant mechanisms. While single-cell sequencing can be prohibitively expensive in large cohorts, computationally inferred cell type proportions and cell type gene expression estimates have the potential to overcome these problems and advance mechanistic studies. Using bulk RNA-Seq from 1,730 samples derived from whole blood in a cohort ascertained for individuals with BP and SCZ, this study estimated cell type proportions and their relation with disease status and medication. We found between 2,875 and 4,629 eGenes for each cell type, including 1,211 eGenes that are not found using bulk expression alone. We performed a colocalization test between cell type eQTLs and various traits and identified hundreds of associations between cell type eQTLs and GWAS loci that are not detected in bulk eQTLs. Finally, we investigated the effects of lithium use on cell type expression regulation and found examples of genes that are differentially regulated dependent on lithium use. Our study suggests that computational methods can be applied to large bulk RNA-Seq datasets of non-brain tissue to identify disease-relevant, cell-type-specific biology of psychiatric disorders and psychiatric medication.
Affiliation(s)
- Toni Boltz
  - Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
- Tommer Schwarz
  - Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA
- Merel Bot
  - Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, University of California, Los Angeles, CA, USA
- Kangcheng Hou
  - Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA
- Christa Caggiano
  - Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA
- Sandra Lapinska
  - Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA
- Chenda Duan
  - Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA
- Marco P Boks
  - Department of Psychiatry, Brain Center University Medical Center Utrecht, University Utrecht, Utrecht, the Netherlands
- Rene S Kahn
  - Department of Psychiatry, Brain Center University Medical Center Utrecht, University Utrecht, Utrecht, the Netherlands
  - Department of Psychiatry, Icahn School of Medicine, Mount Sinai, NY, USA
- Noah Zaitlen
  - Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA
  - Department of Neurology, University of California Los Angeles, Los Angeles, CA, USA
- Bogdan Pasaniuc
  - Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
  - Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA
  - Department of Computational Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
  - Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
- Roel Ophoff
  - Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
  - Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA
  - Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, University of California, Los Angeles, CA, USA
  - Department of Psychiatry, Erasmus University Medical Center, Rotterdam, the Netherlands
|
22
|
Pyun JM, Park YH, Wang J, Bice PJ, Bennett DA, Kim S, Saykin AJ, Nho K. Aberrant GAP43 Gene Expression Is Alzheimer Disease Pathology-Specific. Ann Neurol 2023; 93:1047-1048. [PMID: 36897291 PMCID: PMC10464844 DOI: 10.1002/ana.26637] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2022] [Accepted: 12/15/2022] [Indexed: 03/11/2023]
Affiliation(s)
- Jung-Min Pyun
  - Department of Neurology, Soonchunhyang University Seoul Hospital, Soonchunhyang University College of Medicine, Seoul, Republic of Korea
- Young Ho Park
  - Department of Neurology, Seoul National University Bundang Hospital and Seoul National University College of Medicine, Seongnam, Republic of Korea
- Jiebiao Wang
  - Department of Biostatistics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Paula J Bice
  - Department of Radiology and Imaging Sciences, and Indiana Alzheimer Disease Center, Indiana University School of Medicine, Indianapolis, Indiana, USA
- David A Bennett
  - Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, Illinois, USA
- SangYun Kim
  - Department of Neurology, Seoul National University Bundang Hospital and Seoul National University College of Medicine, Seongnam, Republic of Korea
- Andrew J Saykin
  - Department of Radiology and Imaging Sciences, and Indiana Alzheimer Disease Center, Indiana University School of Medicine, Indianapolis, Indiana, USA
  - Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, Indiana, USA
- Kwangsik Nho
  - Department of Radiology and Imaging Sciences, and Indiana Alzheimer Disease Center, Indiana University School of Medicine, Indianapolis, Indiana, USA
  - Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana, USA
|
23
|
Hicks EM, Seah C, Cote A, Marchese S, Brennand KJ, Nestler EJ, Girgenti MJ, Huckins LM. Integrating genetics and transcriptomics to study major depressive disorder: a conceptual framework, bioinformatic approaches, and recent findings. Transl Psychiatry 2023; 13:129. [PMID: 37076454 PMCID: PMC10115809 DOI: 10.1038/s41398-023-02412-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 08/09/2022] [Revised: 03/17/2023] [Accepted: 03/24/2023] [Indexed: 04/21/2023] Open
Abstract
Major depressive disorder (MDD) is a complex and heterogeneous psychiatric syndrome with genetic and environmental influences. In addition to neuroanatomical and circuit-level disturbances, dysregulation of the brain transcriptome is a key phenotypic signature of MDD. Postmortem brain gene expression data are uniquely valuable resources for identifying this signature and key genomic drivers in human depression; however, the scarcity of brain tissue limits our capacity to observe the dynamic transcriptional landscape of MDD. It is therefore crucial to explore and integrate depression and stress transcriptomic data from numerous, complementary perspectives to construct a richer understanding of the pathophysiology of depression. In this review, we discuss multiple approaches for exploring the brain transcriptome reflecting dynamic stages of MDD: predisposition, onset, and illness. We next highlight bioinformatic approaches for hypothesis-free, genome-wide analyses of genomic and transcriptomic data and their integration. Last, we summarize the findings of recent genetic and transcriptomic studies within this conceptual framework.
Affiliation(s)
- Emily M Hicks
  - Pamela Sklar Division of Psychiatric Genomics, Departments of Psychiatry and of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, 10029, USA
  - Nash Family Department of Neuroscience, Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, New York, 10029, USA
- Carina Seah
  - Pamela Sklar Division of Psychiatric Genomics, Departments of Psychiatry and of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, 10029, USA
  - Nash Family Department of Neuroscience, Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, New York, 10029, USA
- Alanna Cote
  - Pamela Sklar Division of Psychiatric Genomics, Departments of Psychiatry and of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, 10029, USA
- Shelby Marchese
  - Pamela Sklar Division of Psychiatric Genomics, Departments of Psychiatry and of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, 10029, USA
- Kristen J Brennand
  - Pamela Sklar Division of Psychiatric Genomics, Departments of Psychiatry and of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, 10029, USA
  - Nash Family Department of Neuroscience, Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, New York, 10029, USA
  - Department of Genetics, Yale University School of Medicine, New Haven, CT, 06511, USA
  - Department of Psychiatry, Yale University School of Medicine, New Haven, CT, 06511, USA
- Eric J Nestler
  - Nash Family Department of Neuroscience, Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, New York, 10029, USA
- Matthew J Girgenti
  - Department of Psychiatry, Yale University School of Medicine, New Haven, CT, 06511, USA.
- Laura M Huckins
  - Pamela Sklar Division of Psychiatric Genomics, Departments of Psychiatry and of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, 10029, USA.
  - Department of Psychiatry, Yale University School of Medicine, New Haven, CT, 06511, USA.
|
24
|
Huang P, Cai M, Lu X, McKennan C, Wang J. Accurate estimation of rare cell type fractions from tissue omics data via hierarchical deconvolution. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.03.15.532820. [PMID: 36993280 PMCID: PMC10055056 DOI: 10.1101/2023.03.15.532820] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Bulk transcriptomics in tissue samples reflects the average expression levels across different cell types and is highly influenced by cellular fractions. As such, it is critical to estimate cellular fractions to both deconfound differential expression analyses and infer cell type-specific differential expression. Since experimentally counting cells is infeasible in most tissues and studies, in silico cellular deconvolution methods have been developed as an alternative. However, existing methods are designed for tissues consisting of clearly distinguishable cell types and have difficulties estimating highly correlated or rare cell types. To address this challenge, we propose Hierarchical Deconvolution (HiDecon) that uses single-cell RNA sequencing references and a hierarchical cell type tree, which models the similarities among cell types and cell differentiation relationships, to estimate cellular fractions in bulk data. By coordinating cell fractions across layers of the hierarchical tree, cellular fraction information is passed up and down the tree, which helps correct estimation biases by pooling information across related cell types. The flexible hierarchical tree structure also enables estimating rare cell fractions by splitting the tree to higher resolutions. Through simulations and real data applications with the ground truth of measured cellular fractions, we demonstrate that HiDecon significantly outperforms existing methods and accurately estimates cellular fractions.
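The premise stated in this abstract, that bulk expression is a fraction-weighted average of cell-type expression profiles, can be illustrated with a minimal sketch. This is not HiDecon and uses no hierarchical tree; it is plain non-negative least squares solved by projected gradient descent, and the signature matrix, the fractions, and the function name `deconvolve` are hypothetical values chosen only for illustration.

```python
# Minimal sketch (assumption: NOT the HiDecon algorithm) of reference-based
# deconvolution: bulk[g] ~= sum_k signature[g][k] * fraction[k], fraction >= 0.

def deconvolve(signature, bulk, n_iter=2000):
    """Estimate cell-type fractions by projected gradient descent on
    0.5 * ||S f - b||^2 subject to f >= 0, then rescale f to sum to 1."""
    n_genes, n_types = len(signature), len(signature[0])
    # The squared Frobenius norm of S bounds the largest eigenvalue of S^T S,
    # so its inverse is a safe (conservative) step size.
    step = 1.0 / sum(v * v for row in signature for v in row)
    fractions = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        residual = [
            sum(signature[g][k] * fractions[k] for k in range(n_types)) - bulk[g]
            for g in range(n_genes)
        ]
        gradient = [
            sum(signature[g][k] * residual[g] for g in range(n_genes))
            for k in range(n_types)
        ]
        # Gradient step followed by projection onto the non-negative orthant.
        fractions = [max(0.0, fractions[k] - step * gradient[k]) for k in range(n_types)]
    total = sum(fractions)
    return [v / total for v in fractions] if total > 0 else fractions


# Hypothetical reference: rows are genes, columns are three cell types.
signature = [
    [10.0, 1.0, 0.0],
    [1.0, 8.0, 1.0],
    [0.0, 2.0, 9.0],
    [5.0, 5.0, 5.0],
]
true_fractions = [0.6, 0.3, 0.1]
# Simulated noise-free bulk sample: the fraction-weighted average profile.
bulk = [sum(s * f for s, f in zip(row, true_fractions)) for row in signature]

estimated = deconvolve(signature, bulk)
print([round(v, 3) for v in estimated])
```

With noise-free input and distinct marker genes the estimate should land very close to the simulated fractions; highly correlated or rare cell types, the case the abstract targets, make the same least-squares problem poorly conditioned.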
Affiliation(s)
- Penghui Huang
  - Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, USA
- Manqi Cai
  - Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, USA
- Xinghua Lu
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
- Chris McKennan
  - Department of Statistics, University of Pittsburgh, Pittsburgh, PA, USA
- Jiebiao Wang
  - Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, USA
|
25
|
Dai R, Chu T, Zhang M, Wang X, Jourdon A, Wu F, Mariani J, Vaccarino FM, Lee D, Fullard JF, Hoffman GE, Roussos P, Wang Y, Wang X, Pinto D, Wang SH, Zhang C, Chen C, Liu C. Evaluating performance and applications of sample-wise cell deconvolution methods on human brain transcriptomic data. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.03.13.532468. [PMID: 36993743 PMCID: PMC10054947 DOI: 10.1101/2023.03.13.532468] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/05/2023]
Abstract
Sample-wise deconvolution methods have been developed to estimate cell-type proportions and gene expressions in bulk-tissue samples. However, the performance of these methods and their biological applications have not been evaluated, particularly on human brain transcriptomic data. Here, nine deconvolution methods were evaluated with sample-matched data from bulk-tissue RNAseq, single-cell/nuclei (sc/sn) RNAseq, and immunohistochemistry. A total of 1,130,767 nuclei/cells from 149 adult postmortem brains and 72 organoid samples were used. The results showed the best performance of dtangle for estimating cell proportions and bMIND for estimating sample-wise cell-type gene expression. For eight brain cell types, 25,273 cell-type eQTLs were identified with deconvoluted expressions (decon-eQTLs). The results showed that decon-eQTLs explained more schizophrenia GWAS heritability than bulk-tissue or single-cell eQTLs alone. Differential gene expression associated with multiple phenotypes was also examined using the deconvoluted data. Our findings, which were replicated in bulk-tissue RNAseq and sc/snRNAseq data, provided new insights into the biological applications of deconvoluted data.
Affiliation(s)
- Rujia Dai
  - Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY, USA
- Tianyao Chu
  - Center for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, China
- Ming Zhang
  - Center for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, China
- Xuan Wang
  - Center for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, China
- Feinan Wu
  - Child Study Center, Yale University, New Haven, CT, USA
- Flora M Vaccarino
  - Child Study Center, Yale University, New Haven, CT, USA
  - Department of Neuroscience, Yale University, New Haven, CT, USA
- Donghoon Lee
  - Center for Disease Neurogenomics, Departments of Psychiatry and Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- John F Fullard
  - Center for Disease Neurogenomics, Departments of Psychiatry and Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Gabriel E Hoffman
  - Center for Disease Neurogenomics, Departments of Psychiatry and Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Panos Roussos
  - Center for Disease Neurogenomics, Departments of Psychiatry and Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Yue Wang
  - Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, VA, USA
- Xusheng Wang
  - Department of Biology, University of North Dakota, Grand Forks, ND, USA
- Dalila Pinto
  - Department of Psychiatry, Department of Genetics and Genomic Sciences, Mindich Child Health and Development Institute, and Icahn Genomics Institute for Data Science and Genomic Technology, Seaver Autism Center, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Sidney H Wang
  - Center for Human Genetics, The Brown Foundation Institute of Molecular Medicine, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Chunling Zhang
  - Department of Neuroscience & Physiology, SUNY Upstate Medical University, Syracuse, NY, USA
- Chao Chen
  - Center for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, China
- Chunyu Liu
  - Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY, USA
  - Center for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, China
  - Department of Neuroscience & Physiology, SUNY Upstate Medical University, Syracuse, NY, USA
|
26
|
Chen L, Li Z, Wu H. CeDAR: incorporating cell type hierarchy improves cell type-specific differential analyses in bulk omics data. Genome Biol 2023; 24:37. [PMID: 36855165 PMCID: PMC9972684 DOI: 10.1186/s13059-023-02857-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2022] [Accepted: 01/17/2023] [Indexed: 03/02/2023] Open
Abstract
Bulk high-throughput omics data contain signals from a mixture of cell types. Recent developments of deconvolution methods facilitate cell type-specific inferences from bulk data. Our real data exploration suggests that differential expression or methylation status is often correlated among cell types. Based on this observation, we develop a novel statistical method named CeDAR to incorporate the cell type hierarchy in cell type-specific differential analyses of bulk data. Extensive simulation and real data analyses demonstrate that this approach significantly improves the accuracy and power in detecting cell type-specific differential signals compared with existing methods, especially in low-abundance cell types.
Affiliation(s)
- Luxiao Chen
  - Department of Biostatistics and Bioinformatics, Emory University, GA 30322 Atlanta, USA
- Ziyi Li
  - Department of Biostatistics, The University of MD Anderson Cancer Center, 77030 Houston, TX, USA
- Hao Wu
  - Faculty of Computer Science and Control Engineering, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, 1068 Xueyuan Avenue, Shenzhen University Town, Shenzhen, 518055 P.R. China
|
27
|
Song X, Ji J, Rothstein JH, Alexeeff SE, Sakoda LC, Sistig A, Achacoso N, Jorgenson E, Whittemore AS, Klein RJ, Habel LA, Wang P, Sieh W. MiXcan: a framework for cell-type-aware transcriptome-wide association studies with an application to breast cancer. Nat Commun 2023; 14:377. [PMID: 36690614 PMCID: PMC9871010 DOI: 10.1038/s41467-023-35888-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2022] [Accepted: 01/05/2023] [Indexed: 01/25/2023] Open
Abstract
Human bulk tissue samples comprise multiple cell types with diverse roles in disease etiology. Conventional transcriptome-wide association study approaches predict genetically regulated gene expression at the tissue level, without considering cell-type heterogeneity, and test associations of predicted tissue-level expression with disease. Here we develop MiXcan, a cell-type-aware transcriptome-wide association study approach that predicts cell-type-level expression, identifies disease-associated genes via combination of cell-type-level association signals for multiple cell types, and provides insight into the disease-critical cell type. As a proof of concept, we conducted cell-type-aware analyses of breast cancer in 58,648 women and identified 12 transcriptome-wide significant genes using MiXcan compared with only eight genes using conventional approaches. Importantly, MiXcan identified genes with distinct associations in mammary epithelial versus stromal cells, including three new breast cancer susceptibility genes. These findings demonstrate that cell-type-aware transcriptome-wide analyses can reveal new insights into the genetic and cellular etiology of breast cancer and other diseases.
Affiliation(s)
- Xiaoyu Song
  - Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
  - Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Jiayi Ji
  - Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Joseph H Rothstein
  - Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Stacey E Alexeeff
  - Division of Research, Kaiser Permanente Northern California, Oakland, CA, USA
- Lori C Sakoda
  - Division of Research, Kaiser Permanente Northern California, Oakland, CA, USA
- Adriana Sistig
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ninah Achacoso
  - Division of Research, Kaiser Permanente Northern California, Oakland, CA, USA
- Eric Jorgenson
  - Division of Research, Kaiser Permanente Northern California, Oakland, CA, USA
  - Regeneron Genetics Center, Tarrytown, NY, USA
- Alice S Whittemore
  - Department of Epidemiology and Population Health, Stanford University School of Medicine, Stanford, CA, USA
  - Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
- Robert J Klein
  - Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Laurel A Habel
  - Division of Research, Kaiser Permanente Northern California, Oakland, CA, USA
- Pei Wang
  - Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Weiva Sieh
  - Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
  - Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
|
28
|
Su C, Xu Z, Shan X, Cai B, Zhao H, Zhang J. Cell-type-specific co-expression inference from single cell RNA-sequencing data. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2022:2022.12.13.520181. [PMID: 36561173 PMCID: PMC9774209 DOI: 10.1101/2022.12.13.520181] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2022]
Abstract
The inference of gene co-expressions from microarray and RNA-sequencing data has led to rich insights on biological processes and disease mechanisms. However, the bulk samples analyzed in most studies are a mixture of different cell types. As a result, the inferred co-expressions are confounded by varying cell type compositions across samples and only offer an aggregated view of gene regulations that may be distinct across different cell types. The advancement of single cell RNA-sequencing (scRNA-seq) technology has enabled the direct inference of co-expressions in specific cell types, facilitating our understanding of cell-type-specific biological functions. However, the high sequencing depth variations and measurement errors in scRNA-seq data present significant challenges in inferring cell-type-specific gene co-expressions, and these issues have not been adequately addressed in the existing methods. We propose a statistical approach, CS-CORE, for estimating and testing cell-type-specific co-expressions, built on a general expression-measurement model that explicitly accounts for sequencing depth variations and measurement errors in the observed single cell data. Systematic evaluations show that most existing methods suffer from inflated false positives and biased co-expression estimates and clustering analysis, whereas CS-CORE has appropriate false positive control, unbiased co-expression estimates, good statistical power and satisfactory performance in downstream co-expression analysis. When applied to analyze scRNA-seq data from postmortem brain samples from Alzheimer’s disease patients and controls and blood samples from COVID-19 patients and controls, CS-CORE identified cell-type-specific co-expressions and differential co-expressions that were more reproducible and/or more enriched for relevant biological pathways than those inferred from other methods.
Affiliation(s)
- Chang Su
  - Department of Biostatistics, Yale University
- Zichun Xu
  - Department of Biostatistics, Yale University
- Biao Cai
  - Department of Biostatistics, Yale University
- Hongyu Zhao
  - Department of Biostatistics, Yale University
- Jingfei Zhang
  - Information Systems and Operations Management, Emory University
|
29
|
Su C, Xu Z, Shan X, Cai B, Zhao H, Zhang J. Cell-type-specific co-expression inference from single cell RNA-sequencing data. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2022:2022.12.13.520181. [PMID: 36561173 DOI: 10.1101/2022.04.07.487499] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 05/26/2023]
|
30
|
Yan L, Sun X. Benchmarking and integration of methods for deconvoluting spatial transcriptomic data. Bioinformatics 2022; 39:6900924. [PMID: 36515467 PMCID: PMC9825747 DOI: 10.1093/bioinformatics/btac805] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 07/13/2022] [Revised: 11/11/2022] [Accepted: 12/13/2022] [Indexed: 12/15/2022] Open
Abstract
MOTIVATION The rapid development of spatial transcriptomics (ST) approaches has provided new insights into understanding tissue architecture and function. However, the gene expressions measured at a spot may contain contributions from multiple cells due to the low resolution of current ST technologies. Although many computational methods have been developed to disentangle discrete cell types from spatial mixtures, the community lacks a thorough evaluation of the performance of those deconvolution methods. RESULTS Here, we present a comprehensive benchmarking of 14 deconvolution methods on four datasets. Furthermore, we investigate the robustness of different methods to sequencing depth, spot size and the choice of normalization. Moreover, we propose a new ensemble learning-based deconvolution method (EnDecon) by integrating multiple individual methods for more accurate deconvolution. The major new findings include: (i) cell2location, RCTD and spatialDWLS are more accurate than other ST deconvolution methods, based on the evaluation of three metrics: RMSE, PCC and JSD; (ii) cell2location and spatialDWLS are more robust to the variation of sequencing depth than RCTD; (iii) the accuracy of the existing methods tends to decrease as the spot size becomes smaller; (iv) most deconvolution methods perform best when they normalize ST data using the method described in their original papers; and (v) the integrative method, EnDecon, could achieve more accurate ST deconvolution. Our study provides valuable information and guidelines for practically applying ST deconvolution tools and developing new and more effective methods. AVAILABILITY AND IMPLEMENTATION The benchmarking pipeline is available at https://github.com/SunXQlab/ST-deconvoulution. An R package for EnDecon is available at https://github.com/SunXQlab/EnDecon. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
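The three accuracy metrics named in this abstract (RMSE, PCC and JSD) can be sketched for a single spot as follows. This is a minimal illustration using standard textbook definitions, not the benchmarking pipeline from the repository above; the function names and the example proportions are hypothetical, and the paper may use a different JSD variant (for example, its square root or another logarithm base).

```python
import math

# Minimal sketch (assumption: standard definitions, not the authors' pipeline)
# comparing estimated vs. true cell-type proportions for one spot.

def rmse(estimated, truth):
    """Root-mean-square error between two proportion vectors."""
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimated, truth)) / len(truth))

def pcc(estimated, truth):
    """Pearson correlation coefficient; undefined if either vector is constant."""
    n = len(truth)
    mean_e, mean_t = sum(estimated) / n, sum(truth) / n
    cov = sum((e - mean_e) * (t - mean_t) for e, t in zip(estimated, truth))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_t = sum((t - mean_t) ** 2 for t in truth)
    return cov / math.sqrt(var_e * var_t)

def jsd(p, q):
    """Jensen-Shannon divergence (base-2 logs, so the value lies in [0, 1])."""
    mixture = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing to the KL divergence.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, mixture) + 0.5 * kl(q, mixture)


# Hypothetical proportions for one spot with three cell types.
truth = [0.6, 0.3, 0.1]
estimated = [0.5, 0.35, 0.15]
print(rmse(estimated, truth), pcc(estimated, truth), jsd(estimated, truth))
```

Lower RMSE and JSD and higher PCC indicate a closer match; in a benchmark these values would typically be aggregated over all spots and datasets.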
Affiliation(s)
- Lulu Yan
- School of Mathematics, Sun Yat-sen University, Guangzhou 510275, China
| | | |
|
31
|
Costa-Silva J, Domingues DS, Menotti D, Hungria M, Lopes FM. Temporal progress of gene expression analysis with RNA-Seq data: A review on the relationship between computational methods. Comput Struct Biotechnol J 2022; 21:86-98. [PMID: 36514333 PMCID: PMC9730150 DOI: 10.1016/j.csbj.2022.11.051] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/15/2022] [Revised: 11/25/2022] [Accepted: 11/25/2022] [Indexed: 12/03/2022] Open
Abstract
Analysis of differential gene expression from RNA-seq data has become a standard for several research areas. The steps for the computational analysis include many data types and file formats, and a wide variety of computational tools that can be applied alone or together as pipelines. This paper presents a review of the differential expression analysis pipeline, addressing its steps and the respective objectives, the principal methods available in each step, and their properties, therefore introducing an organized overview to this context. This review aims to address mainly the aspects involved in the differentially expressed gene (DEG) analysis from RNA sequencing data (RNA-seq), considering the computational methods. In addition, a timeline of the computational methods for DEG is shown and discussed, and the relationships existing between the most important computational tools are presented by an interaction network. A discussion on the challenges and gaps in DEG analysis is also highlighted in this review. This paper will serve as a tutorial for new entrants into the field and help established users update their analysis pipelines.
Affiliation(s)
- Juliana Costa-Silva
- Department of Informatics – Federal University of Paraná, Rua Coronel Francisco Heráclito dos Santos, 100, 81531-990 Curitiba, Paraná, Brazil
| | - Douglas S. Domingues
- Department of Genetics, “Luiz de Queiroz” College of Agriculture, University of São Paulo, Av. Pádua Dias, 11, 13418-900 Piracicaba, São Paulo, Brazil
| | - David Menotti
- Department of Informatics – Federal University of Paraná, Rua Coronel Francisco Heráclito dos Santos, 100, 81531-990 Curitiba, Paraná, Brazil
| | - Mariangela Hungria
- Department of Soil Biotechnology - Embrapa Soybean, Cx. Postal 231, 86000-970 Londrina, Paraná, Brazil
| | - Fabrício Martins Lopes
- Department of Computer Science, Universidade Tecnológica Federal do Paraná – UTFPR, Av. Alberto Carazzai, 1640, 86300-000, Cornélio Procópio, Paraná, Brazil
| |
|
32
|
Cuddleston WH, Fan X, Sloofman L, Liang L, Mossotto E, Moore K, Zipkowitz S, Wang M, Zhang B, Wang J, Sestan N, Devlin B, Roeder K, Sanders SJ, Buxbaum JD, Breen MS. Spatiotemporal and genetic regulation of A-to-I editing throughout human brain development. Cell Rep 2022; 41:111585. [PMID: 36323256 PMCID: PMC9704047 DOI: 10.1016/j.celrep.2022.111585] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/29/2021] [Revised: 07/06/2022] [Accepted: 10/07/2022] [Indexed: 11/06/2022] Open
Abstract
Posttranscriptional RNA modifications by adenosine-to-inosine (A-to-I) editing are abundant in the brain, yet elucidating functional sites remains challenging. To bridge this gap, we investigate spatiotemporal and genetically regulated A-to-I editing sites across prenatal and postnatal stages of human brain development. More than 10,000 spatiotemporally regulated A-to-I sites were identified that occur predominately in 3' UTRs and introns, as well as 37 sites that recode amino acids in protein coding regions with precise changes in editing levels across development. Hyper-edited transcripts are also enriched in the aging brain and stabilize RNA secondary structures. These features are conserved in murine and non-human primate models of neurodevelopment. Finally, thousands of cis-editing quantitative trait loci (edQTLs) were identified with unique regulatory effects during prenatal and postnatal development. Collectively, this work offers a resolved atlas linking spatiotemporal variation in editing levels to genetic regulatory effects throughout distinct stages of brain maturation.
Affiliation(s)
- Winston H Cuddleston
- Seaver Autism Center for Research and Treatment, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Pamela Sklar Division of Psychiatric Genomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Xuanjia Fan
- Seaver Autism Center for Research and Treatment, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Pamela Sklar Division of Psychiatric Genomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Laura Sloofman
- Seaver Autism Center for Research and Treatment, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Pamela Sklar Division of Psychiatric Genomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Lindsay Liang
- Department of Psychiatry and Behavioral Sciences and UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, USA
| | - Enrico Mossotto
- Seaver Autism Center for Research and Treatment, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Pamela Sklar Division of Psychiatric Genomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Kendall Moore
- Seaver Autism Center for Research and Treatment, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Pamela Sklar Division of Psychiatric Genomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Sarah Zipkowitz
- Seaver Autism Center for Research and Treatment, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Pamela Sklar Division of Psychiatric Genomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Minghui Wang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029, USA; Icahn Institute for Genomics, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029, USA
| | - Bin Zhang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029, USA; Icahn Institute for Genomics, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029, USA
| | - Jiebiao Wang
- Department of Biostatistics, University of Pittsburgh, 130 De Soto Street, Pittsburgh, PA 15261, USA
| | - Nenad Sestan
- Department of Neuroscience and Kavli Institute for Neuroscience, Yale School of Medicine, New Haven, CT 06510, USA; Program in Cellular Neuroscience, Neurodegeneration, and Repair and Yale Child Study Center, Yale School of Medicine, New Haven, CT 06510, USA; Department of Psychiatry, Yale University School of Medicine, New Haven, CT 06520, USA; Department of Genetics, Yale University School of Medicine, New Haven, CT 06520, USA; Department of Comparative Medicine, Program in Integrative Cell Signaling and Neurobiology of Metabolism, Yale School of Medicine, New Haven, CT 06510, USA
| | - Bernie Devlin
- Department of Psychiatry, University of Pittsburgh School of Medicine, 3811 O'Hara Street, Pittsburgh, PA 15213, USA
| | - Kathryn Roeder
- Carnegie Mellon University, Statistics & Data Science Department, Pittsburgh, PA 15213, USA
| | - Stephan J Sanders
- Department of Psychiatry and Behavioral Sciences and UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, USA
| | - Joseph D Buxbaum
- Seaver Autism Center for Research and Treatment, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Pamela Sklar Division of Psychiatric Genomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Michael S Breen
- Seaver Autism Center for Research and Treatment, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Pamela Sklar Division of Psychiatric Genomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
| |
|
33
|
De novo analysis of bulk RNA-seq data at spatially resolved single-cell resolution. Nat Commun 2022; 13:6498. [PMID: 36310179 PMCID: PMC9618574 DOI: 10.1038/s41467-022-34271-z] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/17/2022] [Accepted: 10/19/2022] [Indexed: 12/25/2022] Open
Abstract
Uncovering the tissue molecular architecture at single-cell resolution could help better understand organisms' biological and pathological processes. However, bulk RNA-seq can only measure gene expression in cell mixtures, without revealing the transcriptional heterogeneity and spatial patterns of single cells. Herein, we introduce Bulk2Space ( https://github.com/ZJUFanLab/bulk2space ), a deep learning framework-based spatial deconvolution algorithm that can simultaneously disclose the spatial and cellular heterogeneity of bulk RNA-seq data using existing single-cell and spatial transcriptomics references. The use of bulk transcriptomics to validate Bulk2Space unveils, in particular, the spatial variance of immune cells in different tumor regions, the molecular and spatial heterogeneity of tissues during inflammation-induced tumorigenesis, and spatial patterns of novel genes in different cell types. Moreover, Bulk2Space is utilized to perform spatial deconvolution analysis on bulk transcriptome data from two different mouse brain regions derived from our in-house developed sequencing approach termed Spatial-seq. We have not only reconstructed the hierarchical structure of the mouse isocortex but also further annotated cell types that were not identified by original methods in the mouse hypothalamus.
|
34
|
Cai M, Yue M, Chen T, Liu J, Forno E, Lu X, Billiar T, Celedón J, McKennan C, Chen W, Wang J. Robust and accurate estimation of cellular fraction from tissue omics data via ensemble deconvolution. Bioinformatics 2022; 38:3004-3010. [PMID: 35438146 PMCID: PMC9991889 DOI: 10.1093/bioinformatics/btac279] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2022] [Revised: 03/22/2022] [Accepted: 04/13/2022] [Indexed: 01/04/2023] Open
Abstract
MOTIVATION Tissue-level omics data such as transcriptomics and epigenomics are an average across diverse cell types. To extract cell-type-specific (CTS) signals, dozens of cellular deconvolution methods have been proposed to infer cell-type fractions from tissue-level data. However, these methods produce vastly different results under various real data settings. Simulation-based benchmarking studies showed no universally best deconvolution approaches. There have been attempts at ensemble methods, but they only aggregate multiple single-cell references or reference-free deconvolution methods. RESULTS To achieve a robust estimation of cellular fractions, we proposed EnsDeconv (Ensemble Deconvolution), which adopts CTS robust regression to synthesize the results from 11 single deconvolution methods, 10 reference datasets, 5 marker gene selection procedures, 5 data normalizations and 2 transformations. Unlike most benchmarking studies based on simulations, we compiled four large real datasets of 4937 tissue samples in total with measured cellular fractions and bulk gene expression from different tissues. Comprehensive evaluations demonstrated that EnsDeconv yields more stable, robust and accurate fractions than existing methods. We illustrated that EnsDeconv-estimated cellular fractions enable various CTS downstream analyses such as differential fractions associated with clinical variables. We further extended EnsDeconv to analyze bulk DNA methylation data. AVAILABILITY AND IMPLEMENTATION EnsDeconv is freely available as an R-package from https://github.com/randel/EnsDeconv. The RNA microarray data from the TRAUMA study are available and can be accessed in GEO (GSE36809). The demographic and clinical phenotypes can be shared on reasonable request to the corresponding authors. The RNA-seq data from the EVAPR study cannot be shared publicly due to the privacy of individuals that participated in the clinical research in compliance with the IRB approval at the University of Pittsburgh. The RNA microarray data from the FHS study are available from dbGaP (phs000007.v32.p13). The RNA-seq data from the ROS study were downloaded from the AD Knowledge Portal. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Manqi Cai
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA 15261, USA
| | - Molin Yue
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA 15261, USA
| | - Tianmeng Chen
- Department of Surgery, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Department of Pathology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15213, USA
| | - Jinling Liu
- Department of Engineering Management and Systems Engineering, Missouri University of Science and Technology, Rolla, MO 65409, USA
- Department of Biological Sciences, Missouri University of Science and Technology, Rolla, MO 65409, USA
| | - Erick Forno
- Department of Pediatrics, University of Pittsburgh Medical Center Children’s Hospital of Pittsburgh, Pittsburgh, PA 15224, USA
| | - Xinghua Lu
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15206, USA
| | - Timothy Billiar
- Department of Surgery, University of Pittsburgh, Pittsburgh, PA 15213, USA
| | - Juan Celedón
- Department of Pediatrics, University of Pittsburgh Medical Center Children’s Hospital of Pittsburgh, Pittsburgh, PA 15224, USA
| | - Chris McKennan
- Department of Statistics, University of Pittsburgh, Pittsburgh, PA 15213, USA
| | - Wei Chen
- Department of Pediatrics, University of Pittsburgh Medical Center Children’s Hospital of Pittsburgh, Pittsburgh, PA 15224, USA
| | - Jiebiao Wang
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA 15261, USA
| |
|
35
|
Chen L, Wu CT, Lin CH, Dai R, Liu C, Clarke R, Yu G, Van Eyk JE, Herrington DM, Wang Y. swCAM: estimation of subtype-specific expressions in individual samples with unsupervised sample-wise deconvolution. Bioinformatics 2022; 38:1403-1410. [PMID: 34904628 PMCID: PMC8826012 DOI: 10.1093/bioinformatics/btab839] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/10/2021] [Revised: 10/30/2021] [Accepted: 12/10/2021] [Indexed: 02/04/2023] Open
Abstract
MOTIVATION Complex biological tissues are often a heterogeneous mixture of several molecularly distinct cell subtypes. Both subtype compositions and subtype-specific (STS) expressions can vary across biological conditions. Computational deconvolution aims to dissect patterns of bulk tissue data into subtype compositions and STS expressions. Existing deconvolution methods can only estimate averaged STS expressions in a population, while many downstream analyses such as inferring co-expression networks in particular subtypes require subtype expression estimates in individual samples. However, individual-level deconvolution is a mathematically underdetermined problem because there are more variables than observations. RESULTS We report a sample-wise Convex Analysis of Mixtures (swCAM) method that can estimate subtype proportions and STS expressions in individual samples from bulk tissue transcriptomes. We extend our previous CAM framework to include a new term accounting for between-sample variations and formulate swCAM as a nuclear-norm and ℓ2,1-norm regularized matrix factorization problem. We determine hyperparameter values using cross-validation with random entry exclusion and obtain a swCAM solution using an efficient alternating direction method of multipliers. Experimental results on realistic simulation data show that swCAM can accurately estimate STS expressions in individual samples and successfully extract co-expression networks in particular subtypes that are otherwise unobtainable using bulk data. In two real-world applications, swCAM analysis of bulk RNA-seq data from brain tissue of cases and controls with bipolar disorder or Alzheimer's disease identified significant changes in cell proportion, expression pattern and co-expression module in patient neurons. Comparative evaluation of swCAM versus peer methods is also provided. AVAILABILITY AND IMPLEMENTATION The R Scripts of swCAM are freely available at https://github.com/Lululuella/swCAM. A user's guide and a vignette are provided. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Lulu Chen
- Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
| | - Chiung-Ting Wu
- Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
| | - Chia-Hsiang Lin
- Department of Electrical Engineering, National Cheng Kung University, Tainan 70101, Taiwan
| | - Rujia Dai
- Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY 13210, USA
| | - Chunyu Liu
- Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY 13210, USA
| | - Robert Clarke
- The Hormel Institute, University of Minnesota, Austin, MN 55912, USA
| | - Guoqiang Yu
- Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
| | - Jennifer E Van Eyk
- Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, CA 90048, USA
| | - David M Herrington
- Department of Internal Medicine, Wake Forest University, Winston-Salem, NC 27157, USA
| | - Yue Wang
- Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
| |
|
36
|
Lei H, Guo XA, Tao Y, Ding K, Fu X, Oesterreich S, Lee AV, Schwartz R. OUP accepted manuscript. Bioinformatics 2022; 38:i386-i394. [PMID: 35758822 PMCID: PMC9235482 DOI: 10.1093/bioinformatics/btac262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022] Open
Abstract
Motivation Identifying cell types and their abundances and how these evolve during tumor progression is critical to understanding the mechanisms of metastasis and identifying predictors of metastatic potential that can guide the development of new diagnostics or therapeutics. Single-cell RNA sequencing (scRNA-seq) has been especially promising in resolving heterogeneity of expression programs at the single-cell level, but is not always feasible, e.g. for large cohort studies or longitudinal analysis of archived samples. In such cases, clonal subpopulations may still be inferred via genomic deconvolution, but deconvolution methods have limited ability to resolve fine clonal structure and may require reference cell type profiles that are missing or imprecise. Prior methods can eliminate the need for reference profiles but show unstable performance when few bulk samples are available. Results In this work, we develop a new method using reference scRNA-seq to interpret sample collections for which only bulk RNA-seq is available for some samples, e.g. clonally resolving archived primary tissues using scRNA-seq from metastases. By integrating such information in a Quadratic Programming framework, our method can recover more accurate cell types and corresponding cell type abundances in bulk samples. Application to a breast tumor bone metastases dataset confirms the power of scRNA-seq data to improve cell type inference and quantification in same-patient bulk samples. Availability and implementation Source code is available on Github at https://github.com/CMUSchwartzLab/RADs.
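To make the quadratic-programming idea in this abstract concrete, the sketch below estimates cell type abundances for one bulk sample by minimizing the squared error between the bulk profile and a mixture of reference profiles, with abundances constrained to be non-negative and to sum to one. It is a generic constrained least-squares illustration under stated assumptions, not the authors' method: the function names, the toy signature matrix, and the choice of projected gradient descent as the solver are all hypothetical, and the integration of same-patient scRNA-seq described in the abstract is not modeled.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {f : f_k >= 0, sum(f) = 1}."""
    sorted_desc = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, u in enumerate(sorted_desc, start=1):
        cumulative += u
        candidate = (cumulative - 1.0) / j
        if u - candidate > 0:
            theta = candidate  # the last index passing the test defines the shift
    return [max(x - theta, 0.0) for x in v]


def deconvolve(signature, bulk, n_iter=5000):
    """Minimize 0.5 * ||signature @ f - bulk||^2 over the probability simplex.

    signature: list of rows (genes), each a list of per-cell-type expression.
    bulk: list of bulk expression values, one per gene.
    """
    n_genes = len(signature)
    n_types = len(signature[0])
    # The squared Frobenius norm upper-bounds the gradient's Lipschitz constant,
    # so its reciprocal is a safe step size.
    step = 1.0 / sum(x * x for row in signature for x in row)
    f = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        residual = [
            sum(s * fk for s, fk in zip(row, f)) - b
            for row, b in zip(signature, bulk)
        ]
        grad = [
            sum(signature[g][k] * residual[g] for g in range(n_genes))
            for k in range(n_types)
        ]
        f = project_to_simplex([fk - step * gk for fk, gk in zip(f, grad)])
    return f


# Toy reference profiles: 4 genes x 3 cell types (hypothetical values).
signature = [
    [10.0, 1.0, 0.0],
    [0.0, 8.0, 1.0],
    [1.0, 0.0, 6.0],
    [3.0, 3.0, 3.0],
]
true_fractions = [0.5, 0.3, 0.2]
bulk = [sum(s * f for s, f in zip(row, true_fractions)) for row in signature]
estimated = deconvolve(signature, bulk)
print(estimated)
```

In this noise-free toy case the true fractions lie inside the feasible region, so the estimate should match them closely; with real data, noise and imperfect reference profiles would limit accuracy.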
Affiliation(s)
| | | | - Yifeng Tao
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Kai Ding
- Department of Pharmacology and Chemical Biology, UPMC Hillman Cancer Center, Magee-Womens Research Institute, Pittsburgh, PA 15213, USA
| | - Xuecong Fu
- Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Steffi Oesterreich
- Department of Pharmacology and Chemical Biology, UPMC Hillman Cancer Center, Magee-Womens Research Institute, Pittsburgh, PA 15213, USA
| | - Adrian V Lee
- Department of Pharmacology and Chemical Biology, UPMC Hillman Cancer Center, Magee-Womens Research Institute, Pittsburgh, PA 15213, USA
| | | |
|
37
|
Qiu Y, Wang J, Lei J, Roeder K. Identification of cell-type-specific marker genes from co-expression patterns in tissue samples. Bioinformatics 2021; 37:3228-3234. [PMID: 33904573 PMCID: PMC8504631 DOI: 10.1093/bioinformatics/btab257] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/08/2020] [Revised: 03/15/2021] [Accepted: 04/24/2021] [Indexed: 12/16/2022] Open
Abstract
MOTIVATION Marker genes, defined as genes that are expressed primarily in a single cell type, can be identified from the single-cell transcriptome; however, such data are not always available for the many uses of marker genes, such as deconvolution of bulk tissue. Marker genes for a cell type, however, are highly correlated in bulk data, because their expression levels depend primarily on the proportion of that cell type in the samples. Therefore, when many tissue samples are analyzed, it is possible to identify these marker genes from the correlation pattern. RESULTS To capitalize on this pattern, we develop a new algorithm to detect marker genes by combining published information about likely marker genes with bulk transcriptome data in the form of a semi-supervised algorithm. The algorithm then exploits the correlation structure of the bulk data to refine the published marker genes by adding or removing genes from the list. AVAILABILITY AND IMPLEMENTATION We implement this method as an R package markerpen, hosted on CRAN (https://CRAN.R-project.org/package=markerpen). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
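The premise of this abstract, that markers of the same cell type are strongly correlated across bulk samples because they track that cell type's proportion, can be illustrated with a small noise-free simulation. The sketch below is only a toy demonstration of that correlation pattern and does not reproduce the markerpen algorithm; the gene names, proportions and expression values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Cell-type proportions (types A, B, C) in five hypothetical bulk samples.
proportions = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.4, 0.1, 0.5],
    [0.1, 0.6, 0.3],
    [0.3, 0.3, 0.4],
]

# Per-cell-type expression of each gene; a marker is expressed in one type only.
signatures = {
    "marker_A1": [10.0, 0.0, 0.0],
    "marker_A2": [6.0, 0.0, 0.0],
    "marker_B1": [0.0, 8.0, 0.0],
}


def bulk_profile(signature):
    """Bulk expression of one gene across samples: proportion-weighted sum."""
    return [sum(p * s for p, s in zip(sample, signature)) for sample in proportions]


bulk = {gene: bulk_profile(sig) for gene, sig in signatures.items()}

r_same_type = pearson(bulk["marker_A1"], bulk["marker_A2"])
r_different_type = pearson(bulk["marker_A1"], bulk["marker_B1"])
print(r_same_type, r_different_type)
```

Here the two type-A markers move together across samples, while the type-A and type-B markers do not; real bulk data would add noise and genes expressed in several cell types, which is the setting a semi-supervised refinement has to handle.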
Affiliation(s)
- Yixuan Qiu
- Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Jiebiao Wang
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA 15261, USA
| | - Jing Lei
- Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Kathryn Roeder
- Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| |
|
38
|
|