1
Ghaffari S, Bouchonville KJ, Saleh E, Schmidt RE, Offer SM, Sinha S. BEDwARS: a robust Bayesian approach to bulk gene expression deconvolution with noisy reference signatures. Genome Biol 2023; 24:178. [PMID: 37537644 PMCID: PMC10399072 DOI: 10.1186/s13059-023-03007-7] [Received: 10/25/2022] [Accepted: 07/05/2023] [Indexed: 08/05/2023]
Abstract
Differential gene expression in bulk transcriptomics data can reflect a change in transcript abundance within a cell type, a change in the proportions of cell types, or both. Expression deconvolution methods can help differentiate these scenarios. BEDwARS is a Bayesian deconvolution method designed to address differences between reference signatures of cell types and the corresponding true signatures underlying bulk transcriptomic profiles. BEDwARS is more robust to noisy reference signatures and outperforms leading in-class methods for estimating cell type proportions and signatures. Application of BEDwARS to dihydropyrimidine dehydrogenase deficiency identified the possible involvement of ciliopathy and impaired translational control in the etiology of the disorder.
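As a rough illustration of what reference-based deconvolution means here, the sketch below fits the standard linear mixing model, in which a bulk profile is approximated by the signature matrix times a vector of cell type proportions. This is not the BEDwARS algorithm: it is a plain non-negative least-squares baseline that treats the reference signatures as exact, which is the assumption BEDwARS is designed to relax. The function name `deconvolve`, the toy signature matrix, and the proportions are all hypothetical.

```python
def deconvolve(signatures, bulk, steps=2000):
    """Estimate cell type proportions by projected gradient descent.

    signatures: list of rows, one per gene, each with one value per cell type.
    bulk: list of bulk expression values, one per gene.
    Minimizes 0.5 * ||S p - b||^2 subject to p >= 0, then rescales p to sum to 1.
    """
    n_genes = len(signatures)
    n_types = len(signatures[0])
    # The sum of squared entries bounds the largest eigenvalue of S^T S,
    # so this step size is small enough for the iteration to converge.
    step = 1.0 / sum(v * v for row in signatures for v in row)
    p = [1.0 / n_types] * n_types
    for _ in range(steps):
        residual = [
            sum(s * w for s, w in zip(row, p)) - b
            for row, b in zip(signatures, bulk)
        ]
        grad = [
            sum(signatures[g][k] * residual[g] for g in range(n_genes))
            for k in range(n_types)
        ]
        # Gradient step followed by projection onto the non-negative orthant.
        p = [max(0.0, w - step * gk) for w, gk in zip(p, grad)]
    total = sum(p)
    return [w / total for w in p] if total > 0 else p


# Hypothetical toy data: 4 genes (rows) by 3 cell types (columns).
signatures = [
    [10.0, 1.0, 0.0],
    [0.0, 8.0, 1.0],
    [1.0, 0.0, 9.0],
    [5.0, 5.0, 5.0],
]
true_props = [0.5, 0.3, 0.2]
# Noise-free bulk profile generated from the mixing model itself.
bulk = [sum(s * w for s, w in zip(row, true_props)) for row in signatures]

estimated = deconvolve(signatures, bulk)
print([round(w, 3) for w in estimated])
```

With exact signatures and no noise, the estimate recovers the proportions used to build the bulk profile. Perturbing `signatures` before calling `deconvolve` shows how reference error propagates into the estimated proportions.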
Affiliation(s)
- Saba Ghaffari
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Thomas M. Siebel Center, 201 N. Goodwin Ave., Urbana, IL, USA
- Kelly J Bouchonville
  - Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Gonda 19-476, 200 First St. SW, Rochester, MN, 55905, USA
- Ehsan Saleh
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Thomas M. Siebel Center, 201 N. Goodwin Ave., Urbana, IL, USA
- Remington E Schmidt
  - Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Gonda 19-476, 200 First St. SW, Rochester, MN, 55905, USA
- Steven M Offer
  - Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Gonda 19-476, 200 First St. SW, Rochester, MN, 55905, USA
- Saurabh Sinha
  - Wallace H. Coulter Department of Biomedical Engineering at Georgia Tech and Emory University, Georgia Institute of Technology, 3108 U.A. Whitaker Bldg., 313 Ferst Drive, Atlanta, GA, 30332, USA
2
Deng W, Li B, Wang J, Jiang W, Yan X, Li N, Vukmirovic M, Kaminski N, Wang J, Zhao H. A novel Bayesian framework for harmonizing information across tissues and studies to increase cell type deconvolution accuracy. Brief Bioinform 2023; 24:bbac616. [PMID: 36631398 PMCID: PMC9851324 DOI: 10.1093/bib/bbac616] [Received: 10/19/2022] [Revised: 11/28/2022] [Accepted: 12/14/2022] [Indexed: 01/13/2023]
Abstract
Computational cell type deconvolution on bulk transcriptomics data can reveal cell type proportion heterogeneity across samples. One critical factor for accurate deconvolution is the reference signature matrix for the different cell types. Compared with inferring reference signature matrices from cell lines, rapidly accumulating single-cell RNA-sequencing (scRNA-seq) data provide a richer and less biased resource. However, deriving cell type signatures from scRNA-seq data is challenging because of high biological and technical noise. In this article, we introduce a novel Bayesian framework, tranSig, to improve signature matrix inference from scRNA-seq by leveraging shared cell type-specific expression patterns across different tissues and studies. Our simulations show that tranSig is robust to the number of signature genes and tissues specified in the model. Applications of tranSig to bulk RNA sequencing data from peripheral blood, bronchoalveolar lavage and aorta demonstrate its accuracy and power to characterize biological heterogeneity across groups. In summary, tranSig offers an accurate and robust approach to defining gene expression signatures of different cell types, facilitating improved in silico cell type deconvolution.
Affiliation(s)
- Wenxuan Deng
  - Department of Biostatistics, Yale School of Public Health, 60 College Street, New Haven, CT, USA
- Bolun Li
  - Department of Biostatistics, Yale School of Public Health, 60 College Street, New Haven, CT, USA
  - State Key Laboratory of Medical Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, Department of Pathophysiology, Peking Union Medical College, Beijing, China
- Jiawei Wang
  - Department of Biostatistics, Yale School of Public Health, 60 College Street, New Haven, CT, USA
- Wei Jiang
  - Department of Biostatistics, Yale School of Public Health, 60 College Street, New Haven, CT, USA
- Xiting Yan
  - Section of Pulmonary, Critical Care and Sleep Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA
- Ningshan Li
  - Section of Pulmonary, Critical Care and Sleep Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA
- Milica Vukmirovic
  - Section of Pulmonary, Critical Care and Sleep Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA
  - Leslie Dan Faculty of Pharmacy, University of Toronto, 144 College St., ON, Canada
- Naftali Kaminski
  - Section of Pulmonary, Critical Care and Sleep Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA
- Jing Wang
  - State Key Laboratory of Medical Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, Department of Pathophysiology, Peking Union Medical College, Beijing, China
- Hongyu Zhao
  - Department of Biostatistics, Yale School of Public Health, 60 College Street, New Haven, CT, USA
3
Park S, Lee ER, Zhao H. Low-rank regression models for multiple binary responses and their applications to cancer cell-line encyclopedia data. J Am Stat Assoc 2022; 119:202-216. [PMID: 38481466 PMCID: PMC10928550 DOI: 10.1080/01621459.2022.2105704] [Received: 06/29/2021] [Accepted: 07/16/2022] [Indexed: 10/16/2022]
Abstract
In this paper, we study high-dimensional multivariate logistic regression models in which a common set of covariates is used to predict multiple binary outcomes simultaneously. Our work is primarily motivated by the many biomedical studies with correlated multiple responses, such as the cancer cell-line encyclopedia project. We assume that the underlying regression coefficient matrix is simultaneously low-rank and row-wise sparse. We propose an intuitively appealing selection and estimation framework based on the marginal model likelihood, and we develop an efficient computational algorithm for inference. We establish a novel high-dimensional theory for this nonlinear multivariate regression. Our theory is general, allowing for potential correlations between the binary responses. We propose a new type of nuclear norm penalty using the smoothly clipped absolute deviation, filling a gap in the related non-convex penalization literature. We demonstrate theoretically that, when the underlying coefficient matrix is low-rank and row-wise sparse, the proposed estimator improves estimation accuracy by considering multiple responses jointly. In particular, we establish non-asymptotic error bounds, as well as both rank and row support consistency of the proposed method. Moreover, we develop a consistent rule to simultaneously select the rank and row dimension of the coefficient matrix. Furthermore, we extend the proposed methods and theory to a joint Ising model, which accounts for the dependence relationships. In our analysis of both simulated data and the cancer cell line encyclopedia data, the proposed methods outperform existing methods in predicting responses.
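The model class described in this abstract can be written out roughly as follows. This is a sketch under assumed notation, not the authors' own: the symbols $x_i$, $Y_{ij}$, $B$, $p$, $q$, $r$ and $s$ are introduced here for illustration, and the paper's exact parameterization and penalty are not reproduced.

```latex
% Marginal logistic model for the j-th binary response of sample i,
% with a covariate vector x_i in R^p shared by all q responses:
\[
  \Pr\left(Y_{ij} = 1 \mid x_i\right)
    = \frac{\exp\left(x_i^{\top} b_j\right)}{1 + \exp\left(x_i^{\top} b_j\right)},
  \qquad j = 1, \dots, q .
\]
% The coefficient vectors are stacked column-wise into one matrix,
% assumed to be both low-rank and row-wise sparse:
\[
  B = \left[\, b_1, \dots, b_q \,\right] \in \mathbb{R}^{p \times q},
  \qquad \operatorname{rank}(B) \le r,
  \qquad \#\left\{\, k : B_{k \cdot} \neq 0 \,\right\} \le s .
\]
```

Row-wise sparsity means that a covariate is either used by all responses or dropped entirely, while the rank constraint lets the $q$ responses share a small number of latent directions in covariate space.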
Affiliation(s)
- Seyoung Park
  - Department of Statistics, Sungkyunkwan University, Seoul, 03063, Korea
- Eun Ryung Lee
  - Department of Statistics, Sungkyunkwan University, Seoul, 03063, Korea
- Hongyu Zhao
  - Department of Biostatistics, Yale University, New Haven, CT, 06511, USA