1
Saki A, Faghihi U, Baldé I. Differentiating Gliosarcoma from Glioblastoma: A Novel Approach Using PEACE and XGBoost to Deal with Datasets with Ultra-High Dimensional Confounders. Life (Basel) 2024; 14:882. [PMID: 39063635 PMCID: PMC11278037 DOI: 10.3390/life14070882] [Received: 05/21/2024] [Revised: 06/28/2024] [Accepted: 06/28/2024] [Indexed: 07/28/2024]
Abstract
In this study, we used a recently developed causal methodology, called Probabilistic Easy Variational Causal Effect (PEACE), to distinguish gliosarcoma (GSM) from glioblastoma (GBM). Our approach uses a causal metric that combines PEACE with the XGBoost (eXtreme Gradient Boosting) algorithm. Unlike prior research, which often relied on statistical models to reduce dataset dimensions before causal analysis, our approach applies PEACE and XGBoost to the complete dataset. PEACE provides a comprehensive measurement of direct causal effects, applicable to both continuous and discrete variables. Our method provides both positive and negative versions of PEACE, together with their averages, to calculate the positive and negative causal effects of the radiomic features on the variable representing the type of tumor (GSM or GBM). In our model, PEACE and its variations are equipped with a degree d, which varies from 0 to 1 and reflects the importance of the rarity and frequency of the events. By using PEACE with XGBoost, we achieved a detailed understanding of the causal relationships among the dataset features, facilitating accurate differentiation between GSM and GBM. To assess the XGBoost model, we used cross-validation and obtained a mean accuracy of 83% and an average model MSE of 0.130. This performance is notable given the high number of columns and low number of rows (code on GitHub).
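The evaluation step mentioned in the abstract (cross-validation reporting a mean accuracy and a mean MSE) can be sketched as follows. This is a minimal illustration, not the authors' code: the helper names `k_fold_indices`, `cross_validate`, and `prior_baseline` are hypothetical, the MSE is assumed here to be computed on predicted class-1 probabilities, and a trivial baseline stands in for the XGBoost model.

```python
import random
from statistics import mean


def k_fold_indices(n_rows, k, seed=0):
    """Shuffle the row indices and split them into k nearly equal folds."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(X, y, fit_predict_proba, k=5, seed=0):
    """Return (mean accuracy, mean MSE) over k held-out folds.

    fit_predict_proba(X_train, y_train, X_test) is an assumed interface:
    it must return one predicted probability of class 1 per test row.
    """
    accs, mses = [], []
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        probs = fit_predict_proba(
            [X[i] for i in train], [y[i] for i in train], [X[i] for i in fold]
        )
        labels = [y[i] for i in fold]
        # Accuracy thresholds the probability at 0.5; MSE uses it directly.
        accs.append(mean(int((p >= 0.5) == bool(t)) for p, t in zip(probs, labels)))
        mses.append(mean((p - t) ** 2 for p, t in zip(probs, labels)))
    return mean(accs), mean(mses)


def prior_baseline(X_train, y_train, X_test):
    """Hypothetical stand-in model: predicts the training class-1 rate everywhere."""
    rate = mean(y_train)
    return [rate] * len(X_test)
```

Any model that can be wrapped in the same three-argument callable, for example a gradient-boosted classifier returning class probabilities, could replace `prior_baseline`. With few rows and many columns, the per-fold estimates are noisy, so the fold-to-fold spread is worth reporting alongside the mean.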
Affiliation(s)
- Amir Saki
  - Département de Mathématiques et d’Informatique, Université du Québec à Trois-Rivières, Trois-Rivières, QC G8Z 4M3, Canada
- Usef Faghihi
  - Département de Mathématiques et d’Informatique, Université du Québec à Trois-Rivières, Trois-Rivières, QC G8Z 4M3, Canada
- Ismaila Baldé
  - Département de Mathématiques et de Statistique, Faculté des Sciences, Université de Moncton, Moncton, NB E1A3E9, Canada
2
Liu Y, Gao Q, Wei K, Huang C, Wang C, Yu Y, Qin G, Wang T. High-dimensional generalized median adaptive lasso with application to omics data. Brief Bioinform 2024; 25:bbae059. [PMID: 38436558 PMCID: PMC10939310 DOI: 10.1093/bib/bbae059] [Received: 09/03/2023] [Revised: 01/03/2024] [Indexed: 03/05/2024]
Abstract
Recently, there has been a growing interest in variable selection for causal inference within the context of high-dimensional data. However, when the outcome exhibits a skewed distribution, ensuring the accuracy of variable selection and causal effect estimation might be challenging. Here, we introduce the generalized median adaptive lasso (GMAL) for covariate selection to achieve an accurate estimation of causal effect even when the outcome follows skewed distributions. A distinctive feature of our proposed method is that we utilize a linear median regression model for constructing penalty weights, thereby maintaining the accuracy of variable selection and causal effect estimation even when the outcome presents extremely skewed distributions. Simulation results showed that our proposed method performs comparably to existing methods in variable selection when the outcome follows a symmetric distribution. Besides, the proposed method exhibited obvious superiority over the existing methods when the outcome follows a skewed distribution. Meanwhile, our proposed method consistently outperformed the existing methods in causal estimation, as indicated by smaller root-mean-square error. We also utilized the GMAL method on a deoxyribonucleic acid methylation dataset from the Alzheimer's disease (AD) neuroimaging initiative database to investigate the association between cerebrospinal fluid tau protein levels and the severity of AD.
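The weighting idea in this abstract can be written out as a short sketch. The abstract does not give formulas, so the following assumes GMAL follows the usual adaptive-lasso template for propensity score models; the symbols $Y_i$ (outcome), $A_i$ (treatment), $X_i$ (covariates), $\gamma$, $\lambda$, and $\ell_n$ (propensity score log-likelihood) are illustrative assumptions, not the authors' notation.

```latex
% Step 1 (assumed form): linear median regression of the outcome,
% i.e. least absolute deviations, which is robust to skewed outcomes.
(\tilde{\eta}, \tilde{\beta})
  = \arg\min_{\eta,\,\beta} \sum_{i=1}^{n}
    \bigl| Y_i - \eta A_i - X_i^{\top}\beta \bigr|

% Step 2 (assumed form): penalty weights from the median-regression fit;
% covariates weakly related to the outcome receive large penalties.
\hat{w}_j = \bigl| \tilde{\beta}_j \bigr|^{-\gamma}, \qquad \gamma > 0

% Step 3 (assumed form): weighted-lasso propensity score model.
\hat{\alpha}(\lambda)
  = \arg\min_{\alpha} \Bigl\{ -\ell_n(\alpha; A, X)
    + \lambda \sum_{j=1}^{p} \hat{w}_j \, |\alpha_j| \Bigr\}
```

Under this reading, replacing a mean-regression fit with a median-regression fit in Step 1 is what keeps the weights stable when the outcome is heavily skewed.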
Affiliation(s)
- Yahang Liu
  - Department of Biostatistics, School of Public Health, Fudan University, Shanghai, China
- Qian Gao
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
  - Key Laboratory of Coal Environmental Pathogenicity and Prevention (Shanxi Medical University), Ministry of Education, China
- Kecheng Wei
  - Department of Biostatistics, School of Public Health, Fudan University, Shanghai, China
- Chen Huang
  - Department of Biostatistics, School of Public Health, Fudan University, Shanghai, China
- Ce Wang
  - Department of Biostatistics, School of Public Health, Fudan University, Shanghai, China
- Yongfu Yu
  - Department of Biostatistics, School of Public Health, Fudan University, Shanghai, China
  - Shanghai Institute of Infectious Disease and Biosecurity, Shanghai, China
  - Key Laboratory of Public Health Safety of Ministry of Education, Key Laboratory for Health Technology Assessment, National Commission of Health, Fudan University, Shanghai, China
- Guoyou Qin
  - Department of Biostatistics, School of Public Health, Fudan University, Shanghai, China
  - Shanghai Institute of Infectious Disease and Biosecurity, Shanghai, China
  - Key Laboratory of Public Health Safety of Ministry of Education, Key Laboratory for Health Technology Assessment, National Commission of Health, Fudan University, Shanghai, China
- Tong Wang
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
  - Key Laboratory of Coal Environmental Pathogenicity and Prevention (Shanxi Medical University), Ministry of Education, China
3
Gao Q, Zhang Y, Sun H, Wang T. Evaluation of propensity score methods for causal inference with high-dimensional covariates. Brief Bioinform 2022; 23:6603435. [PMID: 35667004 DOI: 10.1093/bib/bbac227] [Received: 12/26/2021] [Revised: 05/11/2022] [Accepted: 05/17/2022] [Indexed: 11/12/2022]
Abstract
In recent work, researchers have paid considerable attention to the estimation of causal effects in observational studies with a large number of covariates, which makes the unconfoundedness assumption plausible. In this paper, we review propensity score (PS) methods developed in high-dimensional settings and broadly group them into model-based methods that extend models for prediction to causal inference and balance-based methods that combine covariate balancing constraints. We conducted systematic simulation experiments to evaluate these two types of methods, and studied whether the use of balancing constraints further improved estimation performance. Our comparison methods were post-double-selection (PDS), double-index PS (DiPS), outcome-adaptive LASSO (OAL), group LASSO and doubly robust estimation (GLiDeR), high-dimensional covariate balancing PS (hdCBPS), regularized calibrated estimators (RCAL) and approximate residual balancing method (balanceHD). For the four model-based methods, simulation studies showed that GLiDeR was the most stable approach, with high estimation accuracy and precision, followed by PDS, OAL and DiPS. For balance-based methods, hdCBPS performed similarly to GLiDeR in terms of accuracy, and outperformed balanceHD and RCAL. These findings imply that PS methods do not benefit appreciably from covariate balancing constraints in high-dimensional settings. In conclusion, we recommend the preferential use of GLiDeR and hdCBPS approaches for estimating causal effects in high-dimensional settings; however, further studies on the construction of valid confidence intervals are required.
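For readers unfamiliar with how an estimated propensity score becomes a causal effect estimate, the following minimal sketch shows a generic normalized inverse-probability-weighted (IPW) estimator of the average treatment effect. It is not one of the seven methods compared in the paper; the function name `ipw_ate` and the clipping threshold are assumptions chosen for illustration, and the propensity scores are taken as given rather than fitted.

```python
def ipw_ate(treatment, outcome, propensity, clip=0.01):
    """Normalized (Hajek) IPW estimate of the average treatment effect.

    treatment:  0/1 indicators
    outcome:    observed outcomes
    propensity: estimated P(treatment = 1 | covariates) per unit
    clip:       assumed bound that keeps scores inside [clip, 1 - clip]
    """
    w1 = w0 = y1 = y0 = 0.0
    for a, y, e in zip(treatment, outcome, propensity):
        e = min(max(e, clip), 1.0 - clip)  # guard against extreme weights
        if a == 1:
            w1 += 1.0 / e
            y1 += y / e
        else:
            w0 += 1.0 / (1.0 - e)
            y0 += y / (1.0 - e)
    # Difference of the weighted outcome means in the two arms.
    return y1 / w1 - y0 / w0
```

The high-dimensional methods reviewed in the abstract differ mainly in how the propensity score (or the balancing weights) is obtained, which is the input this sketch takes for granted.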
Affiliation(s)
- Qian Gao
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
- Yu Zhang
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
- Hongwei Sun
  - Department of Health Statistics, School of Public Health and Management, Binzhou Medical University, Yantai, China
- Tong Wang
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
4
Yu D, Wang L, Kong D, Zhu H. Mapping the Genetic-Imaging-Clinical Pathway with Applications to Alzheimer’s Disease. J Am Stat Assoc 2022; 117:1656-1668. [PMID: 37009529 PMCID: PMC10062702 DOI: 10.1080/01621459.2022.2087658] [Indexed: 02/06/2023]
Abstract
Alzheimer's disease is a progressive form of dementia that results in problems with memory, thinking, and behavior. It often starts with abnormal aggregation and deposition of β amyloid and tau, followed by neuronal damage such as atrophy of the hippocampi, leading to Alzheimer's disease (AD). The aim of this paper is to map the genetic-imaging-clinical pathway for AD in order to delineate the genetically regulated brain changes that drive disease progression, based on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset. We develop a novel two-step approach to delineate the association between high-dimensional 2D hippocampal surface exposures and the Alzheimer's Disease Assessment Scale (ADAS) cognitive score, while taking into account the ultra-high dimensional clinical and genetic covariates at baseline. Analysis results suggest that the radial distance of each pixel of both hippocampi is negatively associated with the severity of behavioral deficits conditional on observed clinical and genetic covariates. These associations are stronger in Cornu Ammonis region 1 (CA1) and subiculum subregions compared to Cornu Ammonis region 2 (CA2) and Cornu Ammonis region 3 (CA3) subregions. Supplementary materials for this article, including a standardized description of the materials available for reproducing the work, are available as an online supplement.
Affiliation(s)
- Dengdeng Yu
  - Department of Mathematics, University of Texas at Arlington
- Linbo Wang
  - Department of Statistical Sciences, University of Toronto
- Dehan Kong
  - Department of Statistical Sciences, University of Toronto
- Hongtu Zhu
  - Department of Biostatistics, University of North Carolina, Chapel Hill for the Alzheimer’s Disease Neuroimaging Initiative*