1. Grützmann K, Kraft T, Meinhardt M, Meier F, Westphal D, Seifert M. Network-based analysis of heterogeneous patient-matched brain and extracranial melanoma metastasis pairs reveals three homogeneous subgroups. Comput Struct Biotechnol J 2024; 23:1036-1050. [PMID: 38464935 PMCID: PMC10920107 DOI: 10.1016/j.csbj.2024.02.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 02/15/2024] [Accepted: 02/15/2024] [Indexed: 03/12/2024]
Abstract
Melanoma, the deadliest form of skin cancer, can metastasize to different organs. Molecular differences between brain and extracranial melanoma metastases are poorly understood. Here, promoter methylation and gene expression of 11 heterogeneous patient-matched pairs of brain and extracranial metastases were analyzed using melanoma-specific gene regulatory networks learned from public transcriptome and methylome data, followed by network-based impact propagation of patient-specific alterations. This innovative data analysis strategy made it possible to predict the potential impacts of patient-specific driver candidate genes on other genes and pathways. The patient-matched metastasis pairs clustered into three robust subgroups with specific downstream targets with known roles in cancer, including melanoma (SG1: RBM38, BCL11B; SG2: GATA3, FES; SG3: SLAMF6, PYCARD). Patient subgroups and the ranking of target gene candidates were confirmed in a validation cohort. In summary, computational network-based impact analyses of heterogeneous metastasis pairs predicted individual regulatory differences in melanoma brain metastases, culminating in three consistent subgroups with specific downstream target genes.
Affiliation(s)
- Konrad Grützmann
  - Institute for Medical Informatics and Biometry, Faculty of Medicine, TU Dresden, 01307 Dresden, Germany
- Theresa Kraft
  - Institute for Medical Informatics and Biometry, Faculty of Medicine, TU Dresden, 01307 Dresden, Germany
- Matthias Meinhardt
  - Department of Pathology, University Hospital Carl Gustav Carus Dresden, TU Dresden, 01307 Dresden, Germany
- Friedegund Meier
  - Department of Dermatology, University Hospital Carl Gustav Carus Dresden, TU Dresden, 01307 Dresden, Germany
  - National Center for Tumor Diseases (NCT), D-01307 Dresden, Germany
- Dana Westphal
  - Department of Dermatology, University Hospital Carl Gustav Carus Dresden, TU Dresden, 01307 Dresden, Germany
  - National Center for Tumor Diseases (NCT), D-01307 Dresden, Germany
- Michael Seifert
  - Institute for Medical Informatics and Biometry, Faculty of Medicine, TU Dresden, 01307 Dresden, Germany
  - National Center for Tumor Diseases (NCT), D-01307 Dresden, Germany
2. Waldorp L, Haslbeck J. Network Inference With the Lasso. Multivariate Behavioral Research 2024; 59:738-757. [PMID: 38587864 DOI: 10.1080/00273171.2024.2317928] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/09/2024]
Abstract
Calculating confidence intervals and p-values of edges in networks is useful to decide their presence or absence and it is a natural way to quantify uncertainty. Since lasso estimation is often used to obtain edges in a network, and the underlying distribution of lasso estimates is discontinuous and has probability one at zero when the estimate is zero, obtaining p-values and confidence intervals is problematic. It is also not always desirable to use the lasso to select the edges because there are assumptions required for correct identification of network edges that may not be warranted for the data at hand. Here, we review three methods that either use a modified lasso estimate (desparsified or debiased lasso) or a method that uses the lasso for selection and then determines p-values without the lasso. We compare these three methods with popular methods to estimate Gaussian Graphical Models in simulations and conclude that the desparsified lasso and its bootstrapped version appear to be the best choices for selection and quantifying uncertainty with confidence intervals and p-values.
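The point mass at zero mentioned in this abstract can be illustrated with the soft-thresholding operator, which gives the lasso solution in the simplest case of a single standardized predictor. The following is a minimal sketch under that simplifying assumption; the input values and the penalty level `lam` are invented for illustration and do not come from the paper.

```python
def soft_threshold(z: float, lam: float) -> float:
    """Lasso estimate for one standardized predictor: shrink z toward 0 by lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0  # every z in [-lam, lam] is mapped to exactly zero


# Hypothetical unpenalized estimates of five edge weights (illustrative only).
unpenalized = [-1.0, -0.25, 0.125, 0.375, 1.5]
lam = 0.5  # penalty level, chosen arbitrarily for this sketch

lasso_estimates = [soft_threshold(z, lam) for z in unpenalized]
n_zero = sum(1 for b in lasso_estimates if b == 0.0)
print(lasso_estimates, n_zero)
```

Because a whole interval of inputs collapses to a single output value, the sampling distribution of the estimate has positive probability at zero, which is why standard p-values and confidence intervals do not apply directly.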
Affiliation(s)
- Lourens Waldorp
  - Psychological Methods, University of Amsterdam, Amsterdam, the Netherlands
- Jonas Haslbeck
  - Psychological Methods, University of Amsterdam, Amsterdam, the Netherlands
3. Liu J, Zhang X, Lin T, Chen R, Zhong Y, Chen T, Wu T, Liu C, Huang A, Nguyen TT, Lee EE, Jeste DV, Tu XM. A New Paradigm for High-dimensional Data: Distance-Based Semiparametric Feature Aggregation Framework via Between-Subject Attributes. Scand Stat Theory Appl 2024; 51:672-696. [PMID: 39101047 PMCID: PMC11296665 DOI: 10.1111/sjos.12695] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2023] [Accepted: 10/11/2023] [Indexed: 08/06/2024]
Abstract
This article proposes a distance-based framework incentivized by the paradigm shift towards feature aggregation for high-dimensional data, which does not rely on the sparse-feature assumption or the permutation-based inference. Focusing on distance-based outcomes that preserve information without truncating any features, a class of semiparametric regression has been developed, which encapsulates multiple sources of high-dimensional variables using pairwise outcomes of between-subject attributes. Further, we propose a strategy to address the interlocking correlations among pairs via the U-statistics-based estimating equations (UGEE), which correspond to their unique efficient influence function (EIF). Hence, the resulting semiparametric estimators are robust to distributional misspecification while enjoying root-n consistency and asymptotic optimality to facilitate inference. In essence, the proposed approach not only circumvents information loss due to feature selection but also improves the model's interpretability and computational feasibility. Simulation studies and applications to the human microbiome and wearables data are provided, where the feature dimensions are tens of thousands.
Affiliation(s)
- Jinyuan Liu
  - Department of Biostatistics, Vanderbilt University, Nashville, Tennessee, U.S.A
- Xinlian Zhang
  - Department of Family Medicine and Public Health, UC San Diego, San Diego, California, U.S.A
- Tuo Lin
  - Department of Family Medicine and Public Health, UC San Diego, San Diego, California, U.S.A
- Ruohui Chen
  - Department of Family Medicine and Public Health, UC San Diego, San Diego, California, U.S.A
- Yuan Zhong
  - Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, U.S.A
- Tian Chen
  - Takeda Pharmaceuticals Cambridge, Massachusetts, U.S.A
- Tsungchin Wu
  - Department of Family Medicine and Public Health, UC San Diego, San Diego, California, U.S.A
- Chenyu Liu
  - Department of Family Medicine and Public Health, UC San Diego, San Diego, California, U.S.A
- Anna Huang
  - Department of Psychiatry, Vanderbilt University, Nashville, Tennessee, U.S.A
- Tanya T. Nguyen
  - Veterans Affairs San Diego Healthcare System, La Jolla, California, U.S.A
  - Center for Microbiome Innovation, UC San Diego, San Diego, California, U.S.A
- Ellen E. Lee
  - Veterans Affairs San Diego Healthcare System, La Jolla, California, U.S.A
  - Department of Psychiatry, UC San Diego, San Diego, California, U.S.A
- Dilip V. Jeste
  - Stein Institute for Research on Aging, UC San Diego, San Diego, California, U.S.A
- Xin M. Tu
  - Department of Family Medicine and Public Health, UC San Diego, San Diego, California, U.S.A
4. Yang X, Cai Z, Wang C, Jiang C, Li J, Chen F, Li W. Integrated multiomic analysis reveals disulfidptosis subtypes in glioblastoma: implications for immunotherapy, targeted therapy, and chemotherapy. Front Immunol 2024; 15:1362543. [PMID: 38504986 PMCID: PMC10950096 DOI: 10.3389/fimmu.2024.1362543] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2023] [Accepted: 02/09/2024] [Indexed: 03/21/2024]
Abstract
Introduction: Glioblastoma (GBM) presents significant challenges due to its malignancy and limited treatment options. Precision treatment requires subtyping patients based on prognosis. Disulfidptosis, a novel cell death mechanism, is linked to aberrant glucose metabolism and disulfide stress, particularly in tumors expressing high levels of SLC7A11. The exploration of disulfidptosis may provide a new perspective for precise diagnosis and treatment of glioblastoma. Methods: Transcriptome sequencing was conducted on samples from GBM patients treated at Tiantan Hospital (January 2022 - December 2023). Data from the CGGA and TCGA databases were collected. Consensus clustering based on disulfidptosis features categorized GBM patients into two subtypes (DRGclusters). Tumor immune microenvironment, response to immunotherapy, and drug sensitivity were analyzed. An 8-gene disulfidptosis-based subtype predictor was developed using the LASSO machine learning algorithm and validated on the CGGA dataset. Results: Patients in DRGcluster A exhibited improved overall survival (OS) compared to DRGcluster B. DRGcluster subtypes showed differences in tumor immune microenvironment and response to immunotherapy. The predictor effectively stratified patients into high- and low-risk groups. Significant differences in IC50 values for chemotherapy and targeted therapy were observed between risk groups. Discussion: Disulfidptosis-based classification offers promise as a prognostic predictor for GBM. It provides insights into tumor immune microenvironment and response to therapy. The predictor aids in patient stratification and personalized treatment selection, potentially improving outcomes for GBM patients.
Affiliation(s)
- Xue Yang
  - Department of Neuro-oncology Cancer Center, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- Zehao Cai
  - Department of Neuro-oncology Cancer Center, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- Ce Wang
  - Department of Neuro-oncology Cancer Center, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- Chenggang Jiang
  - Department of Neurosurgery, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- Jianguang Li
  - Department of Neurosurgery, Aerospace Center Hospital, Beijing, China
- Feng Chen
  - Department of Neuro-oncology Cancer Center, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- Wenbin Li
  - Department of Neuro-oncology Cancer Center, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
5. Huang TJ, Luedtke A, McKeague IW. Efficient estimation of the maximal association between multiple predictors and a survival outcome. Ann Stat 2023; 51:1965-1988. [PMID: 38405375 PMCID: PMC10888526 DOI: 10.1214/23-aos2313] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/27/2024]
Abstract
This paper develops a new approach to post-selection inference for screening high-dimensional predictors of survival outcomes. Post-selection inference for right-censored outcome data has been investigated in the literature, but much remains to be done to make the methods both reliable and computationally-scalable in high-dimensions. Machine learning tools are commonly used to provide predictions of survival outcomes, but the estimated effect of a selected predictor suffers from confirmation bias unless the selection is taken into account. The new approach involves the construction of semi-parametrically efficient estimators of the linear association between the predictors and the survival outcome, which are used to build a test statistic for detecting the presence of an association between any of the predictors and the outcome. Further, a stabilization technique reminiscent of bagging allows a normal calibration for the resulting test statistic, which enables the construction of confidence intervals for the maximal association between predictors and the outcome and also greatly reduces computational cost. Theoretical results show that this testing procedure is valid even when the number of predictors grows superpolynomially with sample size, and our simulations support this asymptotic guarantee at moderate sample sizes. The new approach is applied to the problem of identifying patterns in viral gene expression associated with the potency of an antiviral drug.
Affiliation(s)
- Tzu-Jung Huang
  - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center
- Alex Luedtke
  - Department of Statistics, University of Washington
6. Zhang Y, Dai R, Huang Y, Prentice RL, Zheng C. Regression calibration utilizing biomarkers developed from high-dimensional metabolites. Front Nutr 2023; 10:1215768. [PMID: 37599686 PMCID: PMC10433218 DOI: 10.3389/fnut.2023.1215768] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2023] [Accepted: 07/17/2023] [Indexed: 08/22/2023]
Abstract
Addressing systematic measurement errors in self-reported data is a critical challenge in association studies of dietary intake and chronic disease risk. The regression calibration method has been utilized for error correction when an objectively measured biomarker is available; however, biomarkers for only a few dietary components have been developed. This paper proposes to use high-dimensional objective measurements to construct biomarkers for many more dietary components and to estimate the diet disease associations. It also discusses the challenges in variance estimation in high-dimensional regression methods and presents a variety of techniques to address this issue, including cross-validation, degrees-of-freedom corrected estimators, and refitted cross-validation (RCV). Extensive simulation is performed to study the finite sample performance of the proposed estimators. The proposed method is applied to the Women's Health Initiative cohort data to examine the associations between the sodium/potassium intake ratio and the total cardiovascular disease.
Affiliation(s)
- Yiwen Zhang
  - Zilber School of Public Health, University of Wisconsin-Milwaukee, Milwaukee, WI, United States
- Ran Dai
  - Department of Biostatistics, University of Nebraska Medical Center, Omaha, NE, United States
- Ying Huang
  - Public Health Science Division, Fred Hutchinson Cancer Center, Seattle, WA, United States
- Ross L. Prentice
  - Public Health Science Division, Fred Hutchinson Cancer Center, Seattle, WA, United States
- Cheng Zheng
  - Department of Biostatistics, University of Nebraska Medical Center, Omaha, NE, United States
7. Liu J, Glied S, Yakusheva O, Bevin C, Schlak AE, Yoon S, Kulage KM, Poghosyan L. Using machine-learning methods to predict in-hospital mortality through the Elixhauser index: A Medicare data analysis. Res Nurs Health 2023; 46:411-424. [PMID: 37221452 PMCID: PMC10330510 DOI: 10.1002/nur.22322] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2022] [Revised: 04/21/2023] [Accepted: 05/10/2023] [Indexed: 05/25/2023]
Abstract
Accurate in-hospital mortality prediction can reflect the prognosis of patients, help guide allocation of clinical resources, and help clinicians make the right care decisions. There are limitations to using traditional logistic regression models when assessing the model performance of comorbidity measures to predict in-hospital mortality. Meanwhile, the use of novel machine-learning methods is growing rapidly. In 2021, the Agency for Healthcare Research and Quality published new guidelines for using the Present-on-Admission (POA) indicator from the International Classification of Diseases, Tenth Revision, for coding comorbidities to predict in-hospital mortality from the Elixhauser comorbidity measurement method. We compared the model performance of logistic regression, an elastic net model, and an artificial neural network (ANN) to predict in-hospital mortality from Elixhauser's measures under the updated POA guidelines. In this retrospective analysis, 1,810,106 adult Medicare inpatient admissions from six US states admitted after September 23, 2017, and discharged before April 11, 2019, were extracted from the Centers for Medicare and Medicaid Services data warehouse. The POA indicator was used to distinguish pre-existing comorbidities from complications that occurred during hospitalization. All models performed well (C-statistics >0.77). The elastic net method generated a parsimonious model, in which five fewer comorbidities were selected to predict in-hospital mortality, with predictive power similar to that of the logistic regression model. The ANN had the highest C-statistic compared to the other two models (0.800 vs. 0.791 and 0.791). The elastic net model and ANN can be applied successfully to predict in-hospital mortality.
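The C-statistic reported in this abstract is the probability that a randomly chosen patient who died received a higher predicted risk than a randomly chosen survivor. The sketch below computes it by direct pairwise comparison; the function name `c_statistic` and the six toy patients are hypothetical and are not taken from the study.

```python
def c_statistic(y_true, scores):
    """Share of (event, non-event) pairs ranked correctly; ties count as 0.5."""
    events = [s for y, s in zip(y_true, scores) if y == 1]
    non_events = [s for y, s in zip(y_true, scores) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Toy data: 1 = died in hospital, 0 = survived (invented for illustration).
died = [0, 0, 1, 0, 1, 1]
predicted_risk = [0.1, 0.4, 0.35, 0.2, 0.8, 0.4]
auc = c_statistic(died, predicted_risk)
print(round(auc, 3))
```

The pairwise loop is quadratic in the number of admissions, so for a dataset of the size described above a rank-based formulation would be the practical choice; the loop is used here only because it mirrors the definition.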
Affiliation(s)
- Jianfang Liu
  - Columbia University School of Nursing, New York City, New York, USA
- Sherry Glied
  - Robert F. Wagner Graduate School of Public Service, New York University, New York City, New York, USA
- Olga Yakusheva
  - University of Michigan School of Nursing, University of Michigan School of Public Health, Ann Arbor, Michigan, USA
- Cohen Bevin
  - Mount Sinai Health System, New York City, New York, USA
- Amelia E Schlak
  - AAAS Science and Technology Policy Fellow, Office of Research and Development, U.S. Department of Veteran Affairs, Washington, DC, USA
- Sunmoo Yoon
  - Division of General Medicine, Department of Medicine, Columbia University Irving Medical Center, New York City, New York, USA
- Kristine M Kulage
  - Office of Scholarship and Research Development, Columbia University School of Nursing, New York City, New York, USA
- Lusine Poghosyan
  - Columbia University School of Nursing and Professor of Health Policy and Management, Mailman School of Public Health, Columbia University, Executive Director Center for Healthcare Delivery Research & Innovations (HDRI), New York City, New York, USA
8. Blair CS, Javanbakht M, Comulada WS, Bolan R, Shoptaw S, Gorbach PM, Needleman J. Comparing Factors Associated with Increased Stimulant Use in Relation to HIV Status Using a Machine Learning and Prediction Modeling Approach. Prevention Science: The Official Journal of the Society for Prevention Research 2023; 24:1102-1114. [PMID: 37328629 PMCID: PMC10795486 DOI: 10.1007/s11121-023-01561-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 06/06/2023] [Indexed: 06/18/2023]
Abstract
Stimulant use is an important driver of HIV/STI transmission among men who have sex with men (MSM). Evaluating factors associated with increased stimulant use is critical to inform HIV prevention programming efforts. This study seeks to use machine learning variable selection techniques to determine characteristics associated with increased stimulant use and whether these factors differ by HIV status. Data from a longitudinal cohort of predominantly Black/Latinx MSM in Los Angeles, CA were used. Every 6 months from 8/2014-12/2020, participants underwent STI testing and completed surveys evaluating the following: demographics, substance use, sexual risk behaviors, and last partnership characteristics. Least absolute shrinkage and selection operator (lasso) was used to select variables and create predictive models for an interval increase in self-reported stimulant use across study visits. Mixed-effects logistic regression was then used to describe associations between selected variables and the same outcome. Models were also stratified based on HIV status to evaluate differences in predictors associated with increased stimulant use. Among 2095 study visits from 467 MSM, increased stimulant use was reported at 20.9% (n = 438) of visits. Increased stimulant use was positively associated with unstable housing (adjusted [a]OR 1.81; 95% CI 1.27-2.57), STI diagnosis (1.59; 1.14-2.21), transactional sex (2.30; 1.60-3.30), and last partner stimulant use (2.21; 1.62-3.00). Among MSM living with HIV, increased stimulant use was associated with binge drinking, vaping/cigarette use (aOR 1.99; 95% CI 1.36-2.92), and regular use of poppers (2.28; 1.38-3.76). Among HIV-negative MSM, increased stimulant use was associated with participating in group sex while intoxicated (aOR 1.81; 95% CI 1.04-3.18), transactional sex (2.53; 1.40-2.55), and last partner injection drug use (1.96; 1.02-3.74). Our findings demonstrate that lasso can be a useful tool for variable selection and creation of predictive models. These results indicate that risk behaviors associated with increased stimulant use may differ based on HIV status and suggest that co-substance use and partnership contexts should be considered in the development of HIV prevention/treatment interventions.
Affiliation(s)
- Cheríe S Blair
  - Department of Medicine, Division of Infectious Diseases, David Geffen School of Medicine at UCLA, 10833 Le Conte Avenue, CHS 52-215, Los Angeles, CA, 90095, USA
- Marjan Javanbakht
  - Department of Epidemiology, UCLA Fielding School of Public Health, Los Angeles, CA, USA
- W Scott Comulada
  - Department of Health Policy and Management, UCLA Fielding School of Public Health, Los Angeles, CA, USA
  - Department of Psychiatry and Biobehavioral Sciences, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
- Robert Bolan
  - Health and Mental Health Services, Los Angeles LGBT Center, Los Angeles, CA, USA
- Steven Shoptaw
  - Department of Family Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
- Pamina M Gorbach
  - Department of Medicine, Division of Infectious Diseases, David Geffen School of Medicine at UCLA, 10833 Le Conte Avenue, CHS 52-215, Los Angeles, CA, 90095, USA
  - Department of Epidemiology, UCLA Fielding School of Public Health, Los Angeles, CA, USA
- Jack Needleman
  - Department of Health Policy and Management, UCLA Fielding School of Public Health, Los Angeles, CA, USA
9. Wong A, Kramer SC, Piccininni M, Rohmann JL, Kurth T, Escolano S, Grittner U, Domenech de Cellès M. Using LASSO Regression to Estimate the Population-Level Impact of Pneumococcal Conjugate Vaccines. Am J Epidemiol 2023; 192:1166-1180. [PMID: 36935107 PMCID: PMC10326487 DOI: 10.1093/aje/kwad061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2022] [Revised: 12/12/2022] [Accepted: 03/13/2023] [Indexed: 03/21/2023]
Abstract
Pneumococcal conjugate vaccines (PCVs) protect against diseases caused by Streptococcus pneumoniae, such as meningitis, bacteremia, and pneumonia. It is challenging to estimate their population-level impact due to the lack of a perfect control population and the subtleness of signals when the endpoint, such as all-cause pneumonia, is nonspecific. Here we present a new approach for estimating the impact of PCVs: using least absolute shrinkage and selection operator (LASSO) regression to select variables in a synthetic control model to predict the counterfactual outcome for vaccine impact inference. We first used a simulation study based on hospitalization data from Mexico (2000-2013) to test the performance of LASSO and established methods, including the synthetic control model with Bayesian variable selection (SC). We found that LASSO achieved accurate and precise estimation, even in complex simulation scenarios where the association between the outcome and all control variables was noncausal. We then applied LASSO to real-world hospitalization data from Chile (2001-2012), Ecuador (2001-2012), Mexico (2000-2013), and the United States (1996-2005), and found that it yielded estimates of vaccine impact similar to SC. The LASSO method is accurate and easily implementable and can be applied to study the impact of PCVs and other vaccines.
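The general idea in this abstract (fit a LASSO on pre-vaccine data to weight control series, then predict the counterfactual outcome for the post-vaccine period) can be sketched as follows. This is a simplified illustration and not the authors' model: it assumes a linear, intercept-free lasso solved by coordinate descent, and the function `lasso_cd`, the penalty `lam`, and all counts are invented for the example.

```python
def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate-descent lasso for (1/(2n))*||y - Xb||^2 + lam*||b||_1, no intercept."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # mean cross-product of predictor j with the partial residual
            z = 0.0    # mean square of predictor j
            for i in range(n):
                partial = y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * partial
                z += X[i][j] ** 2
            rho, z = rho / n, z / n
            # Soft-thresholding update for coordinate j.
            if rho > lam:
                beta[j] = (rho - lam) / z
            elif rho < -lam:
                beta[j] = (rho + lam) / z
            else:
                beta[j] = 0.0
    return beta


# Toy monthly counts for two control series (invented, not data from the study).
controls_pre = [[10, 5], [12, 3], [11, 6], [13, 2], [12, 5], [14, 4], [13, 6], [15, 3]]
outcome_pre = [2 * row[0] for row in controls_pre]          # outcome tracks control 1 only
controls_post = [[14, 4], [16, 5], [15, 3], [17, 6]]
outcome_post = [0.8 * 2 * row[0] for row in controls_post]  # simulated 20% reduction

weights = lasso_cd(controls_pre, outcome_pre, lam=0.1)
counterfactual = [sum(w * x for w, x in zip(weights, row)) for row in controls_post]
rate_ratio = sum(outcome_post) / sum(counterfactual)
print(weights, round(rate_ratio, 2))
```

In this toy setup the lasso drops the second control series and the ratio of observed to predicted counts recovers a value close to the simulated 0.8. An analysis of real count data would normally need an intercept, standardized predictors, a tuned penalty, and a model suited to counts.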
Affiliation(s)
- Anabelle Wong
  - Correspondence to Anabelle Wong, Infectious Disease Epidemiology Research Group, Max Planck Institute for Infection Biology, Charitéplatz 1, 10117 Berlin, Germany
10. Gunn HJ, Rezvan PH, Fernández MI, Comulada WS. How to apply variable selection machine learning algorithms with multiply imputed data: A missing discussion. Psychol Methods 2023; 28:452-471. [PMID: 35113633 PMCID: PMC10117422 DOI: 10.1037/met0000478] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 11/08/2022]
Abstract
Psychological researchers often use standard linear regression to identify relevant predictors of an outcome of interest, but challenges emerge with incomplete data and growing numbers of candidate predictors. Regularization methods like the LASSO can reduce the risk of overfitting, increase model interpretability, and improve prediction in future samples; however, handling missing data when using regularization-based variable selection methods is complicated. Using listwise deletion or an ad hoc imputation strategy to deal with missing data when using regularization methods can lead to loss of precision, substantial bias, and a reduction in predictive ability. In this tutorial, we describe three approaches for fitting a LASSO when using multiple imputation to handle missing data and illustrate how to implement these approaches in practice with an applied example. We discuss implications of each approach and describe additional research that would help solidify recommendations for best practices. (PsycInfo Database Record (c) 2023 APA, all rights reserved).
Affiliation(s)
- Heather J. Gunn
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, Minnesota, United States
- Panteha Hayati Rezvan
  - Department of Psychiatry and Biobehavioral Sciences, University of California, Los Angeles
- W. Scott Comulada
  - Department of Psychiatry and Biobehavioral Sciences, University of California, Los Angeles
11. Major-Smith D, Dvořák T, Elhakeem A, Lawlor DA, Tilling K, Smith ADAC. Incorporating interactions into structured life course modelling approaches: A simulation study and applied example of the role of access to green space and socioeconomic position on cardiometabolic health. medRxiv: The Preprint Server for Health Sciences 2023:2023.01.24.23284935. [PMID: 36747796 PMCID: PMC9901056 DOI: 10.1101/2023.01.24.23284935] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/28/2023]
Abstract
Background: Structured life course modelling approaches (SLCMA) have been developed to understand how exposures across the lifespan relate to later health, but have primarily been restricted to single exposures. As multiple exposures can jointly impact health, here we: i) demonstrate how to extend SLCMA to include exposure interactions; ii) conduct a simulation study investigating the performance of these methods; and iii) apply these methods to explore associations of access to green space, and its interaction with socioeconomic position, with child cardiometabolic health. Methods: We used three methods, all based on lasso regression, to select the most plausible life course model: visual inspection, information criteria and cross-validation. The simulation study assessed the ability of these approaches to detect the correct interaction term, while varying parameters which may impact power (e.g., interaction magnitude, sample size, exposure collinearity). Methods were then applied to data from a UK birth cohort. Results: There were trade-offs between false negatives and false positives in detecting the true interaction term for different model selection methods. Larger sample size, lower exposure collinearity, centering exposures, continuous outcomes and a larger interaction effect all increased power. In our applied example we found little-to-no association between access to green space, or its interaction with socioeconomic position, and child cardiometabolic outcomes. Conclusions: Incorporating interactions between multiple exposures is an important extension to SLCMA. The choice of method depends on the researchers' assessment of the risks of under- vs over-fitting. These results also provide guidance for improving power to detect interactions using these methods.
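The abstract reports that centering exposures increased power but does not state the mechanism. One commonly cited explanation, assumed here for illustration, is that centering reduces the correlation between a main effect and its product (interaction) term. The sketch below shows this on a small invented dataset; the variable names `green_space` and `sep` are hypothetical stand-ins for the two exposures.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def center(values):
    """Subtract the mean from every value."""
    m = sum(values) / len(values)
    return [v - m for v in values]


# Invented exposure scores for six children (illustrative only).
green_space = [1, 2, 3, 4, 5, 6]
sep = [2, 1, 4, 3, 6, 5]

# Interaction term built from raw exposures versus centered exposures.
raw_interaction = [g * s for g, s in zip(green_space, sep)]
g_c, s_c = center(green_space), center(sep)
centered_interaction = [g * s for g, s in zip(g_c, s_c)]

r_raw = pearson(green_space, raw_interaction)
r_centered = pearson(g_c, centered_interaction)
print(round(r_raw, 2), round(r_centered, 2))
```

With raw scores the main effect and the product term are strongly correlated, while after centering the correlation in this toy example drops to essentially zero, which makes it easier for a selection method such as the lasso to separate the two terms.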
Affiliation(s)
- Daniel Major-Smith
  - MRC Integrative Epidemiology Unit at the University of Bristol, Bristol, UK
  - Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, BS8 2BN, UK
- Tadeáš Dvořák
  - Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, BS8 2BN, UK
- Ahmed Elhakeem
  - MRC Integrative Epidemiology Unit at the University of Bristol, Bristol, UK
  - Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, BS8 2BN, UK
- Deborah A. Lawlor
  - MRC Integrative Epidemiology Unit at the University of Bristol, Bristol, UK
  - Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, BS8 2BN, UK
  - Bristol National Institute of Health Research (NIHR) Biomedical Research Centre, Bristol, UK
- Kate Tilling
  - MRC Integrative Epidemiology Unit at the University of Bristol, Bristol, UK
  - Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, BS8 2BN, UK
  - Bristol National Institute of Health Research (NIHR) Biomedical Research Centre, Bristol, UK
- Andrew D. A. C. Smith
  - Mathematics and Statistics Research Group, University of the West of England, Bristol, UK
12. Dunn EC, Busso DS, Davis KA, Smith AD, Mitchell C, Tiemeier H, Susser ES. Sensitive Periods for the Effect of Child Maltreatment on Psychopathology Symptoms in Adolescence. Complex Psychiatry 2023; 9:145-153. [PMID: 37900909 PMCID: PMC10601948 DOI: 10.1159/000530120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2022] [Accepted: 02/20/2023] [Indexed: 10/31/2023]
Abstract
Introduction: Child maltreatment is among the strongest risk factors for mental disorders. However, little is known about whether there are ages when children may be especially vulnerable to its effects. We sought to identify potential sensitive periods when exposure to the 2 most common types of maltreatment (neglect and harsh physical discipline) had a particularly detrimental effect on youth mental health. Methods: Data came from the Future of Families and Child Wellbeing Study (FFCWS), a birth cohort oversampled from "fragile families" (n = 3,474). Maltreatment was assessed at 3, 5, and 9 years of age using an adapted version of the Parent-Child Conflict Tactics Scales (CTS-PC). Using least angle regression, we examined the relationship between repeated measures of exposure to maltreatment on psychopathology symptoms at age 15 years (Child Behavior Checklist; CBCL/6-18). For comparison, we evaluated the strength of evidence to support the existence of sensitive periods in relation to an accumulation of risk model. Results: We identified sensitive periods for harsh physical discipline, whereby psychopathology symptom scores were highest among girls exposed at age 9 years (r2 = 0.67 internalizing symptoms; r2 = 1% externalizing symptoms) and among boys exposed at age 5 years (r2 = 0.41%). However, for neglect, the accumulation of risk model explained more variability in psychopathology symptoms for both boys and girls. Conclusion: Child maltreatment may have differential effects based on the child's sex, type of exposure, and the age at which it occurs. These findings provide additional evidence for clinicians assessing the benefits and drawbacks of screening efforts and point toward possible mechanisms driving increased vulnerability to psychopathology.
Affiliation(s)
- Erin C. Dunn
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Psychiatry, Harvard Medical School, Boston, MA, USA
- Stanley Center for Psychiatric Research, The Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Daniel S. Busso
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Harvard Graduate School of Education, Cambridge, MA, USA
| | - Kathryn A. Davis
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Andrew D.A.C. Smith
- Applied Statistics Group, University of the West of England at Bristol, Bristol, UK
| | - Colter Mitchell
- Institute for Social Research, University of Michigan, Ann Arbor, MI, USA
| | - Henning Tiemeier
- Department of Child Psychiatry, Erasmus Medical Center, Rotterdam, The Netherlands
- Department of Social and Behavioral Science, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Ezra S. Susser
- Department of Epidemiology, Mailman School of Public Health, Columbia University, New York, NY, USA
- New York State Psychiatric Institute, New York City, NY, USA
|
13
|
Li C, Shen X, Pan W. Inference for a Large Directed Acyclic Graph with Unspecified Interventions. Journal of Machine Learning Research 2023; 24:73. [PMID: 37701522 PMCID: PMC10497226] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/14/2023]
Abstract
Statistical inference of directed relations given some unspecified interventions (i.e., the intervention targets are unknown) is challenging. In this article, we test hypothesized directed relations with unspecified interventions. First, we derive conditions to yield an identifiable model. Unlike classical inference, testing directed relations requires identifying the ancestors and relevant interventions of hypothesis-specific primary variables. To this end, we propose a peeling algorithm based on nodewise regressions to establish a topological order of primary variables. Moreover, we prove that the peeling algorithm yields a consistent estimator in low-order polynomial time. Second, we propose a likelihood ratio test integrated with a data perturbation scheme to account for the uncertainty of identifying ancestors and interventions. Also, we show that the distribution of a data perturbation test statistic converges to the target distribution. Numerical examples demonstrate the utility and effectiveness of the proposed methods, including an application to infer gene regulatory networks. The R implementation is available at https://github.com/chunlinli/intdag.
Affiliation(s)
- Chunlin Li
- School of Statistics, University of Minnesota, Minneapolis, MN 55455, USA
| | - Xiaotong Shen
- School of Statistics, University of Minnesota, Minneapolis, MN 55455, USA
| | - Wei Pan
- Division of Biostatistics, University of Minnesota, Minneapolis, MN 55455, USA
|
14
|
Wang X, Huang J, Yin G, Huang J, Wu Y. Double bias correction for high-dimensional sparse additive hazards regression with covariate measurement errors. Lifetime Data Analysis 2023; 29:115-141. [PMID: 35869178 DOI: 10.1007/s10985-022-09568-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2021] [Accepted: 07/06/2022] [Indexed: 06/15/2023]
Abstract
We propose an inferential procedure for additive hazards regression with high-dimensional survival data, where the covariates are prone to measurement errors. We develop a double bias correction method by first correcting the bias arising from measurement errors in covariates through an estimating function for the regression parameter. By adopting the convex relaxation technique, a regularized estimator for the regression parameter is obtained by designing a feasible loss based on the estimating function, which is solved via linear programming. Using Neyman orthogonality, we propose an asymptotically unbiased estimator which further corrects the bias caused by the convex relaxation and regularization. We derive the convergence rate of the proposed estimator and establish the asymptotic normality for the low-dimensional parameter estimator and the linear combination thereof, accompanied by a consistent estimator for the variance. Numerical experiments are carried out on both simulated and real datasets to demonstrate the promising performance of the proposed double bias correction method.
Affiliation(s)
- Xiaobo Wang
- School of Mathematics and Statistics, Wuhan University, Wuhan, Hubei, 430072, China
| | - Jiayu Huang
- School of Mathematics and Statistics, Wuhan University, Wuhan, Hubei, 430072, China
| | - Guosheng Yin
- Department of Statistics and Actuarial Science, University of Hong Kong, Pokfulam Road, Hong Kong, China
| | - Jian Huang
- Department of Statistics and Actuarial Science, University of Iowa, Iowa City, IA, 52242-1419, U.S.A
| | - Yuanshan Wu
- School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, Hubei, 430073, China
|
15
|
Trippe BL, Deshpande SK, Broderick T. Confidently Comparing Estimates with the c-value. J Am Stat Assoc 2022. [DOI: 10.1080/01621459.2022.2153688] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/16/2022]
Affiliation(s)
- Tamara Broderick
- Laboratory for Information and Decision Systems, Massachusetts Institute of Technology
|
16
|
Lussier AA, Zhu Y, Smith BJ, Simpkin AJ, Smith AD, Suderman MJ, Walton E, Ressler KJ, Dunn EC. Updates to data versions and analytic methods influence the reproducibility of results from epigenome-wide association studies. Epigenetics 2022; 17:1373-1388. [PMID: 35156895 PMCID: PMC9601563 DOI: 10.1080/15592294.2022.2028072] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 01/26/2021] [Revised: 12/02/2021] [Accepted: 01/04/2022] [Indexed: 11/03/2022]
Abstract
Biomedical research has grown increasingly cooperative through the sharing of consortia-level epigenetic data. Since consortia preprocess data prior to distribution, new processing pipelines can lead to different versions of the same dataset. Similarly, analytic frameworks evolve to incorporate cutting-edge methods and best practices. However, it remains unknown how different data and analytic versions alter the results of epigenome-wide analyses, which could influence the replicability of epigenetic associations. Thus, we assessed the impact of these changes using data from the Avon Longitudinal Study of Parents and Children (ALSPAC) cohort. We analysed DNA methylation from two data versions, processed using separate preprocessing and analytic pipelines, examining associations between seven childhood adversities or prenatal smoking exposure and DNA methylation at age 7. We performed two sets of analyses: (1) epigenome-wide association studies (EWAS); (2) Structured Life Course Modelling Approach (SLCMA), a two-stage method that models time-dependent effects. SLCMA results were also compared across two analytic versions. Data version changes impacted both EWAS and SLCMA analyses, yielding different associations at conventional p-value thresholds. However, the magnitude and direction of associations were generally consistent between data versions, regardless of p-values. Differences were especially apparent in analyses of childhood adversity, while smoking associations were more consistent using significance thresholds. SLCMA analytic versions similarly altered top associations, but time-dependent effects remained concordant. Alterations to data and analytic versions influenced the results of epigenome-wide analyses. Our findings highlight that magnitude and direction are better measures for replication and stability than p-value thresholds.
Affiliation(s)
- Alexandre A. Lussier
- Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Psychiatry, Harvard Medical School, Boston, MA, USA
- Stanley Center for Psychiatric Research, The Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Yiwen Zhu
- Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Brooke J. Smith
- Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Andrew J. Simpkin
- School of Mathematics, Statistics and Applied Mathematics, National University of Ireland, Galway, Ireland
| | - Andrew D.A.C. Smith
- Mathematics and Statistics Research Group, University of the West of England, Bristol, UK
| | - Matthew J. Suderman
- MRC Integrative Epidemiology Unit, Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK
| | - Esther Walton
- Department of Psychology, University of Bath, Bath, UK
| | - Kerry J. Ressler
- Department of Psychiatry, Harvard Medical School, Boston, MA, USA
- McLean Hospital, Belmont, MA, USA
| | - Erin C. Dunn
- Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Psychiatry, Harvard Medical School, Boston, MA, USA
- Stanley Center for Psychiatric Research, The Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Center on the Developing Child, Harvard University, Cambridge, MA, USA
|
17
|
Califf RM, Wong C, Doraiswamy PM, Hong DS, Miller DP, Mega JL. Importance of Social Determinants in Screening for Depression. J Gen Intern Med 2022; 37:2736-2743. [PMID: 34405346 PMCID: PMC9411454 DOI: 10.1007/s11606-021-06957-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 05/03/2021] [Accepted: 05/27/2021] [Indexed: 01/07/2023]
Abstract
IMPORTANCE The most common screening tool for depression is the Patient Health Questionnaire-9 (PHQ-9). Despite extensive research on the clinical and behavioral implications of the PHQ-9, data are limited on the relationship between PHQ-9 scores and social determinants of health and disease. OBJECTIVE To assess the relationship between the PHQ-9 at intake and other measurements intended to assess social determinants of health. DESIGN, SETTING, AND PARTICIPANTS Cross-sectional analyses of 2502 participants from the Baseline Health Study (BHS), a prospective cohort of adults selected to represent major demographic groups in the US; participants underwent deep phenotyping on demographic, socioeconomic, clinical, laboratory, functional, and imaging findings. INTERVENTIONS None. MAIN OUTCOMES AND MEASURES Cross-sectional measures of clinical and socioeconomic status (SES). RESULTS In addition to a host of clinical and biological factors, higher PHQ-9 scores were associated with female sex, younger participants, people of color, and Hispanic ethnicity. Multiple measures of low SES, including less education, being unmarried, not currently working, and lack of insurance, were also associated with higher PHQ-9 scores across the entire spectrum of PHQ-9 scores. A summative score of SES, which was the 6th most predictive factor, was associated with higher PHQ-9 score after adjusting for 150 clinical, lab testing, and symptomatic characteristics. CONCLUSIONS AND RELEVANCE Our findings underscore that depression should be considered a comorbidity when social determinants of health are addressed, and both elements should be considered when designing appropriate interventions.
Affiliation(s)
- P Murali Doraiswamy
- Department of Psychiatry and Behavioral Sciences and the Duke Institute for Brain Sciences, Duke University School of Medicine, Durham, NC, USA
| | - David S Hong
- Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA, USA
|
18
|
Baranyi G, Welstead M, Corley J, Deary IJ, Muniz-Terrera G, Redmond P, Shortt N, Taylor AM, Ward Thompson C, Cox SR, Pearce J. Association of Life-Course Neighborhood Deprivation With Frailty and Frailty Progression From Ages 70 to 82 Years in the Lothian Birth Cohort 1936. Am J Epidemiol 2022; 191:1856-1866. [PMID: 35882379 PMCID: PMC9626928 DOI: 10.1093/aje/kwac134] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/16/2021] [Revised: 06/17/2022] [Accepted: 07/22/2022] [Indexed: 02/01/2023]
Abstract
Neighborhood features have been postulated to be key predictors of frailty. However, evidence is mainly limited to cross-sectional studies without indication of long-term impact. We explored how neighborhood social deprivation (NSD) across the life course is associated with frailty and frailty progression among older Scottish adults. Participants (n = 323) were persons selected from the Lothian Birth Cohort 1936 with historical measures of NSD in childhood (1936-1955), young adulthood (1956-1975), and mid- to late adulthood (1976-2014). Frailty was measured 5 times between the ages of 70 and 82 years using the Frailty Index. Confounder-adjusted life-course models were assessed using a structured modeling approach; associations were estimated for frailty at baseline using linear regression and for frailty progression using linear mixed-effects models. Accumulation was the most appropriate life-course model for males; greater accumulated NSD was associated with higher frailty at baseline (b = 0.017, 95% confidence interval: 0.005, 0.029). Among females, the mid- to late adulthood sensitive period was the best-fitting life-course model, and higher NSD in this period was associated with widening frailty trajectories (b = 0.005, 95% confidence interval: 0.0004, 0.009). To our knowledge, this is the first investigation of the life-course impact of NSD on frailty in a cohort of older adults. Policies designed to address deprivation and inequalities across the full life course may support healthy aging.
Affiliation(s)
- Gergő Baranyi
- Correspondence to Dr. Gergő Baranyi, Centre for Research on Environment, Society and Health, School of GeoSciences, University of Edinburgh, Drummond Street, Edinburgh EH89XP, United Kingdom
|
19
|
Kammer M, Dunkler D, Michiels S, Heinze G. Evaluating methods for Lasso selective inference in biomedical research: a comparative simulation study. BMC Med Res Methodol 2022; 22:206. [PMID: 35883041 PMCID: PMC9316707 DOI: 10.1186/s12874-022-01681-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/23/2022] [Accepted: 07/11/2022] [Indexed: 12/03/2022]
Abstract
Background: Variable selection for regression models plays a key role in the analysis of biomedical data. However, inference after selection is not covered by classical statistical frequentist theory, which assumes a fixed set of covariates in the model. This leads to over-optimistic selection and replicability issues. Methods: We compared proposals for selective inference targeting the submodel parameters of the Lasso and its extension, the adaptive Lasso: sample splitting, selective inference conditional on the Lasso selection (SI), and universally valid post-selection inference (PoSI). We studied the properties of the proposed selective confidence intervals available via R software packages using a neutral simulation study inspired by real data commonly seen in biomedical studies. Furthermore, we present an exemplary application of these methods to a publicly available dataset to discuss their practical usability. Results: Frequentist properties of selective confidence intervals by the SI method were generally acceptable, but the claimed selective coverage levels were not attained in all scenarios, in particular with the adaptive Lasso. The actual coverage of the extremely conservative PoSI method exceeded the nominal levels, and this method also required the greatest computational effort. Sample splitting achieved acceptable actual selective coverage levels, but the method is inefficient and leads to less accurate point estimates. The choice of inference method had a large impact on the resulting interval estimates, thereby necessitating that the user is acutely aware of the goal of inference in order to interpret and communicate the results. Conclusions: Despite violating nominal coverage levels in some scenarios, selective inference conditional on the Lasso selection is our recommended approach for most cases. If simplicity is strongly favoured over efficiency, then sample splitting is an alternative. If only a few predictors undergo variable selection (i.e. up to 5) or the avoidance of false positive claims of significance is a concern, then the conservative approach of PoSI may be useful. For the adaptive Lasso, SI should be avoided and only PoSI and sample splitting are recommended. In summary, we find selective inference useful to assess the uncertainties in the importance of individual selected predictors for future applications. Supplementary Information: The online version contains supplementary material available at 10.1186/s12874-022-01681-y.
Affiliation(s)
- Michael Kammer
- Section for Clinical Biometrics, Center for Medical Statistics, Informatics and Intelligent Systems, Medical University of Vienna, Vienna, Austria; Division of Nephrology and Dialysis, Department for Internal Medicine III, Medical University of Vienna, Vienna, Austria
| | - Daniela Dunkler
- Section for Clinical Biometrics, Center for Medical Statistics, Informatics and Intelligent Systems, Medical University of Vienna, Vienna, Austria
| | - Stefan Michiels
- Service de Biostatistique et d'Epidémiologie, Gustave Roussy, Oncostat U1018, INSERM, University Paris-Saclay, labeled Ligue Contre le Cancer, Villejuif, France
| | - Georg Heinze
- Section for Clinical Biometrics, Center for Medical Statistics, Informatics and Intelligent Systems, Medical University of Vienna, Vienna, Austria
|
20
|
Chen S, Dai Y, Ma X, Peng H, Wang D, Wang Y. Personalized optimal nutrition lifestyle for self obesity management using metaalgorithms. Sci Rep 2022; 12:12387. [PMID: 35858966 PMCID: PMC9297061 DOI: 10.1038/s41598-022-16260-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/08/2022] [Accepted: 07/07/2022] [Indexed: 12/03/2022]
Abstract
Precision medicine applies machine learning methods to estimate the personalized optimal treatment decision based on individual information, such as genetic data and medical history. The main purpose of self obesity management is to develop a personalized optimal life plan that is easy to implement and adhere to, thereby reducing the incidence of obesity and obesity-related diseases. The methodology comprises three components. First, we apply catboost, random forest and lasso covariance test to evaluate the importance of individual features in forecasting body mass index. Second, we apply metaalgorithms to estimate the personalized optimal decision on alcohol, vegetable, high caloric food and daily water intake respectively for each individual. Third, we propose new metaalgorithms named SX and SXwint learners to compute the personalized optimal decision and compare their performances with other prevailing metalearners. We find that people who receive individualized optimal treatment options not only have lower obesity levels than others, but also have lower obesity levels than those who receive ’one-for-all’ treatment options. In conclusion, all metaalgorithms are effective at estimating the personalized optimal decision, where SXwint learner shows the best performance on daily water intake.
Affiliation(s)
- Shizhao Chen
- College of Mathematics and Physics, Beijing University of Chemical Technology, Beijing, 100029, China
| | - Yiran Dai
- College of Mathematics and Physics, Beijing University of Chemical Technology, Beijing, 100029, China
| | - Xiaoman Ma
- College of Mathematics and Physics, Beijing University of Chemical Technology, Beijing, 100029, China
| | - Huimin Peng
- College of Mathematics and Physics, Beijing University of Chemical Technology, Beijing, 100029, China
| | - Donghui Wang
- College of Mathematics and Physics, Beijing University of Chemical Technology, Beijing, 100029, China
| | - Yili Wang
- College of Mathematics and Physics, Beijing University of Chemical Technology, Beijing, 100029, China
|
21
|
Smith BJ, Smith ADAC, Dunn EC. Statistical Modeling of Sensitive Period Effects Using the Structured Life Course Modeling Approach (SLCMA). Curr Top Behav Neurosci 2022; 53:215-234. [PMID: 35460052 DOI: 10.1007/7854_2021_280] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Abstract
Sensitive periods are times during development when life experiences can have a greater impact on outcomes than at other periods during the life course. However, a dearth of sophisticated methods for studying time-dependent exposure-outcome relationships means that sensitive periods are often overlooked in research studies in favor of more simplistic and easier-to-use hypotheses such as ever being exposed, or the effect of an exposure accumulated over time. The structured life course modeling approach (SLCMA; pronounced "slick-mah") allows researchers to model complex life course hypotheses, such as sensitive periods, to determine which hypothesis best explains the amount of variation between a repeated exposure and an outcome. The SLCMA makes use of the least angle regression (LARS) variable selection technique, a type of least absolute shrinkage and selection operator (LASSO) estimation procedure, to yield a parsimonious model for the exposure-outcome relationship of interest. The results of the LARS procedure are complemented with a post-selection inference method, called selective inference, which provides unbiased effect estimates, confidence intervals, and p-values for the final explanatory model. In this chapter, we provide a brief overview of the genesis of this sensitive period modeling approach and provide a didactic step-by-step user's guide to implement the SLCMA in sensitive-period research. R code to complete the SLCMA is available on our GitHub page at: https://github.com/thedunnlab/SLCMA-pipeline.
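The first stage described in this abstract can be illustrated with a small Python sketch. This is not the authors' R pipeline: it only encodes a repeated binary exposure into sensitive-period and accumulation variables, then reports the variable with the largest absolute correlation with the outcome, which is the variable least angle regression would enter first when predictors are standardized. The toy data, the ages, and every function name are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def encode_hypotheses(exposures, ages):
    """Turn per-person exposure histories (0/1 per age) into hypothesis variables."""
    encoded = {}
    for i, age in enumerate(ages):
        # Sensitive period: only exposure at this single age matters.
        encoded[f"sensitive_period_age_{age}"] = [row[i] for row in exposures]
    # Accumulation: the number of exposed occasions matters, not their timing.
    encoded["accumulation"] = [sum(row) for row in exposures]
    return encoded


def first_selected(encoded, outcome):
    """Name of the hypothesis variable most correlated (in absolute value) with the outcome."""
    return max(encoded, key=lambda name: abs(pearson(encoded[name], outcome)))


# Hypothetical toy data: exposure measured at ages 3, 5, and 9 for eight people.
AGES = (3, 5, 9)
EXPOSURES = [
    (0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1),
    (1, 1, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1),
]
# Hypothetical outcome, constructed to be higher for people exposed at age 9.
OUTCOME = [1.0, 1.2, 0.9, 3.1, 1.1, 3.0, 2.9, 3.2]

ENCODED = encode_hypotheses(EXPOSURES, AGES)
BEST = first_selected(ENCODED, OUTCOME)
print(BEST)
```

The sketch stops at selection; the selective inference step that the abstract mentions (confidence intervals and p-values after selection) is not reproduced here.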
Affiliation(s)
- Brooke J Smith
- Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Andrew D A C Smith
- Mathematics and Statistics Research Group, University of the West of England, Bristol, UK
| | - Erin C Dunn
- Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Department of Psychiatry, Harvard Medical School, Boston, MA, USA; Stanley Center for Psychiatric Research, The Broad Institute of Harvard and MIT, Cambridge, MA, USA; Harvard Center on the Developing Child, Harvard University, Cambridge, MA, USA
|
22
|
Mizuno A, Karim HT, Newmark J, Khan F, Rosenblatt MJ, Neppach AM, Lowe M, Aizenstein HJ, Mennin DS, Andreescu C. Thinking of Me or Thinking of You? Behavioral Correlates of Self vs. Other Centered Worry and Reappraisal in Late-Life. Front Psychiatry 2022; 13:780745. [PMID: 35815034 PMCID: PMC9256986 DOI: 10.3389/fpsyt.2022.780745] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/21/2021] [Accepted: 05/19/2022] [Indexed: 11/13/2022]
Abstract
Psychotherapeutic approaches in late-life anxiety have limited effect on reducing worry severity. The self-referential processing of worry contents (self- vs. other-focused worry) and reappraisal styles (internal vs. external locus of control) are important elements in psychotherapy, but little is known about these processes in late-life. We aimed to characterize severe worry from a self-referential processing perspective. We recruited 104 older adults with various levels of worry and used a personalized task to induce and reappraise worry. We analyzed the association between (1) worry severity/frequency for worry content (self- or other-focused) and (2) for reappraisal style (internal vs. external locus of control) with clinical inventories measuring anxiety, worry, depression, rumination, neuroticism, emotion regulation strategies, perceived stress, and physical illness burden. Higher self-worry severity was associated with higher scores of clinical inventories of worry, depression, perceived stress, and neuroticism, whereas other-worry severity did not show any association. Greater self-worry frequency was associated with higher medical burden. External locus of control in reappraisal statements was associated with lower worry severity in men. Overall, more severe and frequent self-focused worry was associated with a greater psychological and physiological burden. These results are useful in tailoring psychotherapy for older adults with severe worry.
Affiliation(s)
- Akiko Mizuno
- Department of Psychiatry, University of Pittsburgh, Pittsburgh, PA, United States
| | - Helmet Talib Karim
- Department of Psychiatry, University of Pittsburgh, Pittsburgh, PA, United States
| | - Jordyn Newmark
- Department of Psychiatry, University of Pittsburgh, Pittsburgh, PA, United States
| | - Faiha Khan
- Department of Psychiatry, University of Pittsburgh, Pittsburgh, PA, United States
| | - Alyssa M. Neppach
- Department of Neuroscience, University of Pittsburgh, Pittsburgh, PA, United States
| | - MaKayla Lowe
- Department of Psychology, University of Pittsburgh, Pittsburgh, PA, United States
| | - Howard Jay Aizenstein
- Department of Psychiatry, University of Pittsburgh, Pittsburgh, PA, United States
- Department of Bioengineering, University of Pittsburgh, Pittsburgh, PA, United States
| | - Douglas S. Mennin
- Department of Counseling and Clinical Psychology, Teachers College, Columbia University, New York, NY, United States
| | - Carmen Andreescu
- Department of Psychiatry, University of Pittsburgh, Pittsburgh, PA, United States
|
23
|
Affiliation(s)
| | - Buyu Lin
- Department of Statistics, Harvard University
| | - Xin Xing
- Department of Statistics, Virginia Tech
| | - Jun S. Liu
- Department of Statistics, Harvard University
|
24
|
Guo C, Liu Z, Cao C, Zheng Y, Lu T, Yu Y, Wang L, Liu L, Liu S, Hua Z, Han X, Li Z. Development and Validation of Ischemic Events Related Signature After Carotid Endarterectomy. Front Cell Dev Biol 2022; 10:794608. [PMID: 35372347 PMCID: PMC8969028 DOI: 10.3389/fcell.2022.794608] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/13/2021] [Accepted: 03/04/2022] [Indexed: 12/29/2022]
Abstract
Background: Ischemic events after carotid endarterectomy (CEA) in carotid artery stenosis patients are unforeseeable and alarming. Therefore, we aimed to establish a novel model to prevent recurrent ischemic events after CEA. Methods: Ninety-eight peripheral blood mononuclear cell samples were collected from carotid artery stenosis patients. Based on weighted gene co-expression network analysis, we performed whole transcriptome correlation analysis and extracted the key module related to ischemic events. The biological functions of the 292 genes in the key module were annotated via GO and KEGG enrichment analysis, and the protein-protein interaction (PPI) network was constructed via the STRING database and Cytoscape software. The enrolled samples were divided into train (n = 66), validation (n = 28), and total sets (n = 94). In the train set, the random forest algorithm was used to identify critical genes for predicting ischemic events after CEA, and further dimension reduction was performed by LASSO logistic regression. A diagnosis model was established in the train set and verified in the validation and total sets. Furthermore, fifty peripheral venous blood samples from patients with carotid stenosis in our hospital were used as an independent cohort to validate the model by RT-qPCR. Meanwhile, GSEA, ssGSEA, CIBERSORT, and MCP-counter were used for enrichment analysis in high- and low-risk groups, which were divided by the median risk score. Results: We established an eight-gene model consisting of PLSCR1, ECRP, CASP5, SPTSSA, MSRB1, BCL6, FBP1, and LST1. The ROC-AUCs and PR-AUCs of the train, validation, total, and independent cohort were 0.891 and 0.725, 0.826 and 0.364, 0.869 and 0.654, 0.792 and 0.372, respectively. GSEA, ssGSEA, CIBERSORT, and MCP-counter analyses further revealed that high-risk patients presented enhanced immune signatures, which indicated that immunotherapy may improve clinical outcomes in these patients. Conclusion: An eight-gene model with high accuracy for predicting ischemic events after CEA was constructed. This model might be a promising tool to facilitate the clinical management and postoperative surveillance of carotid artery stenosis patients.
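The median split into high- and low-risk groups mentioned in this abstract can be sketched in Python as follows. The abstract does not report the model coefficients, so the sketch assumes the risk score is a weighted sum of expression values and uses placeholder gene names, weights, and samples; all of them are hypothetical and none comes from the paper.

```python
from statistics import median

# Hypothetical coefficients for three placeholder genes (not the published model).
COEFS = {"gene_a": 0.8, "gene_b": -0.5, "gene_c": 0.3}


def risk_score(expression, coefs=COEFS):
    """Weighted sum of expression values: an assumed form of the risk score."""
    return sum(weight * expression[gene] for gene, weight in coefs.items())


def split_by_median(scores):
    """Label each sample 'high' if its score is above the cohort median, else 'low'."""
    cutoff = median(scores.values())
    return {sample: ("high" if score > cutoff else "low")
            for sample, score in scores.items()}


# Hypothetical expression values for four samples.
SAMPLES = {
    "s1": {"gene_a": 2.0, "gene_b": 1.0, "gene_c": 0.0},
    "s2": {"gene_a": 0.5, "gene_b": 2.0, "gene_c": 1.0},
    "s3": {"gene_a": 1.0, "gene_b": 0.0, "gene_c": 2.0},
    "s4": {"gene_a": 0.0, "gene_b": 1.0, "gene_c": 1.0},
}

SCORES = {sample: risk_score(expr) for sample, expr in SAMPLES.items()}
GROUPS = split_by_median(SCORES)
print(GROUPS)
```

With an even number of samples the median falls between the two middle scores, so the groups are equal in size; ties at the cutoff would be assigned to the low-risk group under this convention.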
Affiliation(s)
- Chunguang Guo
- Department of Endovascular Surgery, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
| | - Zaoqu Liu
- Department of Interventional Radiology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
| | - Can Cao
- Institute for Cardiovascular Physiology, Goethe University, Frankfurt, Germany
| | - Youyang Zheng
- Department of Cardiology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
| | - Taoyuan Lu
- Department of Cerebrovascular Disease, Zhengzhou University People’s Hospital, Zhengzhou, China
| | - Yin Yu
- Department of Pathophysiology, School of Basic Medical Sciences, The Academy of Medical Science, Zhengzhou University, Zhengzhou, China
| | - Libo Wang
- Department of Hepatobiliary and Pancreatic Surgery, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
| | - Long Liu
- Department of Hepatobiliary and Pancreatic Surgery, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
| | - Shirui Liu
- Department of Endovascular Surgery, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
| | - Zhaohui Hua
- Department of Endovascular Surgery, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- *Correspondence: Zhaohui Hua, Xinwei Han, Zhen Li
| | - Xinwei Han
- Department of Interventional Radiology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- *Correspondence: Zhaohui Hua, Xinwei Han, Zhen Li
| | - Zhen Li
- Department of Endovascular Surgery, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
| |
|
25
|
Guo C, Liu Z, Yu Y, Liu S, Ma K, Ge X, Xing Z, Lu T, Weng S, Wang L, Liu L, Hua Z, Han X, Li Z. Integrated Analysis of Multi-Omics Alteration, Immune Profile, and Pharmacological Landscape of Pyroptosis-Derived lncRNA Pairs in Gastric Cancer. Front Cell Dev Biol 2022; 10:816153. [PMID: 35281096 PMCID: PMC8916586 DOI: 10.3389/fcell.2022.816153] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 11/16/2021] [Accepted: 02/09/2022] [Indexed: 12/11/2022] Open
Abstract
Background: Recent evidence demonstrates that pyroptosis-derived long non-coding RNAs (lncRNAs) have profound impacts on the initiation, progression, and microenvironment of tumors. However, the roles of pyroptosis-derived lncRNAs (PDLs) in gastric cancer (GC) remain elusive. Methods: We comprehensively analyzed the multi-omics data of 839 GC patients from three independent cohorts. The previous gene set enrichment analysis embedding algorithm was utilized to identify PDLs. A gene pair pipeline was developed to facilitate clinical translation via qualitative relative expression orders. The LASSO algorithm was used to construct and validate a pyroptosis-derived lncRNA pair prognostics signature (PLPPS). The associations between PLPPS and multi-omics alteration, immune profile, and pharmacological landscape were further investigated. Results: A total of 350 PDLs and 61,075 PDL pairs in the training set were generated. Cox regression revealed 15 PDL pairs associated with overall survival, which were utilized to construct the PLPPS model via the LASSO algorithm. The high-risk group demonstrated adverse prognosis relative to the low-risk group. Remarkably, genomic analysis suggested that the lower tumor mutation burden and gene mutation frequency (e.g., TTN, MUC16, and LRP1B) were found in the high-risk group patients. The copy number variants were not significantly different between the two groups. Additionally, the high-risk group possessed lower immune cell infiltration abundance and might be resistant to a few chemotherapeutic drugs (including cisplatin, paclitaxel, and gemcitabine). Conclusion: PDLs were closely implicated in the biological process and prognosis of GC, and our PLPPS model could serve as a promising tool to advance prognostic management and personalized treatment of GC patients.
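The "qualitative relative expression orders" idea behind the gene pair pipeline can be illustrated with a short sketch. It assumes, as a simplification not confirmed by the abstract, that each pair (A, B) is encoded as 1 when A is expressed above B in a sample and 0 otherwise; the lncRNA names and expression values are hypothetical. The pair count for 350 PDLs matches the 61,075 pairs reported above.

```python
from itertools import combinations
from math import comb


def pair_features(expr):
    """Encode one sample as binary pair features.

    expr maps a (hypothetical) lncRNA name to its expression value.
    Each unordered pair (a, b), with names in sorted order, is scored 1
    if a is expressed above b in this sample and 0 otherwise.
    """
    names = sorted(expr)
    return {(a, b): int(expr[a] > expr[b]) for a, b in combinations(names, 2)}


# 350 candidate lncRNAs give C(350, 2) unordered pairs.
n_pairs = comb(350, 2)
print(n_pairs)  # 61075

# Toy sample with made-up names and values.
sample = {"lncA": 5.2, "lncB": 1.3, "lncC": 3.8}
features = pair_features(sample)
print(features)

# Within-sample orderings are unchanged by any increasing transformation,
# which is why such features need no cross-platform normalization.
rescaled = {name: 10.0 * value + 3.0 for name, value in sample.items()}
print(pair_features(rescaled) == features)
```

Because only within-sample orderings are used, the features stay the same under batch effects that shift or rescale a whole sample monotonically, which is the usual argument for pair-based signatures in clinical translation.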
Affiliation(s)
- Chunguang Guo
- Department of Endovascular Surgery, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
| | - Zaoqu Liu
- Department of Interventional Radiology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
| | - Yin Yu
- Department of Pathophysiology, School of Basic Medical Sciences, The Academy of Medical Science, Zhengzhou University, Zhengzhou, China
| | - Shirui Liu
- Department of Endovascular Surgery, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
| | - Ke Ma
- Department of Endovascular Surgery, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
| | - Xiaoyong Ge
- Department of Interventional Radiology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
| | - Zhe Xing
- Department of Neurosurgery, The Fifth Affiliated Hospital of Zhengzhou University, Zhengzhou, China
| | - Taoyuan Lu
- Department of Cerebrovascular Disease, Zhengzhou University People’s Hospital, Zhengzhou, China
| | - Siyuan Weng
- Department of Interventional Radiology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
| | - Libo Wang
- Department of Hepatobiliary and Pancreatic Surgery, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
| | - Long Liu
- Department of Hepatobiliary and Pancreatic Surgery, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
| | - Zhaohui Hua
- Department of Endovascular Surgery, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- *Correspondence: Zhaohui Hua; Xinwei Han; Zhen Li
| | - Xinwei Han
- Department of Interventional Radiology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
| | - Zhen Li
- Department of Endovascular Surgery, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
| |
|
26
|
Cao X, Gregory K, Wang D. Inference for sparse linear regression based on the leave-one-covariate-out solution path. COMMUN STAT-THEOR M 2022; 52:6640-6657. [PMID: 37840573 PMCID: PMC10572792 DOI: 10.1080/03610926.2022.2032171] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2021] [Accepted: 01/17/2022] [Indexed: 11/03/2022]
Abstract
We propose a new measure of variable importance in high-dimensional regression based on the change in the LASSO solution path when one covariate is left out. The proposed procedure provides a novel way to calculate variable importance and conduct variable screening. In addition, our procedure allows for the construction of p-values for testing whether each coefficient is equal to zero as well as for testing hypotheses involving multiple regression coefficients simultaneously; bootstrap techniques are used to construct the null distribution. For low-dimensional linear models, our method can achieve higher power than the t-test. Extensive simulations are provided to show the effectiveness of our method. In the high-dimensional setting, our proposed solution path based test achieves greater power than some other recently developed high-dimensional inference methods. We extend our method to logistic regression and demonstrate in simulation that our leave-one-covariate-out solution path tests can provide accurate p-values.
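The leave-one-covariate-out idea can be sketched in a few lines. The abstract does not state the exact test statistic, so the version below is an assumption: it averages, over a small hypothetical grid of penalty values, the L1 distance between the full LASSO solution and the solution refitted without covariate j (with a zero in position j). The solver is a plain coordinate descent without an intercept, and the simulated data are made up for illustration.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, active, n_iter=200):
    """Coordinate descent for (1/(2n))*||y - X b||^2 + lam*||b||_1,
    updating only the columns listed in `active` (others stay at zero)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in active:
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def loco_path_importance(X, y, lambdas):
    """Average L1 distance between the full path and each leave-one-out path."""
    p = len(X[0])
    full = [lasso_cd(X, y, lam, range(p)) for lam in lambdas]
    scores = []
    for j in range(p):
        keep = [k for k in range(p) if k != j]
        total = 0.0
        for lam, b_full in zip(lambdas, full):
            b_loco = lasso_cd(X, y, lam, keep)
            total += sum(abs(a - b) for a, b in zip(b_full, b_loco))
        scores.append(total / len(lambdas))
    return scores


# Simulated data: only the first covariate affects the response.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(4)] for _ in range(60)]
y = [2.0 * row[0] + random.gauss(0, 0.1) for row in X]
scores = loco_path_importance(X, y, lambdas=[0.05, 0.1, 0.2, 0.5])
print(scores)
```

In this toy setting the first covariate should receive the largest score, since removing it changes the path the most. Turning such a score into p-values would additionally need the bootstrap null distribution described in the abstract, which is not shown here.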
Affiliation(s)
- Xiangyang Cao
- 216 LeConte College, 1523 Greene St, Columbia, SC 29201, USA
| | - Karl Gregory
- 216 LeConte College, 1523 Greene St, Columbia, SC 29201, USA
| | - Dewei Wang
- 216 LeConte College, 1523 Greene St, Columbia, SC 29201, USA
| |
|
27
|
Abstract
Social cognitive deficits can have many negative consequences, spanning social withdrawal to psychopathology. Prior work has shown that child maltreatment may associate with poorer social cognitive skills in later life. However, no studies have examined this association from early childhood into adolescence. Using data from the Avon Longitudinal Study of Parents and Children (ALSPAC; n = 4,438), we examined the association between maltreatment (caregiver physical or emotional abuse; sexual or physical abuse), assessed repeatedly (every 1-3 years) from birth to age 9, and social cognitive skills at ages 7.5, 10.5, and 14 years. We evaluated the role of both the developmental timing (defined by age at exposure) and accumulation of maltreatment (defined as the number of occasions exposed) using a least angle regression variable selection procedure, followed by structural equation modeling. Among females, accumulation of maltreatment explained the most variation in social cognitive skills. For males, no significant associations were found. These findings underscore the importance of early intervention to minimize the accumulation of maltreatment and showcase the importance of prospective studies to understand the development of social cognition over time.
|
28
|
Battey HS, Cox DR. Some Perspectives on Inference in High Dimensions. Stat Sci 2022. [DOI: 10.1214/21-sts824] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- H. S. Battey
- H. S. Battey is Assistant Professor, Department of Mathematics, Imperial College London, SW7 2AZ, UK
| | - D. R. Cox
- D. R. Cox is Honorary Fellow, Nuffield College, University of Oxford OX1 1NF, UK
| |
|
29
|
Sutton M, Sugier PE, Truong T, Liquet B. Leveraging pleiotropic association using sparse group variable selection in genomics data. BMC Med Res Methodol 2022; 22:9. [PMID: 34996381 PMCID: PMC8742466 DOI: 10.1186/s12874-021-01491-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2021] [Accepted: 12/03/2021] [Indexed: 12/04/2022] Open
Abstract
Background Genome-wide association studies (GWAS) have identified genetic variants associated with multiple complex diseases. We can leverage this phenomenon, known as pleiotropy, to integrate multiple data sources in a joint analysis. Often, integrating additional information such as gene pathway knowledge can improve statistical efficiency and biological interpretation. In this article, we propose statistical methods which incorporate both gene pathway and pleiotropy knowledge to increase statistical power and identify important risk variants affecting multiple traits. Methods We propose novel feature selection methods for group variable selection in the multi-task regression problem. We develop penalised likelihood methods exploiting different penalties to induce structured sparsity at a gene (or pathway) and SNP level across all studies. We implement an alternating direction method of multipliers (ADMM) algorithm for our penalised regression methods. The performance of our approaches is compared to a subset-based meta-analysis approach on simulated data sets. A bootstrap sampling strategy is provided to explore the stability of the penalised methods. Results Our methods are applied to identify potential pleiotropy in an application considering the joint analysis of thyroid and breast cancers. The methods were able to detect eleven potential pleiotropic SNPs and six pathways. A simulation study found that our method was able to detect more true signals than a popular competing method while retaining a similar false discovery rate. Conclusion We developed feature selection methods for jointly analysing multiple logistic regression tasks where prior grouping knowledge is available. Our method performed well both in simulation studies and when applied to a real data analysis of multiple cancers.
Affiliation(s)
- Matthew Sutton
- Queensland University of Technology Centre for Data Science, Brisbane, Australia.
| | - Pierre-Emmanuel Sugier
- Laboratoire De Mathématiques et de leurs Applications de PAU E2S UPPA, CNRS, Pau, France.,University Paris-Saclay, UVSQ, Inserm, Gustave Roussy, CESP, Team "Exposome and Heredity", Villejuif, France
| | - Therese Truong
- University Paris-Saclay, UVSQ, Inserm, Gustave Roussy, CESP, Team "Exposome and Heredity", Villejuif, France
| | - Benoit Liquet
- Laboratoire De Mathématiques et de leurs Applications de PAU E2S UPPA, CNRS, Pau, France.,Department of Mathematics and Statistics, Macquarie University, Sydney, Australia
| |
|
30
|
Schwarz A, Roeder I, Seifert M. Comparative Gene Expression Analysis Reveals Similarities and Differences of Chronic Myeloid Leukemia Phases. Cancers (Basel) 2022; 14:cancers14010256. [PMID: 35008420 PMCID: PMC8750437 DOI: 10.3390/cancers14010256] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/18/2021] [Revised: 12/15/2021] [Accepted: 12/21/2021] [Indexed: 12/25/2022] Open
Abstract
Chronic myeloid leukemia (CML) is a slowly progressing blood cancer that primarily affects elderly people. Without successful treatment, CML progressively develops from the chronic phase through the accelerated phase to the blast crisis, and ultimately to death. Nowadays, the availability of targeted tyrosine kinase inhibitor (TKI) therapies has led to long-term disease control for the vast majority of patients. Nevertheless, there are still patients that do not respond well enough to TKI therapies and available targeted therapies are also less efficient for patients in accelerated phase or blast crises. Thus, a more detailed characterization of molecular alterations that distinguish the different CML phases is still very important. We performed an in-depth bioinformatics analysis of publicly available gene expression profiles of the three CML phases. Pairwise comparisons revealed many differentially expressed genes that formed a characteristic gene expression signature, which clearly distinguished the three CML phases. Signaling pathway expression patterns were very similar between the three phases but differed strongly in the number of affected genes, which increased with the phase. Still, significant alterations of MAPK, VEGF, PI3K-Akt, adherens junction and cytokine receptor interaction signaling distinguished specific phases. Our study also suggests that one can consider the phase-wise CML development as a three rather than a two-step process. This is in accordance with the phase-specific expression behavior of 24 potential major regulators that we predicted by a network-based approach. Several of these genes are known to be involved in the accumulation of additional mutations, alterations of immune responses, deregulation of signaling pathways or may have an impact on treatment response and survival. 
Importantly, some of these genes have already been reported in relation to CML (e.g., AURKB, AZU1, HLA-B, HLA-DMB, PF4) and others have been found to play important roles in different leukemias (e.g., CDCA3, RPL18A, PRG3, TLX3). In addition, increased expression of BCL2 in the accelerated and blast phase indicates that venetoclax could be a potential treatment option. Moreover, a characteristic signaling pathway signature with increased expression of cytokine and ECM receptor interaction pathway genes distinguished imatinib-resistant patients from each individual CML phase. Overall, our comparative analysis contributes to an in-depth molecular characterization of similarities and differences of the CML phases and provides hints for the identification of patients that may not profit from an imatinib therapy, which could support the development of additional treatment strategies.
Affiliation(s)
- Annemarie Schwarz
- Institute for Medical Informatics and Biometry (IMB), Carl Gustav Carus Faculty of Medicine, Technische Universität Dresden, D-01307 Dresden, Germany; (A.S.); (I.R.)
| | - Ingo Roeder
- Institute for Medical Informatics and Biometry (IMB), Carl Gustav Carus Faculty of Medicine, Technische Universität Dresden, D-01307 Dresden, Germany; (A.S.); (I.R.)
- National Center for Tumor Diseases (NCT), D-01307 Dresden, Germany: German Cancer Research Center (DKFZ), D-69120 Heidelberg, Germany; Faculty of Medicine and University Hospital Carl Gustav Carus, Technische Universität Dresden, D-01307 Dresden, Germany; Helmholtz-Zentrum Dresden—Rossendorf (HZDR), D-01328 Dresden, Germany
| | - Michael Seifert
- Institute for Medical Informatics and Biometry (IMB), Carl Gustav Carus Faculty of Medicine, Technische Universität Dresden, D-01307 Dresden, Germany; (A.S.); (I.R.)
- National Center for Tumor Diseases (NCT), D-01307 Dresden, Germany: German Cancer Research Center (DKFZ), D-69120 Heidelberg, Germany; Faculty of Medicine and University Hospital Carl Gustav Carus, Technische Universität Dresden, D-01307 Dresden, Germany; Helmholtz-Zentrum Dresden—Rossendorf (HZDR), D-01328 Dresden, Germany
- Correspondence:
| |
|
31
|
Zhang D, Khalili A, Asgharian M. Post-model-selection inference in linear regression models: An integrated review. STATISTICS SURVEYS 2022. [DOI: 10.1214/22-ss135] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/19/2022]
Affiliation(s)
- Dongliang Zhang
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD, USA
| | - Abbas Khalili
- Department of Mathematics and Statistics, McGill University, Montréal, QC, Canada
| | - Masoud Asgharian
- Department of Mathematics and Statistics, McGill University, Montréal, QC, Canada
| |
|
32
|
Meng Z, Yang W, Zhu L, Liu W, Wang Y. A novel necroptosis-related LncRNA signature for prediction of prognosis and therapeutic responses of head and neck squamous cell carcinoma. Front Pharmacol 2022; 13:963072. [PMID: 36016575 PMCID: PMC9395581 DOI: 10.3389/fphar.2022.963072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2022] [Accepted: 07/18/2022] [Indexed: 11/13/2022] Open
Abstract
Background: Long non-coding RNAs (lncRNAs) play an essential role in the occurrence and prognosis of tumors, and they have great potential as tumor biomarkers. However, the roles of necroptosis-related lncRNAs (NRLs) in head and neck squamous cell carcinoma (HNSCC) remain elusive. Methods: We comprehensively analyzed the gene expression and clinical information of 964 HNSCC cases in four cohorts. LASSO regression was utilized to construct a necroptosis-related lncRNA prognosis signature (NLPS). We used univariate and multivariate regression to assess the independent prognostic value of NLPS. Based on the optimal cut-off, patients were divided into high- and low-risk groups. In addition, the immune profile, multi-omics alteration, and pharmacological landscape of NLPS were further revealed. Results: A total of 21 NRLs associated with survival were identified by univariate regression in four cohorts. We constructed and validated an optimal prognostic model (NLPS). Compared to the low-risk group, patients in the high-risk group demonstrated a more dismal prognosis. After adjusting for clinical features by multivariate analysis, NLPS still displayed independent prognostic value. Additionally, further analysis found that patients in the low-risk group showed more abundant immune cell infiltration and a greater immunotherapy response. In contrast, patients in the high-risk group were more sensitive to multiple chemotherapeutic agents. Conclusion: As a promising tool, NLPS provides guidance and assistance in the clinical management and personalized treatment of HNSCC.
|
33
|
Song L, Feng D, Tan J, Zhang H. Novel ferroptosis-related gene signature as a potential prognostic tool for gastric cancer. EUR J INFLAMM 2022. [DOI: 10.1177/1721727x221122705] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2022] Open
Abstract
Objectives Gastric cancer (GC) is a major global health concern and is difficult to diagnose in the early stage. Ferroptosis is an iron-dependent, novel form of non-apoptotic cell death. In recent years, inducing the upregulation of ferroptosis-related genes has become a promising therapeutic alternative for cancers, especially those resistant to traditional treatments. However, the relationship between ferroptosis-related genes and GC remains to be further elucidated. Methods In the present study, mRNA expression profiles and corresponding clinical data of patients with GC were retrieved from The Cancer Genome Atlas and used as test data. A multigene signature was constructed using the least absolute shrinkage and selection operator (LASSO) Cox regression model. Data of patients with GC from ‘GSE84426’ in the Gene Expression Omnibus database were used as training data for validation. Results More than half of the ferroptosis-related genes (58.43%) were differentially expressed between GC tissues and adjacent normal tissue samples in the test data. Univariate Cox regression analysis showed that 16 differentially expressed genes (DEGs) were related to the prognosis of GC (all p < 0.05). Expression profiles of the 16 DEGs were analysed using LASSO Cox regression, and a prognostic model was established by selecting the 10 best genes for λ. These 10 genes were used to construct a 10-gene signature and stratify patients into two risk groups. Based on the median risk score in the test data, patients with GC were divided into high- and low-risk groups (p < 0.001). Risk score was an independent predictor for overall survival in multivariate Cox regression analyses (p < 0.001 and <0.01 in the test and training data, respectively; hazard ratio >1). Receiver operating characteristic curve analysis confirmed the predictive capacity of the 10-gene signature.
Functional analysis revealed that tumour-infiltrating lymphocytes, antigen-presenting cell co-stimulation, and cytokine-cytokine receptors were enriched; however, the immune status differed between the two risk groups. Conclusion The novel ferroptosis-related gene signature can be used for GC prognosis. In addition, ferroptosis may provide a novel alternative for the diagnosis and treatment of GC.
Affiliation(s)
- Ling Song
- Department of Pharmacy, Renmin Hospital of Wuhan University, Wuhan, China
| | - Dou Feng
- Department of Pharmacy, Renmin Hospital of Wuhan University, Wuhan, China
| | - Jiajie Tan
- Department of Pharmacy, Renmin Hospital of Wuhan University, Wuhan, China
| | - Hong Zhang
- Department of Pharmacy, Renmin Hospital of Wuhan University, Wuhan, China
| |
|
34
|
Denault WRP, Gjessing HK, Juodakis J, Jacobsson B, Jugessur A. Wavelet Screening: a novel approach to analyzing GWAS data. BMC Bioinformatics 2021; 22:484. [PMID: 34620077 PMCID: PMC8499521 DOI: 10.1186/s12859-021-04356-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/10/2021] [Accepted: 09/06/2021] [Indexed: 11/13/2022] Open
Abstract
Background Traditional methods for single-variant genome-wide association study (GWAS) incur a substantial multiple-testing burden because of the need to test for associations with a vast number of single-nucleotide polymorphisms (SNPs) simultaneously. Further, by ignoring more complex joint effects of nearby SNPs within a given region, these methods fail to consider the genomic context of an association with the outcome. Results To address these shortcomings, we present a more powerful method for GWAS, coined ‘Wavelet Screening’ (WS), that greatly reduces the number of tests to be performed. This is achieved through the use of a sliding-window approach based on wavelets to sequentially screen the entire genome for associations. Wavelets are oscillatory functions that are useful for analyzing the local frequency and time behavior of signals. The signals can then be divided into different scale components and analyzed separately. In the current setting, we consider a sequence of SNPs as a genetic signal, and for each screened region, we transform the genetic signal into the wavelet space. The null and alternative hypotheses are modeled using the posterior distribution of the wavelet coefficients. WS is enhanced by using additional information from the regression coefficients and by taking advantage of the pyramidal structure of wavelets. When faced with more complex genetic signals than single-SNP associations, we show via simulations that WS provides a substantial gain in power compared to both the traditional GWAS modeling and another popular regional association test called SNP-set (Sequence) Kernel Association Test (SKAT). To demonstrate feasibility, we applied WS to a large Norwegian cohort (N=8006) with genotypes and information available on gestational duration. Conclusions WS is a powerful and versatile approach to analyzing whole-genome data and lends itself easily to investigating various omics data types. 
Given its broader focus on the genomic context of an association, WS may provide additional insight into trait etiology by revealing genes and loci that might have been missed by previous efforts.
Affiliation(s)
- William R P Denault
- Department of Genetics and Bioinformatics, Norwegian Institute of Public Health, Oslo, Norway.,Centre for Fertility and Health, Norwegian Institute of Public Health, Oslo, Norway.,Department of Global Public Health and Primary Care, University of Bergen, Bergen, Norway
| | - Håkon K Gjessing
- Centre for Fertility and Health, Norwegian Institute of Public Health, Oslo, Norway.,Department of Global Public Health and Primary Care, University of Bergen, Bergen, Norway
| | - Julius Juodakis
- Department of Genetics and Bioinformatics, Norwegian Institute of Public Health, Oslo, Norway.,Department of Obstetrics and Gynecology, Institute of Clinical Sciences, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
| | - Bo Jacobsson
- Department of Genetics and Bioinformatics, Norwegian Institute of Public Health, Oslo, Norway.,Department of Obstetrics and Gynecology, Institute of Clinical Sciences, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
| | - Astanand Jugessur
- Department of Genetics and Bioinformatics, Norwegian Institute of Public Health, Oslo, Norway.,Centre for Fertility and Health, Norwegian Institute of Public Health, Oslo, Norway.,Department of Global Public Health and Primary Care, University of Bergen, Bergen, Norway
| |
|
35
|
Peterson S, Ibrahim M, Anderson PL, Moore CM, MaWhinney S. A comparison of covariate selection techniques applied to pre-exposure prophylaxis (PrEP) drug concentration data in men and transgender women at risk for HIV. J Pharmacokinet Pharmacodyn 2021; 48:655-669. [PMID: 34013454 DOI: 10.1007/s10928-021-09763-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/12/2020] [Accepted: 05/05/2021] [Indexed: 11/30/2022]
Abstract
Pre-exposure prophylaxis (PrEP) containing the antiretrovirals tenofovir disoproxil fumarate (TDF) or tenofovir alafenamide (TAF) can reduce the risk of acquiring HIV. Concentrations of intracellular tenofovir-diphosphate (TFV-DP) measured in dried blood spots (DBS) have been used to quantify PrEP adherence; however, even under directly observed dosing, unexplained between-subject variation remains. Here, we wish to identify patient-specific factors associated with TFV-DP levels. Data from the iPrEX Open Label Extension (OLE) study were used to compare multiple covariate selection methods for determining the demographic and clinical covariates most important for drug concentration estimation. To allow for the possibility of non-linear relationships between drug concentration and explanatory variables, the component selection and smoothing operator (COSSO) was implemented. We compared COSSO to LASSO, a commonly used machine learning approach, and traditional forward and backward selection. Training (N = 387) and test (N = 166) datasets were utilized to compare prediction accuracy across methods. LASSO and COSSO had the best predictive ability for the test data. Both predicted increased drug concentration with increases in age and self-reported adherence, the latter with a steeper trajectory among Asians. TFV-DP reductions were associated with increasing eGFR, hemoglobin and transgender status. COSSO also predicted lower TFV-DP with increasing weight and South American countries. COSSO identified non-linear relationships between log(TFV-DP) and adherence, weight and eGFR, with differing trajectories for some races. COSSO identified non-linear log(TFV-DP) trajectories with a subset of covariates, which may better explain variation and enhance prediction. Future research is needed to examine differences identified in trajectories by race and country.
Affiliation(s)
- Skyler Peterson
- Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado, 13001 E 17th Pl, Aurora, CO, 80045, USA
- Department of Pediatrics, University of Utah, Salt Lake City, UT, 84108, USA
| | - Mustafa Ibrahim
- Department of Pharmaceutical Sciences, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of Colorado, University of Colorado Anschutz Medical Campus, V20-C238, Room 4101, 12850 E. Montview Blvd, Aurora, CO, 80045, USA
| | - Peter L Anderson
- Department of Pharmaceutical Sciences, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of Colorado, University of Colorado Anschutz Medical Campus, V20-C238, Room 4101, 12850 E. Montview Blvd, Aurora, CO, 80045, USA
| | - Camille M Moore
- Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado, 13001 E 17th Pl, Aurora, CO, 80045, USA
- Center for Genes, Environment and Health, National Jewish Health, 1400 Jackson St., Denver, CO, 80206, USA
| | - Samantha MaWhinney
- Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado, 13001 E 17th Pl, Aurora, CO, 80045, USA.
| |
|
36
|
Fang F, Zhao J, Ahmed SE, Qu A. A weak‐signal‐assisted procedure for variable selection and statistical inference with an informative subsample. Biometrics 2021. [DOI: 10.1111/biom.13346] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/27/2022]
Affiliation(s)
- Fang Fang
- Key Laboratory of Advanced Theory and Application in Statistics and Data Science ‐ MOE School of Statistics East China Normal University Shanghai China
| | - Jiwei Zhao
- Department of Biostatistics and Medical Informatics University of Wisconsin Madison Wisconsin
| | - S. Ejaz Ahmed
- Faculty of Mathematics and Science Brock University St. Catharines Ontario Canada
| | - Annie Qu
- Department of Statistics University of California Irvine California
| |
|
37
|
Wurm MJ, Rathouz PJ, Hanlon BM. Regularized Ordinal Regression and the ordinalNet R Package. J Stat Softw 2021; 99:10.18637/jss.v099.i06. [PMID: 34512213 PMCID: PMC8432594 DOI: 10.18637/jss.v099.i06] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Indexed: 11/03/2022] Open
Abstract
Regularization techniques such as the lasso (Tibshirani 1996) and elastic net (Zou and Hastie 2005) can be used to improve regression model coefficient estimation and prediction accuracy, as well as to perform variable selection. Ordinal regression models are widely used in applications where the use of regularization could be beneficial; however, these models are not included in many popular software packages for regularized regression. We propose a coordinate descent algorithm to fit a broad class of ordinal regression models with an elastic net penalty. Furthermore, we demonstrate that each model in this class generalizes to a more flexible form, that can be used to model either ordered or unordered categorical response data. We call this the elementwise link multinomial-ordinal (ELMO) class, and it includes widely used models such as multinomial logistic regression (which also has an ordinal form) and ordinal logistic regression (which also has an unordered multinomial form). We introduce an elastic net penalty class that applies to either model form, and additionally, this penalty can be used to shrink a non-ordinal model toward its ordinal counterpart. Finally, we introduce the R package ordinalNet, which implements the algorithm for this model class.
Affiliation(s)
- Michael J Wurm
- Department of Statistics, University of Wisconsin-Madison,
| | - Paul J Rathouz
- Department of Population Health, Dell Medical School at the University of Texas at Austin,
| | - Bret M Hanlon
- Department of Surgery, University of Wisconsin School of Medicine and Public Health, 600 Highland Avenue Madison, WI 53792,
|
38
|
Du L, Guo X, Sun W, Zou C. False Discovery Rate Control Under General Dependence By Symmetrized Data Aggregation. J Am Stat Assoc 2021. [DOI: 10.1080/01621459.2021.1945459] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/21/2022]
Affiliation(s)
- Lilun Du
- Department of ISOM, Hong Kong University of Science and Technology, ISOM, Kowloon, Hong Kong
| | - Xu Guo
- Department of Mathematical Statistics, Beijing Normal University, Beijing, China
| | - Wenguang Sun
- Data Sciences and Operations, University of Southern California, Los Angeles, CA
| | - Changliang Zou
- Department of Statistics and Data Sciences, Nankai University, Tianjin, China
|
39
|
Yan X, Zhu Z. Quantifying the impact of COVID-19 on e-bike safety in China via multi-output and clustering-based regression models. PLoS One 2021; 16:e0256610. [PMID: 34415973 PMCID: PMC8378728 DOI: 10.1371/journal.pone.0256610] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 05/19/2021] [Accepted: 08/10/2021] [Indexed: 11/19/2022] Open
Abstract
The impacts of COVID-19 on travel demand, traffic congestion, and traffic safety are attracting heated attention. However, the influence of the pandemic on electric bike (e-bike) safety has not been investigated. This paper fills the research gap by analyzing how COVID-19 affects China's e-bike safety based on a province-level dataset containing e-bike safety metrics, socioeconomic information, and COVID-19 cases from 2017 to 2020. Multi-output regression models are adopted to investigate the overall impact of COVID-19 on e-bike safety in China. Clustering-based regression models are used to examine the heterogeneous effects of COVID-19 and the other explanatory variables in different provinces/municipalities. This paper confirms the high relevance between COVID-19 and the e-bike safety condition in China. The number of COVID-19 cases has a significant negative effect on the number of e-bike fatalities/injuries at the country level. Moreover, two clusters of provinces/municipalities are identified: one (cluster 1) with lower and the other (cluster 2 that includes Hubei province) higher number of e-bike fatalities/injuries. In the clustering-based regressions, the absolute coefficients of the COVID-19 feature for cluster 2 are much larger than those for cluster 1, indicating that the pandemic could significantly reduce e-bike safety issues in provinces with more e-bike fatalities/injuries.
Affiliation(s)
- Xingpei Yan
- School of Automobile, Chang’an University, Xi’an, P.R. China
- Department of Traffic Policy Planning Research, Research Institute for Road Safety of Ministry of Public Security, Beijing, P.R. China
| | - Zheng Zhu
- Department of Civil and Environmental Engineering, the Hong Kong University of Science and Technology, Hong Kong, P.R. China
|
40
|
Computational Probing the Methylation Sites Related to EGFR Inhibitor-Responsive Genes. Biomolecules 2021; 11:biom11071042. [PMID: 34356665 PMCID: PMC8302001 DOI: 10.3390/biom11071042] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/04/2021] [Revised: 07/09/2021] [Accepted: 07/15/2021] [Indexed: 12/31/2022] Open
Abstract
The emergence of drug resistance is one of the main obstacles to the treatment of lung cancer patients with EGFR inhibitors. Here, to further understand the mechanism of EGFR inhibitors in lung cancer and offer novel therapeutic targets for anti-EGFR-inhibitor resistance via the deep mining of pharmacogenomics data, we associated DNA methylation with drug sensitivities for uncovering the methylation sites related to EGFR inhibitor sensitivity genes. Specifically, we first introduced a grouped regularized regression model (Group Least Absolute Shrinkage and Selection Operator, group lasso) to detect the genes that were closely related to EGFR inhibitor effectiveness. Then, we applied the classical regression model (lasso) to identify the methylation sites associated with the above drug sensitivity genes. The new model was validated on the well-known cancer genomics resource: CTRP. GeneHancer and Encyclopedia of DNA Elements (ENCODE) database searches indicated that the predicted methylation sites related to EGFR inhibitor sensitivity genes were related to regulatory elements. Moreover, the correlation analysis on sensitivity genes and predicted methylation sites suggested that the methylation sites located in the promoter region were more correlated with the expression of EGFR inhibitor sensitivity genes than those located in the enhancer region and the TFBS. Meanwhile, we performed differential expression analysis of genes and predicted methylation sites and found that changes in the methylation level of some sites may affect the expression of the corresponding EGFR inhibitor-responsive genes. Therefore, we supposed that the effectiveness of EGFR inhibitors in lung cancer may be improved by methylation modification in their sensitivity genes.
|
41
|
Elmes K, Schmich F, Szczurek E, Jenkins J, Beerenwinkel N, Gavryushkin A. Learning epistatic gene interactions from perturbation screens. PLoS One 2021; 16:e0254491. [PMID: 34255784 PMCID: PMC8277066 DOI: 10.1371/journal.pone.0254491] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/28/2020] [Accepted: 06/28/2021] [Indexed: 11/21/2022] Open
Abstract
The treatment of complex diseases often relies on combinatorial therapy, a strategy where drugs are used to target multiple genes simultaneously. Promising candidate genes for combinatorial perturbation often constitute epistatic genes, i.e., genes which contribute to a phenotype in a non-linear fashion. Experimental identification of the full landscape of genetic interactions by perturbing all gene combinations is prohibitive due to the exponential growth of testable hypotheses. Here we present a model for the inference of pairwise epistatic, including synthetic lethal, gene interactions from siRNA-based perturbation screens. The model exploits the combinatorial nature of siRNA-based screens resulting from the high numbers of sequence-dependent off-target effects, where each siRNA apart from its intended target knocks down hundreds of additional genes. We show that conditional and marginal epistasis can be estimated as interaction coefficients of regression models on perturbation data. We compare two methods, namely glinternet and xyz, for selecting non-zero effects in high dimensions as components of the model, and make recommendations for the appropriate use of each. For data simulated from real RNAi screening libraries, we show that glinternet successfully identifies epistatic gene pairs with high accuracy across a wide range of relevant parameters for the signal-to-noise ratio of observed phenotypes, the effect size of epistasis and the number of observations per double knockdown. xyz is also able to identify interactions from lower dimensional data sets (fewer genes), but is less accurate for many dimensions. Higher accuracy of glinternet, however, comes at the cost of longer running time compared to xyz. The general model is widely applicable and allows mining the wealth of publicly available RNAi screening data for the estimation of epistatic interactions between genes. 
As a proof of concept, we apply the model to search for interactions, and potential targets for treatment, among previously published sets of siRNA perturbation screens on various pathogens. The identified interactions include both known epistatic interactions as well as novel findings.
Affiliation(s)
- Kieran Elmes
- Department of Computer Science, University of Otago, Dunedin, New Zealand
| | - Fabian Schmich
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Ewa Szczurek
- Institute of Informatics, University of Warsaw, Warsaw, Poland
| | - Jeremy Jenkins
- Novartis Institutes for BioMedical Research, Cambridge, Massachusetts, United States of America
| | - Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- * E-mail: (NB); (AG)
| | - Alex Gavryushkin
- Department of Computer Science, University of Otago, Dunedin, New Zealand
- School of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand
- * E-mail: (NB); (AG)
|
42
|
Li N, Peng X, Kawaguchi E, Suchard MA, Li G. A scalable surrogate L0 sparse regression method for generalized linear models with applications to large scale data. J Stat Plan Inference 2021. [DOI: 10.1016/j.jspi.2020.12.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 10/22/2022]
|
43
|
Zhu Y, Simpkin AJ, Suderman MJ, Lussier AA, Walton E, Dunn EC, Smith ADAC. A Structured Approach to Evaluating Life-Course Hypotheses: Moving Beyond Analyses of Exposed Versus Unexposed in the -Omics Context. Am J Epidemiol 2021; 190:1101-1112. [PMID: 33125040 DOI: 10.1093/aje/kwaa246] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 10/16/2019] [Revised: 10/27/2020] [Accepted: 10/28/2020] [Indexed: 12/12/2022] Open
Abstract
The structured life-course modeling approach (SLCMA) is a theory-driven analytical method that empirically compares multiple prespecified life-course hypotheses characterizing time-dependent exposure-outcome relationships to determine which theory best fits the observed data. In this study, we performed simulations and empirical analyses to evaluate the performance of the SLCMA when applied to genomewide DNA methylation (DNAm). Using simulations (n = 700), we compared 5 statistical inference tests used with SLCMA, assessing the familywise error rate, statistical power, and confidence interval coverage to determine whether inference based on these tests was valid in the presence of substantial multiple testing and small effects, 2 hallmark challenges of inference from -omics data. In the empirical analyses (n = 703), we evaluated the time-dependent relationship between childhood abuse and genomewide DNAm. In simulations, selective inference and the max-|t|-test performed best: Both controlled the familywise error rate and yielded moderate statistical power. Empirical analyses using SLCMA revealed time-dependent effects of childhood abuse on DNAm. Our findings show that SLCMA, applied and interpreted appropriately, can be used in high-throughput settings to examine time-dependent effects underlying exposure-outcome relationships over the life course. We provide recommendations for applying the SLCMA in -omics settings and encourage researchers to move beyond analyses of exposed versus unexposed individuals.
|
44
|
Kapelner A, Bleich J, Levine A, Cohen ZD, DeRubeis RJ, Berk R. Evaluating the Effectiveness of Personalized Medicine With Software. Front Big Data 2021; 4:572532. [PMID: 34085036 PMCID: PMC8167073 DOI: 10.3389/fdata.2021.572532] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 06/14/2020] [Accepted: 02/03/2021] [Indexed: 11/13/2022] Open
Abstract
We present methodological advances in understanding the effectiveness of personalized medicine models and supply easy-to-use open-source software. Personalized medicine involves the systematic use of individual patient characteristics to determine which treatment option is most likely to result in a better average outcome for the patient. Why is personalized medicine not done more in practice? One of many reasons is that practitioners have no easy way to holistically evaluate whether their personalization procedure does better than the standard of care, termed improvement. Our software, "Personalized Treatment Evaluator" (the R package PTE), provides inference for improvement out-of-sample in many clinical scenarios. We also extend current methodology by allowing evaluation of improvement in the case where the endpoint is binary or survival. In the software, the practitioner inputs 1) data from a single-stage randomized trial with one continuous, incidence or survival endpoint and 2) an educated guess of a functional form of a model for the endpoint constructed from domain knowledge. The bootstrap is then employed on data unseen during model fitting to provide confidence intervals for the improvement for the average future patient (assuming future patients are similar to the patients in the trial). One may also test against a null scenario where the hypothesized personalization is no more useful than a standard of care. We demonstrate our method's promise on simulated data as well as on data from a randomized comparative trial investigating two treatments for depression.
Affiliation(s)
- Adam Kapelner
- Department of Mathematics, Queens College, CUNY, Queens, NY, United States
| | - Justin Bleich
- Department of Statistics, The Wharton School of the University of Pennsylvania, Philadelphia, PA, United States
| | - Alina Levine
- Department of Mathematics, Queens College, CUNY, Queens, NY, United States
| | - Zachary D. Cohen
- Department of Psychology, University of Pennsylvania, Philadelphia, PA, United States
| | - Robert J. DeRubeis
- Department of Psychology, University of Pennsylvania, Philadelphia, PA, United States
| | - Richard Berk
- Department of Statistics, The Wharton School of the University of Pennsylvania, Philadelphia, PA, United States
|
45
|
Tredennick AT, Hooker G, Ellner SP, Adler PB. A practical guide to selecting models for exploration, inference, and prediction in ecology. Ecology 2021; 102:e03336. [PMID: 33710619 PMCID: PMC8187274 DOI: 10.1002/ecy.3336] [Citation(s) in RCA: 80] [Impact Index Per Article: 26.7] [Received: 04/09/2020] [Revised: 10/08/2020] [Accepted: 12/06/2020] [Indexed: 11/12/2022]
Abstract
Selecting among competing statistical models is a core challenge in science. However, the many possible approaches and techniques for model selection, and the conflicting recommendations for their use, can be confusing. We contend that much confusion surrounding statistical model selection results from failing to first clearly specify the purpose of the analysis. We argue that there are three distinct goals for statistical modeling in ecology: data exploration, inference, and prediction. Once the modeling goal is clearly articulated, an appropriate model selection procedure is easier to identify. We review model selection approaches and highlight their strengths and weaknesses relative to each of the three modeling goals. We then present examples of modeling for exploration, inference, and prediction using a time series of butterfly population counts. These show how a model selection approach flows naturally from the modeling goal, leading to different models selected for different purposes, even with exactly the same data set. This review illustrates best practices for ecologists and should serve as a reminder that statistical recipes cannot substitute for critical thinking or for the use of independent data to test hypotheses and validate predictions.
Affiliation(s)
- Andrew T Tredennick
- Western EcoSystems Technology, Inc., 1610 East Reynolds Street, Laramie, Wyoming, 82072, USA
| | - Giles Hooker
- Department of Statistics and Data Science, Cornell University, Ithaca, New York, 14853, USA
| | - Stephen P Ellner
- Department of Ecology and Evolutionary Biology, Cornell University, Ithaca, New York, 14853, USA
| | - Peter B Adler
- Department of Wildland Resources and the Ecology Center, Utah State University, 5230 Old Main Hill, Logan, Utah, 84322, USA
|
46
|
Lemhadri I, Ruan F, Tibshirani R. LassoNet: Neural Networks with Feature Sparsity. Proceedings of Machine Learning Research 2021; 130:10-18. [PMID: 36092461 PMCID: PMC9453696] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Much work has been done recently to make neural networks more interpretable, and one approach is to arrange for the network to use only a subset of the available features. In linear models, Lasso (or ℓ1-regularized) regression assigns zero weights to the most irrelevant or redundant features, and is widely used in data science. However, the Lasso only applies to linear models. Here we introduce LassoNet, a neural network framework with global feature selection. Our approach achieves feature sparsity by allowing a feature to participate in a hidden unit only if its linear representative is active. Unlike other approaches to feature selection for neural nets, our method uses a modified objective function with constraints, and so integrates feature selection with the parameter learning directly. As a result, it delivers an entire regularization path of solutions with a range of feature sparsity. In experiments with real and simulated data, LassoNet significantly outperforms state-of-the-art methods for feature selection and regression. The LassoNet method uses projected proximal gradient descent, and generalizes directly to deep networks. It can be implemented by adding just a few lines of code to a standard neural network.
|
47
|
Bobashev G, Warren L, Wu LT. Predictive model of multiple emergency department visits among adults: analysis of the data from the National Survey of Drug Use and Health (NSDUH). BMC Health Serv Res 2021; 21:280. [PMID: 33766009 PMCID: PMC7995604 DOI: 10.1186/s12913-021-06221-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/13/2020] [Accepted: 02/28/2021] [Indexed: 11/22/2022] Open
Abstract
Background: In this methodological paper, we use a novel, predictive approach to examine how demographics, substance use, mental and other health indicators predict multiple visits (≥3) to emergency departments (ED) within a year. Methods: State-of-the-art predictive methods were used to evaluate predictive ability and factors predicting multiple visits to ED within a year and to identify factors that influenced the strength of the prediction. The analysis used public-use datasets from the 2015–2018 National Surveys on Drug Use and Health (NSDUH), which used the same questionnaire on the variables of interest. Analysis focused on adults aged ≥18 years. Several predictive models (regressions, trees, and random forests) were validated and compared on independent datasets. Results: Predictive ability on a test set for multiple ED visits (≥3 times within a year) measured as the area under the receiver operating characteristic (ROC) reached 0.8, which is good for a national survey. Models revealed consistency in predictive factors across the 4 survey years. The most influential variables for predicting ≥3 ED visits per year were fair/poor self-rated health, being nervous or restless/fidgety, having a lower income, asthma, heart condition/disease, having chronic obstructive pulmonary disease (COPD), nicotine dependence, African-American race, female sex, having diabetes, and being of younger age (18–20). Conclusions: The findings reveal the need to address behavioral and mental health contributors to ED visits and reinforce the importance of developing integrated care models in primary care settings to improve mental health for medically vulnerable patients. The presented modeling approach can be broadly applied to national and other large surveys. Supplementary Information: The online version contains supplementary material available at 10.1186/s12913-021-06221-w.
Affiliation(s)
- Georgiy Bobashev
- RTI International, 3040 Cornwallis Rd., P.O. Box 12194, Research Triangle Park, NC, 27709, USA.
| | - Lauren Warren
- RTI International, 3040 Cornwallis Rd., P.O. Box 12194, Research Triangle Park, NC, 27709, USA
| | - Li-Tzy Wu
- Department of Psychiatry and Behavioral Sciences and Department of Medicine, Duke University School of Medicine, Box 3903, Durham, NC, 27710, USA.
|
48
|
Williams DR. Bayesian Estimation for Gaussian Graphical Models: Structure Learning, Predictability, and Network Comparisons. Multivariate Behavioral Research 2021; 56:336-352. [PMID: 33739907 DOI: 10.1080/00273171.2021.1894412] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Indexed: 06/12/2023]
Abstract
Gaussian graphical models (GGM; "networks") allow for estimating conditional dependence structures that are encoded by partial correlations. This is accomplished by identifying non-zero relations in the inverse of the covariance matrix. In psychology the default estimation method uses ℓ1-regularization, where the accompanying inferences are restricted to frequentist objectives. Bayesian methods remain relatively uncommon in practice and methodological literatures. To date, they have not yet been used for estimation and inference in the psychological network literature. In this work, I introduce Bayesian methodology that is specifically designed for the most common psychological applications. The graphical structure is determined with posterior probabilities that can be used to assess conditional dependent and independent relations. Additional methods are provided for extending inference to specific aspects within- and between-networks, including partial correlation differences and Bayesian methodology to quantify network predictability. I first demonstrate that the decision rule based on posterior probabilities can be calibrated to the desired level of specificity. The proposed techniques are then demonstrated in several illustrative examples. The methods have been implemented in the R package BGGM.
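Illustrative note: the abstract above states that a Gaussian graphical model encodes conditional dependence as partial correlations read from the inverse of the covariance matrix. The sketch below shows only that standard relationship, rho_ij = -omega_ij / sqrt(omega_ii * omega_jj) for precision matrix entries omega. It is not the Bayesian procedure of the BGGM package; the helper names and the toy covariance matrix are assumptions made for illustration.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(cov):
    """Partial correlation of each pair given all other variables."""
    prec = invert(cov)  # precision matrix (inverse covariance)
    n = len(prec)
    return [
        [
            1.0 if i == j else -prec[i][j] / math.sqrt(prec[i][i] * prec[j][j])
            for j in range(n)
        ]
        for i in range(n)
    ]


# Toy chain X1 - X2 - X3: X1 and X3 are marginally correlated (0.25) but
# conditionally independent given X2, so their partial correlation is zero.
cov = [[1.0, 0.5, 0.25], [0.5, 1.0, 0.5], [0.25, 0.5, 1.0]]
pc = partial_correlations(cov)
print(round(pc[0][1], 4), round(abs(pc[0][2]), 4))  # 0.4472 0.0
```

In a network estimated this way, an edge between two variables is absent exactly when the corresponding off-diagonal precision entry is zero.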
Affiliation(s)
- Donald R Williams
- Department of Psychology, University of California, Davis, Davis, California, USA
|
49
|
Emad A, Sinha S. Inference of phenotype-relevant transcriptional regulatory networks elucidates cancer type-specific regulatory mechanisms in a pan-cancer study. NPJ Syst Biol Appl 2021; 7:9. [PMID: 33558504 PMCID: PMC7870953 DOI: 10.1038/s41540-021-00169-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/15/2019] [Accepted: 01/05/2021] [Indexed: 01/30/2023] Open
Abstract
Reconstruction of transcriptional regulatory networks (TRNs) is a powerful approach to unravel the gene expression programs involved in healthy and disease states of a cell. However, these networks are usually reconstructed independent of the phenotypic (or clinical) properties of the samples. Therefore, they may confound regulatory mechanisms that are specifically related to a phenotypic property with more general mechanisms underlying the full complement of the analyzed samples. In this study, we develop a method called InPheRNo to identify "phenotype-relevant" TRNs. This method is based on a probabilistic graphical model that models the simultaneous effects of multiple transcription factors (TFs) on their target genes and the statistical relationship between the target genes' expression and the phenotype. Extensive comparison of InPheRNo with related approaches using primary tumor samples of 18 cancer types from The Cancer Genome Atlas reveals that InPheRNo can accurately reconstruct cancer type-relevant TRNs and identify cancer driver TFs. In addition, survival analysis reveals that the activity level of TFs with many target genes could distinguish patients with poor prognosis from those with better prognosis.
Affiliation(s)
- Amin Emad
- Department of Electrical and Computer Engineering, McGill University, Montreal, QC, Canada.
| | - Saurabh Sinha
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, USA.
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA.
- Cancer Center at Illinois, University of Illinois at Urbana-Champaign, Urbana, IL, USA.
|
50
|
Jin R, Tan A. Fast Markov Chain Monte Carlo for High-Dimensional Bayesian Regression Models With Shrinkage Priors. J Comput Graph Stat 2021. [DOI: 10.1080/10618600.2020.1864383] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/22/2022]
Affiliation(s)
- Rui Jin
- Department of Statistics and Actuarial Science, University of Iowa, Iowa City, IA
| | - Aixin Tan
- Department of Statistics and Actuarial Science, University of Iowa, Iowa City, IA
|