1
Svensson T, Svensson AK, Kitlinski M, Engström G, Nilsson J, Orho-Melander M, Nilsson PM, Melander O. Very short sleep duration reveals a proteomic fingerprint that is selectively associated with incident diabetes mellitus but not with incident coronary heart disease: a cohort study. BMC Med 2024; 22:173. [PMID: 38649900 PMCID: PMC11035142 DOI: 10.1186/s12916-024-03392-1] [Received: 11/28/2023] [Accepted: 04/15/2024] [Indexed: 04/25/2024]
Abstract
BACKGROUND The molecular pathways linking short and long sleep duration with incident diabetes mellitus (iDM) and incident coronary heart disease (iCHD) are not known. We aimed to identify circulating protein patterns associated with sleep duration and test their impact on incident cardiometabolic disease.
METHODS We assessed sleep duration and measured 78 plasma proteins among 3336 participants aged 46-68 years, free from DM and CHD at baseline, and identified cases of iDM and iCHD using national registers. Incident events occurring in the first 3 years of follow-up were excluded from analyses. Tenfold cross-fit partialing-out lasso logistic regression adjusted for age and sex was used to identify proteins that significantly predicted sleep duration quintiles when compared with the referent quintile 3 (Q3). Predictive proteins were weighted and combined into proteomic scores (PS) for sleep duration Q1, Q2, Q4, and Q5. Combinations of PS were included in a linear regression model to identify the best predictors of habitual sleep duration. Cox proportional hazards regression models with sleep duration quintiles and sleep-predictive PS as the main exposures were related to iDM and iCHD after adjustment for known covariates.
RESULTS Sixteen unique proteomic markers, predominantly reflecting inflammation and apoptosis, predicted sleep duration quintiles. The combination of PSQ1 and PSQ5 best predicted sleep duration. Mean follow-up times for iDM (n = 522) and iCHD (n = 411) were 21.8 and 22.4 years, respectively. Compared with sleep duration Q3, all sleep duration quintiles were positively and significantly associated with iDM. Only sleep duration Q1 was positively and significantly associated with iCHD. Inclusion of PSQ1 and PSQ5 abrogated the association between sleep duration Q1 and iDM. Moreover, PSQ1 was significantly associated with iDM (HR = 1.27, 95% CI: 1.06-1.53). PSQ1 and PSQ5 were not associated with iCHD and did not markedly attenuate the association between sleep duration Q1 with iCHD.
CONCLUSIONS We here identify plasma proteomic fingerprints of sleep duration and suggest that PSQ1 could explain the association between very short sleep duration and incident DM.
Affiliation(s)
- Thomas Svensson
  - Department of Clinical Sciences, Lund University, Skåne University Hospital, CRC, Jan Waldenströms Gata 35, 20502, Malmö, Sweden
  - Precision Health, Department of Bioengineering, Graduate School of Engineering, the University of Tokyo, 7-3-1 Hongo, Bunkyo-Ku, Tokyo, 113-8655, Japan
  - Graduate School of Health Innovation, Kanagawa University of Human Services, Kawasaki-Ku, Kawasaki-Shi, Kanagawa, Japan
- Akiko Kishi Svensson
  - Department of Clinical Sciences, Lund University, Skåne University Hospital, CRC, Jan Waldenströms Gata 35, 20502, Malmö, Sweden
  - Precision Health, Department of Bioengineering, Graduate School of Engineering, the University of Tokyo, 7-3-1 Hongo, Bunkyo-Ku, Tokyo, 113-8655, Japan
  - Department of Diabetes and Metabolic Diseases, the University of Tokyo, 7-3-1 Hongo, Bunkyo-Ku, Tokyo, 113-0033, Japan
- Gunnar Engström
  - Department of Clinical Sciences, Lund University, Skåne University Hospital, CRC, Jan Waldenströms Gata 35, 20502, Malmö, Sweden
- Jan Nilsson
  - Department of Clinical Sciences, Lund University, Skåne University Hospital, CRC, Jan Waldenströms Gata 35, 20502, Malmö, Sweden
- Marju Orho-Melander
  - Department of Clinical Sciences, Lund University, Skåne University Hospital, CRC, Jan Waldenströms Gata 35, 20502, Malmö, Sweden
- Peter M Nilsson
  - Department of Clinical Sciences, Lund University, Skåne University Hospital, CRC, Jan Waldenströms Gata 35, 20502, Malmö, Sweden
- Olle Melander
  - Department of Clinical Sciences, Lund University, Skåne University Hospital, CRC, Jan Waldenströms Gata 35, 20502, Malmö, Sweden
  - Department of Internal Medicine, Skåne University Hospital, Malmö, Sweden
2
Wang S, Li W, Zeng N, Xu J, Yang Y, Deng X, Chen Z, Duan W, Liu Y, Guo Y, Chen R, Kang Y. Acute exacerbation prediction of COPD based on Auto-metric graph neural network with inspiratory and expiratory chest CT images. Heliyon 2024; 10:e28724. [PMID: 38601695 PMCID: PMC11004525 DOI: 10.1016/j.heliyon.2024.e28724] [Received: 11/16/2023] [Revised: 03/16/2024] [Accepted: 03/22/2024] [Indexed: 04/12/2024]
Abstract
Chronic obstructive pulmonary disease (COPD) is a widely prevalent disease with significant mortality and disability rates and has become the third leading cause of death globally. Patients with acute exacerbation of COPD (AECOPD) often suffer substantial deterioration and death. This fragile population therefore deserves special consideration regarding treatment and pre-clinical health management. This paper proposes an AECOPD prediction model based on the Auto-Metric Graph Neural Network (AMGNN) using inspiratory and expiratory chest low-dose CT images. This study was approved by the ethics committee of the First Affiliated Hospital of Guangzhou Medical University. After exclusions, 202 COPD patients with inspiratory and expiratory chest CT images and their annual number of AECOPD events were included. First, the inspiratory and expiratory lung parenchyma images of the 202 COPD patients are extracted using a trained ResU-Net. Then, inspiratory and expiratory lung Radiomics and CNN features are extracted from the 202 inspiratory and expiratory lung parenchyma images by Pyradiomics and pre-trained Med3D (a heterogeneous 3D network), respectively. Last, Radiomics and CNN features are combined and then further selected by the Lasso algorithm and a generalized linear model to determine the node features and risk factors of AMGNN, and the AECOPD prediction model is established. Compared to related models, the proposed model performs best, achieving an accuracy of 0.944, precision of 0.950, F1-score of 0.944, and area under the curve of 0.965. Therefore, it is concluded that our model may become an effective tool for AECOPD prediction.
Affiliation(s)
- Shicong Wang
  - College of Health Science and Environmental Engineering, Shenzhen Technology University, Shenzhen 518118, China
  - School of Applied Technology, Shenzhen University, Shenzhen 518060, China
- Wei Li
  - College of Health Science and Environmental Engineering, Shenzhen Technology University, Shenzhen 518118, China
- Nanrong Zeng
  - College of Health Science and Environmental Engineering, Shenzhen Technology University, Shenzhen 518118, China
  - School of Applied Technology, Shenzhen University, Shenzhen 518060, China
- Jiaxuan Xu
  - The First Affiliated Hospital of Guangzhou Medical University, State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, Guangzhou Institute of Respiratory Health, The National Center for Respiratory Medicine, Guangzhou 510120, China
- Yingjian Yang
  - College of Health Science and Environmental Engineering, Shenzhen Technology University, Shenzhen 518118, China
  - College of Medicine and Biological Information Engineering, Northeastern University, Shenyang 110169, China
- Xingguang Deng
  - College of Health Science and Environmental Engineering, Shenzhen Technology University, Shenzhen 518118, China
  - College of Medicine and Biological Information Engineering, Northeastern University, Shenyang 110169, China
- Ziran Chen
  - College of Health Science and Environmental Engineering, Shenzhen Technology University, Shenzhen 518118, China
- Wenxin Duan
  - College of Health Science and Environmental Engineering, Shenzhen Technology University, Shenzhen 518118, China
  - School of Applied Technology, Shenzhen University, Shenzhen 518060, China
- Yang Liu
  - College of Health Science and Environmental Engineering, Shenzhen Technology University, Shenzhen 518118, China
- Yingwei Guo
  - College of Health Science and Environmental Engineering, Shenzhen Technology University, Shenzhen 518118, China
  - College of Medicine and Biological Information Engineering, Northeastern University, Shenyang 110169, China
- Rongchang Chen
  - The First Affiliated Hospital of Guangzhou Medical University, State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, Guangzhou Institute of Respiratory Health, The National Center for Respiratory Medicine, Guangzhou 510120, China
  - Department of Respiratory and Critical Care Medicine, First Affiliated Hospital of Southern University of Science and Technology, Second Clinical Medical College of Jinan University, Shenzhen People's Hospital, Shenzhen Institute of Respiratory Diseases, Shenzhen 518001, China
- Yan Kang
  - College of Health Science and Environmental Engineering, Shenzhen Technology University, Shenzhen 518118, China
  - School of Applied Technology, Shenzhen University, Shenzhen 518060, China
  - College of Medicine and Biological Information Engineering, Northeastern University, Shenyang 110169, China
  - Engineering Research Centre of Medical Imaging and Intelligent Analysis, Ministry of Education, Shenyang 110169, China
3
Sawant PA, Hiralkar SS, Hulsurkar YP, Phutane MS, Mahajan US, Kudale AM. Predicting over-the-counter antibiotic use in rural Pune, India, using machine learning methods. Epidemiol Health 2024:e2024044. [PMID: 38637971 DOI: 10.4178/epih.e2024044] [Received: 11/02/2023] [Accepted: 03/25/2024] [Indexed: 04/20/2024]
Abstract
Objectives Over-the-counter (OTC) antibiotic use can cause antibiotic resistance, threatening global public health gains. To counter OTC use, this study used machine learning (ML) methods to identify predictors of OTC antibiotic use in rural Pune, India.
Methods The features of OTC antibiotic use were selected using stepwise logistic, lasso, random forest, XGBoost, and Boruta algorithms. Regression and tree-based models with all confirmed and tentatively important features were built to predict the use of OTC antibiotics. Five-fold cross-validation was used to tune the models' hyperparameters. The final model was selected based on the highest area under the receiver operating characteristic curve (AUROC) with a 95% confidence interval and the lowest log-loss.
Results In rural Pune, the prevalence of OTC antibiotic use was 35.9% (95% CI, 31.56%-40.46%). The perception that buying medicines directly from a medicine shop/pharmacy is useful, using antibiotics for eye-related complaints, more household members consuming antibiotics, and longer duration and higher doses of antibiotic consumption in rural blocks and other social groups were confirmed as important features by the Boruta algorithm. The final model was the XGBoost+Boruta model with 7 predictors (AUROC=0.934; 95% CI, 0.8906-0.9782; log-loss=0.2793).
Conclusion XGBoost+Boruta, with 7 predictors, was the most accurate model for predicting OTC antibiotic use in rural Pune. Using OTC antibiotics for eye-related complaints, higher consumption of antibiotics, and the perception that buying antibiotics directly from a medicine shop/pharmacy is useful were identified as key factors for planning interventions to improve awareness about proper antibiotic use.
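The two model-selection criteria named in this abstract, AUROC and log-loss, can be computed from predicted probabilities and observed labels. The following is a minimal pure-Python sketch of both metrics, not the authors' code; the function names and the toy labels and probabilities are assumptions made purely for illustration.

```python
import math


def auroc(y_true, scores):
    """AUROC as the probability that a randomly chosen positive case
    is scored above a randomly chosen negative case (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def log_loss(y_true, probs, eps=1e-15):
    """Mean negative Bernoulli log-likelihood; lower is better."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


# Toy data (hypothetical): 1 = used OTC antibiotics, 0 = did not.
labels = [1, 0, 1, 0]
predicted = [0.9, 0.2, 0.6, 0.7]
print(auroc(labels, predicted))  # 3 of 4 positive/negative pairs ordered correctly -> 0.75
print(log_loss(labels, predicted))
```

AUROC depends only on how cases are ranked, while log-loss also penalizes poorly calibrated probabilities, which is one plausible reason to report both when choosing a final model.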
Affiliation(s)
- Pravin Arun Sawant
  - School of Health Sciences, Savitribai Phule Pune University, Pune, Maharashtra, India
- Sakshi Shantanu Hiralkar
  - School of Health Sciences, Savitribai Phule Pune University, Pune, Maharashtra, India
- Mugdha Sharad Phutane
  - School of Health Sciences, Savitribai Phule Pune University, Pune, Maharashtra, India
- Uma Satish Mahajan
  - School of Health Sciences, Savitribai Phule Pune University, Pune, Maharashtra, India
- Abhay Machindra Kudale
  - School of Health Sciences, Savitribai Phule Pune University, Pune, Maharashtra, India
4
Dai B, Breheny P. Cross-validation approaches for penalized Cox regression. Stat Methods Med Res 2024; 33:702-715. [PMID: 38445300 DOI: 10.1177/09622802241233770] [Indexed: 03/07/2024]
Abstract
Cross-validation is the most common way of selecting tuning parameters in penalized regression, but its use in penalized Cox regression models has received relatively little attention in the literature. Due to its partial likelihood construction, carrying out cross-validation for Cox models is not straightforward, and there are several potential approaches for implementation. Here, we propose a new approach based on cross-validating the linear predictors of the Cox model and compare it to approaches that have been proposed elsewhere. We show that the proposed approach offers an attractive balance of performance and numerical stability, and illustrate these advantages using simulated data as well as analyzing a high-dimensional study of gene expression and survival in lung cancer patients.
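One reading of "cross-validating the linear predictors" is: for each fold, fit the penalized model on the remaining folds, compute linear predictors for the held-out observations, pool those out-of-fold predictors, and evaluate the Cox partial likelihood once on the pooled vector. The sketch below illustrates that reading only. The `fit` callback is a hypothetical placeholder for any penalized Cox fitting routine, ties in event times are not handled, and none of the names come from the paper.

```python
import math


def cox_partial_loglik(times, events, eta):
    """Cox partial log-likelihood for linear predictors `eta`.
    Assumes no tied event times; events[i] is 1 for an event, 0 for censoring."""
    ll = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i != 1:
            continue
        # Risk set: everyone still under observation at time t_i.
        risk = sum(math.exp(eta[j]) for j, t_j in enumerate(times) if t_j >= t_i)
        ll += eta[i] - math.log(risk)
    return ll


def out_of_fold_linear_predictors(X, folds, fit):
    """For each fold k, call fit(train_indices) to obtain coefficients
    (hypothetical callback), then score only the held-out rows."""
    eta = [0.0] * len(X)
    for k in sorted(set(folds)):
        train = [i for i, f in enumerate(folds) if f != k]
        beta = fit(train)
        for i, f in enumerate(folds):
            if f == k:
                eta[i] = sum(b * x for b, x in zip(beta, X[i]))
    return eta


# Toy usage with a stub fit that ignores the training data (illustration only).
X = [[0.1], [1.2], [-0.4], [0.8]]
times = [5.0, 2.0, 7.0, 3.0]
events = [1, 1, 0, 1]
folds = [0, 1, 0, 1]
eta_cv = out_of_fold_linear_predictors(X, folds, fit=lambda train: [0.5])
print(cox_partial_loglik(times, events, eta_cv))
```

Evaluating the partial likelihood on the pooled predictors keeps each risk set at full size, whereas evaluating it within small held-out folds leaves few subjects per risk set; this is offered as a plausible motivation rather than a summary of the paper's argument.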
Affiliation(s)
- Biyue Dai
  - Division of Biostatistics and Health Data Science, University of Minnesota, Minneapolis, MN, USA
- Patrick Breheny
  - Department of Biostatistics, University of Iowa, Iowa City, IA, USA
5
Wyss R, van der Laan M, Gruber S, Shi X, Lee H, Dutcher SK, Nelson JC, Toh S, Russo M, Wang SV, Desai RJ, Lin KJ. Targeted Learning with an Undersmoothed Lasso Propensity Score Model for Large-Scale Covariate Adjustment in Healthcare Database Studies. Am J Epidemiol 2024:kwae023. [PMID: 38517025 DOI: 10.1093/aje/kwae023] [Received: 01/11/2023] [Revised: 02/13/2024] [Accepted: 03/18/2024] [Indexed: 03/23/2024]
Abstract
Lasso regression is widely used for large-scale propensity score (PS) estimation in healthcare database studies. In these settings, previous work has shown that undersmoothing (overfitting) Lasso PS models can improve confounding control, but it can also cause problems of non-overlap in covariate distributions. It remains unclear how to select the degree of undersmoothing when fitting large-scale Lasso PS models to improve confounding control while avoiding issues that can result from reduced covariate overlap. Here, we used simulations to evaluate the performance of using collaborative-controlled targeted learning to data-adaptively select the degree of undersmoothing when fitting large-scale PS models within both singly and doubly robust frameworks to reduce bias in causal estimators. Simulations showed that collaborative learning can data-adaptively select the degree of undersmoothing to reduce bias in estimated treatment effects. Results further showed that when fitting undersmoothed Lasso PS models, the use of cross-fitting was important for avoiding non-overlap in covariate distributions and reducing bias in causal estimates.
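Undersmoothing here means using a smaller Lasso penalty than a prediction-oriented criterion would choose, so that more covariates stay in the model. The sketch below shows that effect in the simplest setting, a linear model with an orthonormal design, where the Lasso solution is the soft-thresholded least-squares coefficient. This is a deliberate simplification rather than the logistic PS model or the collaborative selection procedure studied in the paper, and the coefficient values are made up.

```python
def soft_threshold(z, lam):
    """Lasso solution for one coefficient under an orthonormal design:
    shrink the least-squares estimate z toward zero by lam, clipping at zero."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_orthonormal(ols_coefs, lam):
    """Apply soft-thresholding coordinate-wise (valid only when X'X = I)."""
    return [soft_threshold(z, lam) for z in ols_coefs]


def n_selected(coefs):
    """Number of covariates with a nonzero coefficient."""
    return sum(1 for b in coefs if b != 0.0)


# Hypothetical least-squares coefficients for five covariates.
ols = [2.0, -1.2, 0.6, 0.3, -0.1]
for lam in (1.0, 0.5, 0.2):  # decreasing penalty = increasing undersmoothing
    fit = lasso_orthonormal(ols, lam)
    print(lam, n_selected(fit))
```

Lowering the penalty from 1.0 to 0.2 lets weaker covariates enter the model, which is the mechanism by which undersmoothing can improve confounding control and, as the abstract notes, also reduce covariate overlap.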
Affiliation(s)
- Richard Wyss
  - Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, United States
- Mark van der Laan
  - Division of Biostatistics, University of California, Berkeley, CA, United States
- Susan Gruber
  - Putnam Data Sciences, Cambridge, MA, United States
- Xu Shi
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, United States
- Hana Lee
  - Office of Biostatistics, Center for Drug Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, MD, United States
- Sarah K Dutcher
  - Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, Food and Drug Administration, Silver Spring, MD, United States
- Jennifer C Nelson
  - Kaiser Permanente Washington Health Research Institute, Seattle, WA, United States
- Sengwee Toh
  - Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, MA, United States
- Massimiliano Russo
  - Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, United States
- Shirley V Wang
  - Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, United States
- Rishi J Desai
  - Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, United States
- Kueiyu Joshua Lin
  - Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, United States
6
Chen Q, Zhou T, Zhang C, Zhong X. Exploring relevant factors of cognitive impairment in the elderly Chinese population using Lasso regression and Bayesian networks. Heliyon 2024; 10:e27069. [PMID: 38449590 PMCID: PMC10915566 DOI: 10.1016/j.heliyon.2024.e27069] [Received: 08/23/2023] [Revised: 02/12/2024] [Accepted: 02/23/2024] [Indexed: 03/08/2024]
Abstract
Older adults are highly susceptible to developing cognitive impairment (CI). Various factors contribute to the prevalence of CI, but the potential relationships among these factors remain unclear. This study aims to explore the factors associated with CI in Chinese older adults and analyze the potential relationships between CI and these factors. We analyzed data on 6886 older adults aged ≥60 from the China Health and Retirement Longitudinal Study (CHARLS) 2018. Lasso regression was initially used to screen variables. Bayesian networks (BNs) were used to identify the correlates of CI and potential associations between factors. After screening with Lasso regression, 11 variables were finally included in the BNs. The BNs, by establishing a complex network relationship, revealed that age, education, and indoor air pollution were the direct correlates affecting the occurrence of CI in older adults. They also indicated that marital status indirectly influenced CI through age, and that residence was indirectly linked to CI through two pathways: indoor air pollution and education. Our findings underscore the effectiveness of BNs in unveiling the intricate network linkages among CI and its associated factors, and suggest promising applications. They can serve as a reference for public health departments to address the prevention of CI in the elderly.
Affiliation(s)
- Qiao Chen
  - College of Public Health, Chongqing Medical University, Chongqing, 400016, China
  - Research Center for Medicine and Social Development, Chongqing Medical University, Chongqing, China
- Tianyi Zhou
  - College of Public Health, Chongqing Medical University, Chongqing, 400016, China
- Cong Zhang
  - College of Public Health, Chongqing Medical University, Chongqing, 400016, China
- Xiaoni Zhong
  - College of Public Health, Chongqing Medical University, Chongqing, 400016, China
7
Allwright M, Guennewig B, Hoffmann AE, Rohleder C, Jieu B, Chung LH, Jiang YC, Lemos Wimmer BF, Qi Y, Don AS, Leweke FM, Couttas TA. ReTimeML: a retention time predictor that supports the LC-MS/MS analysis of sphingolipids. Sci Rep 2024; 14:4375. [PMID: 38388524 PMCID: PMC10883992 DOI: 10.1038/s41598-024-53860-0] [Received: 11/20/2023] [Accepted: 02/06/2024] [Indexed: 02/24/2024]
Abstract
The analysis of ceramide (Cer) and sphingomyelin (SM) lipid species using liquid chromatography-tandem mass spectrometry (LC-MS/MS) continues to present challenges as their precursor mass and fragmentation can correspond to multiple molecular arrangements. To address this constraint, we developed ReTimeML, a freeware that automates the expected retention times (RTs) for Cer and SM lipid profiles from complex chromatograms. ReTimeML works on the principle that LC-MS/MS experiments have pre-determined RTs from internal standards, calibrators or quality controls used throughout the analysis. Employed as reference RTs, ReTimeML subsequently extrapolates the RTs of unknowns using its machine-learned regression library of mass-to-charge (m/z) versus RT profiles, which does not require model retraining for adaptability on different LC-MS/MS pipelines. We validated ReTimeML RT estimations for various Cer and SM structures across different biologicals, tissues and LC-MS/MS setups, exhibiting a mean variance between 0.23 and 2.43% compared to user annotations. ReTimeML also aided the disambiguation of SM identities from isobar distributions in paired serum-cerebrospinal fluid from healthy volunteers, allowing us to identify a series of non-canonical SMs associated between the two biofluids comprised of a polyunsaturated structure that confers increased stability against catabolic clearance.
Affiliation(s)
- Michael Allwright
  - ForeFront, Brain and Mind Centre, The University of Sydney, Sydney, Australia
- Boris Guennewig
  - ForeFront, Brain and Mind Centre, The University of Sydney, Sydney, Australia
- Anna E Hoffmann
  - Translational Research Collective, Brain and Mind Centre, The University of Sydney, Sydney, NSW, 2006, Australia
  - Endosane Pharmaceuticals GmbH, Berlin, Germany
- Cathrin Rohleder
  - Translational Research Collective, Brain and Mind Centre, The University of Sydney, Sydney, NSW, 2006, Australia
  - Endosane Pharmaceuticals GmbH, Berlin, Germany
  - Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Beverly Jieu
  - Translational Research Collective, Brain and Mind Centre, The University of Sydney, Sydney, NSW, 2006, Australia
- Long H Chung
  - Centenary Institute, The University of Sydney, Sydney, Australia
- Yingxin C Jiang
  - Centenary Institute, The University of Sydney, Sydney, Australia
- Bruno F Lemos Wimmer
  - Translational Research Collective, Brain and Mind Centre, The University of Sydney, Sydney, NSW, 2006, Australia
- Yanfei Qi
  - Centenary Institute, The University of Sydney, Sydney, Australia
- Anthony S Don
  - School of Medical Sciences, Faculty of Medicine and Health, The University of Sydney, Sydney, Australia
- F Markus Leweke
  - Translational Research Collective, Brain and Mind Centre, The University of Sydney, Sydney, NSW, 2006, Australia
  - Endosane Pharmaceuticals GmbH, Berlin, Germany
  - Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Timothy A Couttas
  - Translational Research Collective, Brain and Mind Centre, The University of Sydney, Sydney, NSW, 2006, Australia
8
Hanke M, Dijkstra L, Foraita R, Didelez V. Variable selection in linear regression models: Choosing the best subset is not always the best choice. Biom J 2024; 66:e2200209. [PMID: 37643390 DOI: 10.1002/bimj.202200209] [Received: 07/31/2022] [Revised: 06/19/2023] [Accepted: 06/22/2023] [Indexed: 08/31/2023]
Abstract
We consider the question of variable selection in linear regressions, in the sense of identifying the correct direct predictors (those variables that have nonzero coefficients given all candidate predictors). Best subset selection (BSS) is often considered the "gold standard," with its use being restricted only by its NP-hard nature. Alternatives such as the least absolute shrinkage and selection operator (Lasso) or the Elastic net (Enet) have become methods of choice in high-dimensional settings. A recent proposal represents BSS as a mixed-integer optimization problem so that large problems have become computationally feasible. We present an extensive neutral comparison assessing the ability to select the correct direct predictors of BSS compared to forward stepwise selection (FSS), Lasso, and Enet. The simulation considers a range of settings that are challenging regarding dimensionality (number of observations and variables), signal-to-noise ratios, and correlations between predictors. As a fair measure of performance, we primarily used the best possible F1-score for each method, and results were confirmed by alternative performance measures and practical criteria for choosing the tuning parameters and subset sizes. Surprisingly, it was only in settings where the signal-to-noise ratio was high and the variables were uncorrelated that BSS reliably outperformed the other methods, even in low-dimensional settings. Furthermore, FSS performed almost identically to BSS. Our results shed new light on the usual presumption of BSS being, in principle, the best choice for selecting the correct direct predictors. Especially for correlated variables, alternatives like Enet are faster and appear to perform better in practical settings.
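The performance measure described here can be illustrated by scoring a selected set of variables against the true set of direct predictors with the F1-score, and then taking the maximum over all models a method produces along its tuning path. The sketch below assumes that "best possible F1-score" means this maximum; the function names and the example path are hypothetical.

```python
def selection_f1(selected, true_support):
    """F1-score of a selected variable set against the true direct predictors."""
    selected, true_support = set(selected), set(true_support)
    tp = len(selected & true_support)  # correctly selected variables
    if tp == 0:
        return 0.0
    precision = tp / len(selected)
    recall = tp / len(true_support)
    return 2 * precision * recall / (precision + recall)


def best_possible_f1(path, true_support):
    """Best F1 over all candidate models on a method's tuning path
    (one selected set per tuning-parameter value or subset size)."""
    return max(selection_f1(s, true_support) for s in path)


# Hypothetical example: variables 0, 1, 2 are the true direct predictors.
truth = {0, 1, 2}
path = [{0}, {0, 1}, {0, 1, 5}, {0, 1, 2, 5, 7}]
print([round(selection_f1(s, truth), 3) for s in path])
print(best_possible_f1(path, truth))
```

Using the best F1 over the path separates how well a method can rank variables from how well its tuning parameter happens to be chosen, which is consistent with the abstract's description of this as a fair measure that was then confirmed with practical tuning criteria.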
Affiliation(s)
- Moritz Hanke
  - Department of Biometry and Data Management, Leibniz Institute for Prevention Research and Epidemiology - BIPS, Bremen, Germany
- Louis Dijkstra
  - Department of Biometry and Data Management, Leibniz Institute for Prevention Research and Epidemiology - BIPS, Bremen, Germany
- Ronja Foraita
  - Department of Biometry and Data Management, Leibniz Institute for Prevention Research and Epidemiology - BIPS, Bremen, Germany
- Vanessa Didelez
  - Department of Biometry and Data Management, Leibniz Institute for Prevention Research and Epidemiology - BIPS, Bremen, Germany
  - Department of Mathematics and Computer Science, University of Bremen, Bremen, Germany
9
Wang J, Xu Y, Liu L, Wu W, Shen C, Huang H, Zhen Z, Meng J, Li C, Qu Z, He Q, Tian Y. Comparison of LASSO and random forest models for predicting the risk of premature coronary artery disease. BMC Med Inform Decis Mak 2023; 23:297. [PMID: 38124036 PMCID: PMC10734117 DOI: 10.1186/s12911-023-02407-w] [Received: 08/06/2023] [Accepted: 12/14/2023] [Indexed: 12/23/2023]
Abstract
PURPOSE With changes in lifestyle, coronary artery disease is occurring at increasingly younger ages, increasing the medical and economic burden on families and society. To reduce the burden caused by this disease, this study applied LASSO logistic regression and Random Forest separately to establish risk prediction models for premature coronary artery disease (PCAD) and compared the predictive performance of the two models.
METHODS The data were obtained from 1004 patients with coronary artery disease admitted to a third-class hospital in Liaoning Province from September 2019 to December 2021. The data from 797 patients were ultimately evaluated. The dataset of 797 patients was randomly divided into a training set (569 persons) and a validation set (228 persons) in a 7:3 ratio. Risk prediction models were established with LASSO logistic regression and Random Forest and then compared.
RESULT The two models in this study showed that hyperuricemia, chronic renal disease, and carotid artery atherosclerosis were important predictors of premature coronary artery disease. The AUCs of the two models differed significantly (Z = 3.47, P < 0.05).
CONCLUSIONS Random Forest has better prediction performance for PCAD and is suitable for clinical practice. It can provide an objective reference for the early screening and diagnosis of premature coronary artery disease, guide clinical decision-making and promote disease prevention.
Affiliation(s)
- Jiayu Wang
  - School of Nursing, Liaoning University of Traditional Chinese Medicine, 110847, Shenyang, China
- Yikang Xu
  - Department of Cardiovascular Medicine, The Second Affiliated Hospital of Shenyang Medical College, 110002, Shenyang, China
- Lei Liu
  - School of Nursing, Liaoning University of Traditional Chinese Medicine, 110847, Shenyang, China
- Wei Wu
  - Institute of Humanities and Social Sciences, Shenyang University, 110044, Shenyang, China
- Chunjian Shen
  - Department of Cardiac Surgery, The Second Affiliated Hospital of Shenyang Medical College, 110002, Shenyang, China
- Henan Huang
  - Library, Shenyang Medical College, 110034, Shenyang, China
- Ziyi Zhen
  - School of Public Health, Shenyang Medical College, 110034, Shenyang, China
- Jixian Meng
  - School of Nursing, Liaoning Jinqiu Hospital, 110034, Shenyang, China
- Chunjing Li
  - School of Nursing, The First Affiliated Hospital of China Medical University, 110034, Shenyang, China
- Zhixin Qu
  - School of Nursing, Shenyang Medical College, 110034, Shenyang, China
- Qinglei He
  - School of Nursing, Liaoning University of Traditional Chinese Medicine, 110847, Shenyang, China
- Yu Tian
  - School of Nursing, Liaoning University of Traditional Chinese Medicine, 110847, Shenyang, China
10
Tanigawa Y, Kellis M. Power of inclusion: Enhancing polygenic prediction with admixed individuals. Am J Hum Genet 2023; 110:1888-1902. [PMID: 37890495 PMCID: PMC10645553 DOI: 10.1016/j.ajhg.2023.09.013] [Received: 01/23/2023] [Revised: 09/22/2023] [Accepted: 09/22/2023] [Indexed: 10/29/2023]
Abstract
Admixed individuals offer unique opportunities for addressing limited transferability in polygenic scores (PGSs), given the substantial trans-ancestry genetic correlation in many complex traits. However, they are rarely considered in PGS training, given the challenges in representing ancestry-matched linkage-disequilibrium reference panels for admixed individuals. Here we present inclusive PGS (iPGS), which captures ancestry-shared genetic effects by finding the exact solution for penalized regression on individual-level data and is thus naturally applicable to admixed individuals. We validate our approach in a simulation study across 33 configurations with varying heritability, polygenicity, and ancestry composition in the training set. When iPGS is applied to n = 237,055 ancestry-diverse individuals in the UK Biobank, it shows the greatest improvements in Africans by 48.9% on average across 60 quantitative traits and up to 50-fold improvements for some traits (neutrophil count, R² = 0.058) over the baseline model trained on the same number of European individuals. When we allowed iPGS to use n = 284,661 individuals, we observed an average improvement of 60.8% for African, 11.6% for South Asian, 7.3% for non-British White, 4.8% for White British, and 17.8% for the other individuals. We further developed iPGS+refit to jointly model the ancestry-shared and -dependent genetic effects when heterogeneous genetic associations were present. For neutrophil count, for example, iPGS+refit showed the highest predictive performance in the African group (R² = 0.115), which exceeds the best predictive performance for the White British group (R² = 0.090 in the iPGS model), even though only 1.49% of individuals used in the iPGS training are of African ancestry. Our results indicate the power of including diverse individuals for developing more equitable PGS models.
Affiliation(s)
- Yosuke Tanigawa
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Manolis Kellis
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA.
|
11
|
John M, Lencz T. Potential application of elastic nets for shared polygenicity detection with adapted threshold selection. Int J Biostat 2023; 19:417-438. [PMID: 36327464 PMCID: PMC10154439 DOI: 10.1515/ijb-2020-0108] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2020] [Accepted: 10/05/2022] [Indexed: 11/06/2022]
Abstract
Current research suggests that hundreds to thousands of single nucleotide polymorphisms (SNPs) with small to modest effect sizes contribute to the genetic basis of many disorders, a phenomenon labeled as polygenicity. Additionally, many such disorders demonstrate polygenic overlap, in which risk alleles are shared at associated genetic loci. A simple strategy to detect polygenic overlap between two phenotypes is based on rank-ordering the univariate p-values from two genome-wide association studies (GWASs). Although high-dimensional variable selection strategies such as Lasso and elastic nets have been utilized in other GWAS analysis settings, they are yet to be utilized for detecting shared polygenicity. In this paper, we illustrate how elastic nets, with polygenic scores as the dependent variable and with appropriate adaptation in selecting the penalty parameter, may be utilized for detecting a subset of SNPs involved in shared polygenicity. We provide theory to better understand our approaches, and illustrate their utility using synthetic datasets. Results from extensive simulations are presented comparing the elastic net approaches with the rank ordering approach, in various scenarios. The simulation studies show one of the elastic net approaches to be superior when the correlations among the SNPs are high. Finally, we apply the methods to two real datasets to illustrate further the capabilities, limitations and differences among the methods.
Affiliation(s)
- Majnu John
  - Institute of Behavioral Science, Feinstein Institutes of Medical Research, Manhasset, NY
  - Division of Psychiatry Research, The Zucker Hillside Hospital, Northwell Health System, Glen Oaks, NY
  - Departments of Psychiatry and of Mathematics, Hofstra University, Hempstead, NY
- Todd Lencz
  - Institute of Behavioral Science, Feinstein Institutes of Medical Research, Manhasset, NY
  - Division of Psychiatry Research, The Zucker Hillside Hospital, Northwell Health System, Glen Oaks, NY
  - Departments of Psychiatry and of Molecular Medicine, Zucker School of Medicine at Hofstra/Northwell, Hempstead, NY
|
12
|
Park S, Lee ER, Hong HG. Varying-coefficients for regional quantile via KNN-based LASSO with applications to health outcome study. Stat Med 2023; 42:3903-3918. [PMID: 37365909 DOI: 10.1002/sim.9839] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2022] [Revised: 06/04/2023] [Accepted: 06/18/2023] [Indexed: 06/28/2023]
Abstract
Health outcomes, such as body mass index and cholesterol levels, are known to be dependent on age and exhibit varying effects with their associated risk factors. In this paper, we propose a novel framework for dynamic modeling of the associations between health outcomes and risk factors using varying-coefficients (VC) regional quantile regression via K-nearest neighbors (KNN) fused Lasso, which captures the time-varying effects of age. The proposed method has strong theoretical properties, including a tight estimation error bound and the ability to detect exact clustered patterns under certain regularity conditions. To efficiently solve the resulting optimization problem, we develop an alternating direction method of multipliers (ADMM) algorithm. Our empirical results demonstrate the efficacy of the proposed method in capturing the complex age-dependent associations between health outcomes and their risk factors.
Affiliation(s)
- Seyoung Park
  - Department of Statistics, Sungkyunkwan University, Seoul, Republic of Korea
- Eun Ryung Lee
  - Department of Statistics, Sungkyunkwan University, Seoul, Republic of Korea
- Hyokyoung G Hong
  - Biostatistics Branch, Division of Cancer Epidemiology and Genetics, NCI/NIH, Bethesda, Maryland, USA
|
13
|
Orwa J, Oduor P, Okelloh D, Gethi D, Agaya J, Okumu A, Wandiga S. Comparison of logistic regression with regularized machine learning methods for the prediction of tuberculosis disease in people living with HIV: cross-sectional hospital-based study in Kisumu County, Kenya. Res Sq 2023:rs.3.rs-3354948. [PMID: 37790564 PMCID: PMC10543507 DOI: 10.21203/rs.3.rs-3354948/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/05/2023]
Abstract
Background Tuberculosis (TB) is a major public health concern, particularly among people living with the human immunodeficiency virus (PLWH). Accurate prediction of TB disease in this population is crucial for early diagnosis and effective treatment. Logistic regression and regularized machine learning methods have been used to predict TB, but their comparative performance in HIV patients remains unclear. The study aims to compare the predictive performance of logistic regression with that of regularized machine learning methods for TB disease in HIV patients. Methods Retrospective analysis of data from HIV patients diagnosed with TB in three hospitals in Kisumu County (JOOTRH, Kisumu sub-county hospital, Lumumba health center) between [dates]. Logistic regression, Lasso, Ridge, and Elastic net regression were used to develop predictive models for TB disease. Model performance was evaluated using accuracy and area under the receiver operating characteristic curve (AUC-ROC). Results Of the 927 PLWH included in the study, 107 (12.6%) were diagnosed with TB. Being in WHO disease stage III/IV (aOR: 7.13; 95% CI: 3.86-13.33) and having a cough in the last 4 weeks (aOR: 2.34; 95% CI: 1.43-3.89) were significantly associated with TB. Logistic regression achieved an accuracy of 0.868 and an AUC-ROC of 0.744. Elastic net regression also showed good predictive performance, with accuracy and AUC-ROC values of 0.874 and 0.762, respectively. Conclusions Our results suggest that logistic regression, Lasso, Ridge regression, and Elastic net can all be effective methods for predicting TB disease in HIV patients. These findings may have important implications for the development of accurate and reliable models for TB prediction in HIV patients.
Affiliation(s)
- James Orwa
  - Department of Population Health, Aga Khan University, Nairobi, Kenya
- Patience Oduor
  - Institute of Global Health Equity Research, University of Global Health Equity, Kigali, Rwanda
- Douglas Okelloh
  - Center for Global Health Research, Kenya Medical Research Institute, Kisumu, Kenya
- Dickson Gethi
  - Center for Global Health Research, Kenya Medical Research Institute, Kisumu, Kenya
- Janet Agaya
  - Center for Global Health Research, Kenya Medical Research Institute, Kisumu, Kenya
- Albert Okumu
  - Center for Global Health Research, Kenya Medical Research Institute, Kisumu, Kenya
- Steve Wandiga
  - Center for Global Health Research, Kenya Medical Research Institute, Kisumu, Kenya
|
14
|
Laufer B, Docherty PD, Murray R, Krueger-Ziolek S, Jalal NA, Hoeflinger F, Rupitsch SJ, Reindl L, Moeller K. Sensor Selection for Tidal Volume Determination via Linear Regression-Impact of Lasso versus Ridge Regression. Sensors (Basel) 2023; 23:7407. [PMID: 37687863 PMCID: PMC10490437 DOI: 10.3390/s23177407] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2023] [Revised: 08/18/2023] [Accepted: 08/23/2023] [Indexed: 09/10/2023]
Abstract
The measurement of respiratory volume based on upper body movements by means of a smart shirt is increasingly requested in medical applications. This research used upper body surface motions obtained by a motion capture system, and two regression methods to determine the optimal selection and placement of sensors on a smart shirt to recover respiratory parameters from benchmark spirometry values. The results of the two regression methods (Ridge regression and the least absolute shrinkage and selection operator (Lasso)) were compared. This work shows that the Lasso method offers advantages compared to the Ridge regression, as it provides sparse solutions and is more robust to outliers. However, both methods can be used in this application since they lead to a similar sensor subset with lower computational demand (from exponential effort for full exhaustive search down to the order of O(n²)). A smart shirt for respiratory volume estimation could replace spirometry in some cases and would allow for a more convenient measurement of respiratory parameters in home care or hospital settings.
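The sparsity contrast described in this abstract can be illustrated with a minimal sketch on synthetic data. This is not the authors' pipeline: the data, the penalty values, and the helper names (`soft_threshold`, `ridge_fit`, `lasso_fit`) are assumptions chosen for illustration, and the sketch addresses only the sparsity claim, not robustness to outliers.

```python
import numpy as np

def soft_threshold(z, t):
    # Shrink z toward zero by t; values with |z| <= t become exactly zero.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ridge_fit(X, y, alpha):
    # Closed-form ridge solution: (X^T X + alpha * I)^{-1} X^T y.
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)

def lasso_fit(X, y, alpha, n_iter=200):
    # Cyclic coordinate descent for
    #   (1 / (2n)) * ||y - X b||^2 + alpha * ||b||_1
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]      # remove column j's contribution
            rho = X[:, j] @ resid / n       # correlation with partial residual
            beta[j] = soft_threshold(rho, alpha) / col_sq[j]
            resid -= X[:, j] * beta[j]      # add the updated contribution back
    return beta

# Synthetic stand-in for "many candidate sensors, few informative ones"
# (hypothetical data, not the motion-capture measurements from the study).
rng = np.random.default_rng(0)
n, p = 200, 20
X = rng.standard_normal((n, p))
true_beta = np.zeros(p)
true_beta[:3] = [2.0, -1.5, 1.0]
y = X @ true_beta + 0.1 * rng.standard_normal(n)

beta_lasso = lasso_fit(X, y, alpha=0.1)
beta_ridge = ridge_fit(X, y, alpha=1.0)

# Lasso sets uninformative coefficients exactly to zero; ridge only shrinks them.
n_zero_lasso = int(np.sum(beta_lasso == 0.0))
n_zero_ridge = int(np.sum(beta_ridge == 0.0))
print("zero coefficients (lasso):", n_zero_lasso)
print("zero coefficients (ridge):", n_zero_ridge)
```

In a sensor-selection setting, the nonzero lasso coefficients would indicate a candidate subset directly, whereas ridge coefficients would need to be ranked and thresholded afterwards.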
Affiliation(s)
- Bernhard Laufer
  - Institute of Technical Medicine (ITeM), Furtwangen University, 78054 Villingen-Schwenningen, Germany
- Paul D. Docherty
  - Institute of Technical Medicine (ITeM), Furtwangen University, 78054 Villingen-Schwenningen, Germany
  - Department of Mechanical Engineering, University of Canterbury, Christchurch 8041, New Zealand
- Rua Murray
  - School of Mathematics and Statistics, University of Canterbury, Christchurch 8041, New Zealand
- Sabine Krueger-Ziolek
  - Institute of Technical Medicine (ITeM), Furtwangen University, 78054 Villingen-Schwenningen, Germany
- Nour Aldeen Jalal
  - Institute of Technical Medicine (ITeM), Furtwangen University, 78054 Villingen-Schwenningen, Germany
  - Innovation Center Computer Assisted Surgery (ICCAS), University of Leipzig, 04109 Leipzig, Germany
- Fabian Hoeflinger
  - Department of Microsystems Engineering, University of Freiburg, 79085 Freiburg, Germany
- Stefan J. Rupitsch
  - Department of Microsystems Engineering, University of Freiburg, 79085 Freiburg, Germany
- Leonhard Reindl
  - Department of Microsystems Engineering, University of Freiburg, 79085 Freiburg, Germany
- Knut Moeller
  - Institute of Technical Medicine (ITeM), Furtwangen University, 78054 Villingen-Schwenningen, Germany
  - Department of Mechanical Engineering, University of Canterbury, Christchurch 8041, New Zealand
  - Department of Microsystems Engineering, University of Freiburg, 79085 Freiburg, Germany
|
15
|
Raubitzek S, Mallinger K. On the Applicability of Quantum Machine Learning. Entropy (Basel) 2023; 25:992. [PMID: 37509939 PMCID: PMC10377777 DOI: 10.3390/e25070992] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2023] [Revised: 06/22/2023] [Accepted: 06/26/2023] [Indexed: 07/30/2023]
Abstract
In this article, we investigate the applicability of quantum machine learning for classification tasks using two quantum classifiers from the Qiskit Python environment: the variational quantum circuit (VQC) and the quantum kernel estimator (QKE). We provide a first evaluation of the performance of these classifiers when using a hyperparameter search on six widely known and publicly available benchmark datasets and analyze how their performance varies with the number of samples on two artificially generated test classification datasets. As quantum machine learning is based on unitary transformations, this paper explores data structures and application fields that could be particularly suitable for quantum advantages. To this end, this paper introduces a novel dataset based on concepts from quantum mechanics using the exponential map of a Lie algebra. This dataset will be made publicly available and makes a novel contribution to the empirical evaluation of quantum supremacy. We further compared the performance of VQC and QKE on six widely applicable datasets to contextualize our results. Our results demonstrate that the VQC and QKE perform better than basic machine learning algorithms, such as advanced linear regression models (Ridge and Lasso). They do not match the accuracy and runtime performance of sophisticated modern boosting classifiers such as XGBoost, LightGBM, or CatBoost. Therefore, we conclude that while quantum machine learning algorithms have the potential to surpass classical machine learning methods in the future, especially when physical quantum infrastructure becomes widely available, they currently lag behind classical approaches. Our investigations also show that classical machine learning approaches have superior performance classifying datasets based on group structures, compared to quantum approaches that particularly use unitary processes.
Furthermore, our findings highlight the significant impact of different quantum simulators, feature maps, and quantum circuits on the performance of the employed quantum estimators. This observation emphasizes the need for researchers to provide detailed explanations of their hyperparameter choices for quantum machine learning algorithms, as this aspect is currently overlooked in many studies within the field. To facilitate further research in this area and ensure the transparency of our study, we have made the complete code available in a linked GitHub repository.
Affiliation(s)
- Sebastian Raubitzek
  - Data Science Research Unit, TU Wien, Favoritenstrasse 9-11/194, 1040 Vienna, Austria
  - SBA Research gGmbH, Floragasse 7/5.OG, 1040 Vienna, Austria
- Kevin Mallinger
  - Data Science Research Unit, TU Wien, Favoritenstrasse 9-11/194, 1040 Vienna, Austria
  - SBA Research gGmbH, Floragasse 7/5.OG, 1040 Vienna, Austria
|
16
|
Hou Y, Zhang A, Lv R, Zhang Y, Ma J, Li T. Machine learning algorithm inversion experiment and pollution analysis of water quality parameters in urban small and medium-sized rivers based on UAV multispectral data. Environ Sci Pollut Res Int 2023:10.1007/s11356-023-27963-6. [PMID: 37278900 DOI: 10.1007/s11356-023-27963-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/13/2022] [Accepted: 05/24/2023] [Indexed: 06/07/2023]
Abstract
To examine the applicability of UAV multispectral images to urban river monitoring, this paper takes the Fuyang River in the urban area of Handan Municipality as its study object. Orthogonal image data of the river in different seasons were acquired by unmanned aerial vehicles (UAVs) equipped with multispectral sensors, and water samples were collected at the same time for detection of physical and chemical indexes. Based on the image data, a total of 51 modeling spectral indexes were obtained by constructing three forms of band combinations, namely the difference index (DI), ratio index (RI), and normalization index (NDI), and combining six single-band spectral values. Through the partial least squares (PLS), random forest (RF), and lasso prediction models, fitting models were constructed for six water quality parameters: turbidity (Turb), suspended substance (SS), chemical oxygen demand (COD), ammonia nitrogen (NH4-N), total nitrogen (TN), and total phosphorus (TP). After verifying the results and evaluating the accuracy, the following conclusions were drawn: (1) The seasonal pattern of inversion accuracy is generally the same for the three types of models: summer is better than spring, and winter is the worst. (2) The water quality parameter inversion models based on the two machine learning algorithms have more prominent advantages than PLS. The RF model performs well in inversion accuracy and generalization ability for water quality parameters in different seasons. (3) The prediction accuracy and stability of the model are positively correlated to a certain extent with the size of the standard deviation of sample values. To sum up, by using the multispectral image data acquired by UAV and adopting prediction models built upon machine learning algorithms, water quality parameters in different seasons can be predicted to varying degrees.
Affiliation(s)
- Yikai Hou
  - School of Water Resources and Electric Power, Hebei University of Engineering, Handan, China
  - Hebei Water Ecological Civilization and Social Governance Research Center, Handan, China
- Rulan Lv
  - Hebei Branch of Construction and Administration Bureau of South-to-North Water Diversion Middle Route Project, Handan, China
- Yanping Zhang
  - School of Mathematics and Physics Science and Engineering, Hebei University of Engineering, Handan, China
- Jie Ma
  - School of Water Resources and Electric Power, Hebei University of Engineering, Handan, China
- Ting Li
  - Educational Technology Center, Hebei University of Engineering, Handan, China
|
17
|
Pellikka P, Luotamo M, Sädekoski N, Hietanen J, Vuorinne I, Räsänen M, Heiskanen J, Siljander M, Karhu K, Klami A. Tropical altitudinal gradient soil organic carbon and nitrogen estimation using Specim IQ portable imaging spectrometer. Sci Total Environ 2023; 883:163677. [PMID: 37105488 DOI: 10.1016/j.scitotenv.2023.163677] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2023] [Revised: 03/25/2023] [Accepted: 04/19/2023] [Indexed: 05/03/2023]
Abstract
The largest actively cycling terrestrial carbon pool, soil, has been disturbed in recent centuries by human actions through reduction of woody land cover. Soil organic carbon (SOC) content can reliably be estimated in laboratory conditions, but more cost-efficient and mobile techniques are needed for large-scale monitoring of SOC, e.g. in remote areas. We demonstrate the capability of a mobile hyperspectral camera operating in the visible-near infrared wavelength range for practical estimation of SOC and nitrogen content, to support efficient monitoring of soil properties. The 191 soil samples were collected in Taita Taveta County, Kenya, representing an altitudinal gradient comprising five typical land use types: agroforestry, cropland, forest, shrubland and sisal estate. The soil samples were imaged using a Specim IQ hyperspectral camera under controlled laboratory conditions, and their carbon and nitrogen content was determined with a combustion analyzer. We use machine learning for estimating SOC and N content based on the spectral images, studying also automatic selection of informative wavelengths and quantification of prediction uncertainty. Five alternative methods were all found to perform well with a cross-validated R² of approximately 0.8 and an RMSE of one percentage point, demonstrating feasibility of the proposed imaging setup and computational pipeline.
Affiliation(s)
- Petri Pellikka
  - University of Helsinki, Department of Geosciences and Geography, Helsinki, Finland; State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, PR China
- Markku Luotamo
  - University of Helsinki, Department of Computer Science, Helsinki, Finland.
- Niklas Sädekoski
  - University of Helsinki, Department of Geosciences and Geography, Helsinki, Finland
- Jesse Hietanen
  - University of Helsinki, Department of Geosciences and Geography, Helsinki, Finland
- Ilja Vuorinne
  - University of Helsinki, Department of Geosciences and Geography, Helsinki, Finland
- Matti Räsänen
  - University of Helsinki, Department of Geosciences and Geography, Helsinki, Finland
- Janne Heiskanen
  - University of Helsinki, Department of Geosciences and Geography, Helsinki, Finland
- Mika Siljander
  - University of Helsinki, Department of Geosciences and Geography, Helsinki, Finland
- Kristiina Karhu
  - University of Helsinki, Department of Forest Sciences, Helsinki, Finland; Helsinki Institute of Life Science (HiLIFE), Helsinki, Finland
- Arto Klami
  - University of Helsinki, Department of Computer Science, Helsinki, Finland
|
18
|
Belhechmi S, Le Teuff G, De Bin R, Rotolo F, Michiels S. Favoring the hierarchical constraint in penalized survival models for randomized trials in precision medicine. BMC Bioinformatics 2023; 24:96. [PMID: 36927444 PMCID: PMC10022294 DOI: 10.1186/s12859-023-05162-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2022] [Accepted: 01/27/2023] [Indexed: 03/18/2023] Open
Abstract
BACKGROUND Biomarker-treatment interactions are commonly investigated in randomized clinical trials (RCTs) to improve the precision of medicine. The hierarchical interaction constraint states that an interaction should only be in a model if its main effects are also in the model. However, this constraint is not guaranteed in the standard penalized statistical approaches. We aimed to find a compromise for high-dimensional data between the need for sparse model selection and the need for the hierarchical constraint. RESULTS To favor the property of the hierarchical interaction constraint, we proposed to create groups composed of the biomarker main effect and its interaction with treatment and to perform the bi-level selection on these groups. We proposed two weighting approaches (Single Wald (SW) and likelihood ratio test (LRT)) for the adaptive lasso method. The selection performance of these two approaches is compared to alternative lasso extensions (adaptive lasso with ridge-based weights, composite Minimax Concave Penalty, group exponential lasso and Sparse Group Lasso) through a simulation study. An RCT (NSABP B-31) randomizing 1574 patients (431 events) with early breast cancer, which aimed to evaluate the effect of adjuvant trastuzumab on distant-recurrence free survival and includes expression data from 462 genes measured in the tumour, serves as an illustration. The simulation study illustrates that the adaptive lasso LRT and SW, and the group exponential lasso favored the hierarchical interaction constraint. Overall, in the alternative scenarios, they had the best balance of false discovery and false negative rates for the main effects of the selected interactions. For NSABP B-31, 12 gene-treatment interactions were identified more than 20% by the different methods.
Among them, the adaptive lasso (SW) approach offered the best trade-off between a high number of selected gene-treatment interactions and a high proportion of selection of both the gene-treatment interaction and its main effect. CONCLUSIONS Adaptive lasso with Single Wald and likelihood ratio test weighting and the group exponential lasso approaches outperformed their competitors in favoring the hierarchical constraint of the biomarker-treatment interaction. However, the performance of the methods tends to decrease in the presence of prognostic biomarkers.
Affiliation(s)
- Shaima Belhechmi
  - Université Paris-Saclay, CESP, INSERM U1018 Oncostat, labeled Ligue Contre le Cancer, Villejuif, France; Bureau de Biostatistique et d'Epidémiologie, Gustave Roussy, Villejuif, France
- Gwénaël Le Teuff
  - Université Paris-Saclay, CESP, INSERM U1018 Oncostat, labeled Ligue Contre le Cancer, Villejuif, France; Bureau de Biostatistique et d'Epidémiologie, Gustave Roussy, Villejuif, France
- Federico Rotolo
  - Biostatistics and Data Management Unit, Innate Pharma, Marseille, France
- Stefan Michiels
  - Université Paris-Saclay, CESP, INSERM U1018 Oncostat, labeled Ligue Contre le Cancer, Villejuif, France; Bureau de Biostatistique et d'Epidémiologie, Gustave Roussy, Villejuif, France
|
19
|
Xiao Z, Xingjie S, Yiming L, Xu L, Ma S. A General Framework for Identifying Hierarchical Interactions and Its Application to Genomics Data. J Comput Graph Stat 2023; 32:873-883. [PMID: 38009111 PMCID: PMC10671243 DOI: 10.1080/10618600.2022.2152034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2021] [Accepted: 11/08/2022] [Indexed: 12/03/2022]
Abstract
The analysis of hierarchical interactions has long been a challenging problem due to the large number of candidate main effects and interaction effects, and the need for accommodating the "main effects, interactions" hierarchy. The two-stage analysis methods enjoy simplicity and low computational cost, but contradict the fact that the outcome of interest is attributable to the joint effects of multiple main factors and their interactions. The existing joint analysis methods can accurately describe the underlying data generating process, but suffer from prohibitively high computational cost. And it is not straightforward to extend their optimization algorithms to general loss functions. To address this need, we develop a new computational method that is much faster than the existing joint analysis methods and rivals the runtimes of two-stage analysis. The proposed method, HierFabs, adopts the framework of the forward and backward stagewise algorithm and enjoys computational efficiency and broad applicability. To accommodate hierarchy without imposing additional constraints, it has newly developed forward and backward steps. It naturally accommodates the strong and weak hierarchy, and makes optimization much simpler and faster than in the existing studies. Optimality of HierFabs sequences is investigated theoretically. Simulations show that it outperforms the existing methods. The analysis of TCGA data on melanoma demonstrates its competitive practical performance.
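The strong and weak hierarchy mentioned in this abstract can be made concrete with a short sketch. It is not part of HierFabs; the function name `satisfies_hierarchy` and the variable labels are hypothetical, and it assumes the usual definitions: under strong hierarchy an interaction may enter only if both of its main effects are in the model, under weak hierarchy at least one.

```python
from typing import Iterable, Set, Tuple

def satisfies_hierarchy(
    main_effects: Set[str],
    interactions: Iterable[Tuple[str, str]],
    strong: bool = True,
) -> bool:
    """Check whether a selected model respects the main-effect/interaction hierarchy.

    strong=True  -> every interaction needs BOTH parent main effects selected.
    strong=False -> every interaction needs AT LEAST ONE parent main effect.
    """
    needed = 2 if strong else 1
    for a, b in interactions:
        present = int(a in main_effects) + int(b in main_effects)
        if present < needed:
            return False
    return True

# Hypothetical selection: genes g1 and g2 as main effects,
# plus the interactions g1:g2 and g1:g3 (g3 has no main effect).
selected_mains = {"g1", "g2"}
selected_interactions = [("g1", "g2"), ("g1", "g3")]

strong_ok = satisfies_hierarchy(selected_mains, selected_interactions, strong=True)
weak_ok = satisfies_hierarchy(selected_mains, selected_interactions, strong=False)
print("strong hierarchy satisfied:", strong_ok)  # g1:g3 lacks the g3 main effect
print("weak hierarchy satisfied:", weak_ok)      # g1 is present for both interactions
```

A check like this only validates a fitted model after the fact; methods that build the constraint into the search avoid producing violating models in the first place.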
Affiliation(s)
- Zhang Xiao
  - KLATASDS-MOE, Academy of Statistics and Interdisciplinary Sciences, East China Normal University, China
- Shi Xingjie
  - KLATASDS-MOE, Academy of Statistics and Interdisciplinary Sciences, East China Normal University, China
- Liu Yiming
  - School of Statistics and Management, Shanghai University of Finance and Economics, China
- Liu Xu
  - School of Statistics and Management, Shanghai University of Finance and Economics, China
- Shuangge Ma
  - Department of Biostatistics, Yale University, United States
|
20
|
Luo S, Zhang W, Mao R, Huang X, Liu F, Liao Q, Sun D, Chen H, Zhang J, Tian F. Establishment and verification of a nomogram model for predicting the risk of post-stroke depression. PeerJ 2023; 11:e14822. [PMID: 36751635 PMCID: PMC9899426 DOI: 10.7717/peerj.14822] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2022] [Accepted: 01/06/2023] [Indexed: 02/05/2023] Open
Abstract
Objective The purpose of this study was to establish a nomogram predictive model of clinical risk factors for post-stroke depression (PSD). Patients and Methods We used the data of 202 stroke patients collected from Xuanwu Hospital from October 2018 to September 2020 as training data to develop a predictive model. Nineteen clinical factors were selected to evaluate their risk. Least absolute shrinkage and selection operator (LASSO) regression was used to select the best patient attributes, yielding seven factors with predictive ability; multi-factor logistic regression analysis was then carried out to determine six predictive factors and establish a nomogram prediction model. The C-index, calibration chart, and decision curve analyses were used to evaluate the predictive ability, accuracy, and clinical practicability of the prediction model. We then used the data of 156 stroke patients collected by Xiangya Hospital from June 2019 to September 2020 for external verification. Results The selected predictors included work style, number of children, time from onset to hospitalization, history of hyperlipidemia, stroke area, and the National Institutes of Health Stroke Scale (NIHSS) score. The model showed good prediction ability, with a C-index of 0.773 (95% confidence interval: [0.696-0.850]). It reached a C-index of 0.71 in bootstrap verification and a C-index of 0.702 (95% confidence interval: [0.616-0.788]) in external verification. Decision curve analyses further showed that the nomogram of post-stroke depression has high clinical usefulness when the threshold probability was 6%.
Conclusion This novel nomogram, which combines patients' work style, number of children, time from onset to hospitalization, history of hyperlipidemia, stroke area, and NIHSS score, can help clinicians to assess the risk of depression in patients with acute stroke much earlier in the timeline of the disease, and to implement early intervention treatment so as to reduce the incidence of PSD.
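For readers unfamiliar with the C-index reported in this abstract, the following minimal sketch shows how a concordance index can be computed for a binary outcome, where it coincides with the area under the ROC curve. The function name `c_index_binary` and the toy risk values are assumptions for illustration only and are unrelated to the study data.

```python
from typing import Sequence

def c_index_binary(risk: Sequence[float], outcome: Sequence[int]) -> float:
    """Concordance index for a binary outcome.

    Considers every (case, non-case) pair and returns the fraction in which
    the case received the higher predicted risk; ties count as one half.
    """
    concordant = 0.0
    pairs = 0
    for r_case, y_case in zip(risk, outcome):
        if y_case != 1:
            continue
        for r_ctrl, y_ctrl in zip(risk, outcome):
            if y_ctrl != 0:
                continue
            pairs += 1
            if r_case > r_ctrl:
                concordant += 1.0
            elif r_case == r_ctrl:
                concordant += 0.5
    if pairs == 0:
        raise ValueError("need at least one case and one non-case")
    return concordant / pairs

# Toy example: four patients with predicted risks and observed outcomes (1 = event).
predicted_risk = [0.90, 0.40, 0.35, 0.20]
observed = [1, 0, 1, 0]

# Pairs: (0.90, 0.40) ok, (0.90, 0.20) ok, (0.35, 0.40) not ok, (0.35, 0.20) ok
c_index = c_index_binary(predicted_risk, observed)
print(c_index)  # 0.75
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect ranking of cases above non-cases.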
Affiliation(s)
- Shihang Luo
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
- Wenrui Zhang
  - Department of Neurology, Xuanwu Hospital, Capital Medical University, Beijing, China
- Rui Mao
  - Xiangya Hospital, Central South University, Changsha, China
- Xia Huang
  - The First People’s Hospital of Huaihua, Hunan, Huaihua, China
- Fan Liu
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
- Qiao Liao
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
- Dongren Sun
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
- Hengshu Chen
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
- Jingyuan Zhang
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
- Fafa Tian
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, China; Department of National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, China
|
21
|
Tang Q, Pan D, Xu C, Chen L. Identification of molecular subtypes based on chromatin regulator and tumor microenvironment infiltration characterization in papillary renal cell carcinoma. J Cancer Res Clin Oncol 2023; 149:231-45. [PMID: 36404389 DOI: 10.1007/s00432-022-04482-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2022] [Accepted: 11/14/2022] [Indexed: 11/21/2022]
Abstract
BACKGROUND Papillary renal cell carcinoma (pRCC) is the second most common histological type of renal cell carcinoma. The prognosis of local pRCC is better than that of ccRCC, but the situation changes greatly once pRCC metastasizes. Chromatin regulators (CRs) are indispensable in epigenetic regulation, and their abnormal expression leads to the occurrence and development of tumors. However, the role of CRs in pRCC has not yet been studied. MATERIALS AND METHODS A total of 291 samples were obtained from the TCGA-KIRP cohort. Unsupervised clustering analysis was used to divide the pRCC patients into two subtypes. Lasso Cox regression analysis was performed to construct a CRs_score model for predicting OS. The unique characteristics of the different molecular subtypes were determined by TME cell infiltration analysis, GO and KEGG analysis, and drug sensitivity analysis. We also carried out drug sensitivity experiments in vitro to verify the effect of the signature genes on sensitivity to sunitinib. RESULTS We described the transcriptional and genetic alterations of 19 prognosis-related CR genes in the 291 cases of the TCGA-KIRP cohort. We identified two distinct molecular subtypes, which differ significantly in prognosis, clinicopathological features, and tumor immune microenvironment (TME). Then, four signature genes were selected by lasso regression analysis to construct a CRs_score for predicting OS, and its predictive ability for patients with pRCC was verified. A nomogram was established to improve the clinical applicability of the CRs_score. We found a significant difference in the proportion of immune cell infiltration between the high- and low-CRs_score groups. In addition, the CRs_score was significantly correlated with chemosensitivity. Finally, we found that SK-RC-39 cell lines were more sensitive to sunitinib after knocking down the signature gene CDCA3, PDIA4, or SUCNR1.
CONCLUSIONS Our comprehensive analysis of CR genes in pRCC showed that CR genes play a potential role in TME, prognosis, and drug resistance in pRCC. These findings may lay a foundation for further study of the regulatory role of CR genes in pRCC and provide a new method for evaluating prognosis and developing more effective targeted therapy.
|
22
|
Abstract
Model-assisted estimators have attracted a lot of attention in the last three decades. These estimators attempt to make an efficient use of auxiliary information available at the estimation stage. A working model linking the survey variable to the auxiliary variables is specified and fitted on the sample data to obtain a set of predictions, which are then incorporated in the estimation procedures. A nice feature of model-assisted procedures is that they maintain important design properties such as consistency and asymptotic unbiasedness irrespective of whether or not the working model is correctly specified. In this article, we examine several model-assisted estimators from a design-based point of view and in a high-dimensional setting, including linear regression and penalized estimators. We conduct an extensive simulation study using data from the Irish Commission for Energy Regulation Smart Metering Project, to assess the performance of several model-assisted estimators in terms of bias and efficiency in this high-dimensional data set.
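The model-assisted idea described in this abstract can be sketched in a few lines. The block below is a minimal illustration, not the authors' code: it assumes a linear working model fitted by design-weighted least squares and known first-order inclusion probabilities, and the function name `model_assisted_total` and its arguments are hypothetical. The estimate is the sum of model predictions over the whole population plus the design-weighted sum of sample residuals.

```python
import numpy as np

def model_assisted_total(x_pop, x_s, y_s, pi_s):
    """Model-assisted estimate of a population total (illustrative sketch).

    x_pop : (N, p) auxiliary variables, known for every population unit
    x_s   : (n, p) auxiliary variables for the sampled units
    y_s   : (n,)   survey variable, observed on the sample only
    pi_s  : (n,)   first-order inclusion probabilities of the sampled units
    """
    # Add an intercept column to the population and sample design matrices.
    Xp = np.column_stack([np.ones(len(x_pop)), x_pop])
    Xs = np.column_stack([np.ones(len(x_s)), x_s])
    w = 1.0 / np.asarray(pi_s, dtype=float)  # design weights

    # Fit the working linear model by design-weighted least squares.
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(Xs * sw[:, None], y_s * sw, rcond=None)

    # Population sum of predictions plus weighted sum of sample residuals.
    pred_total = (Xp @ beta).sum()
    resid_correction = np.sum(w * (y_s - Xs @ beta))
    return float(pred_total + resid_correction)
```

The residual correction term is what keeps the estimator design-consistent when the working model is wrong; when the model fits perfectly the residuals vanish and the estimate reduces to the sum of predictions. Penalized fits (for example ridge or lasso) would replace the least-squares step in a high-dimensional setting.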
Affiliation(s)
- Mehdi Dagdoug
- Laboratoire de Mathématiques de Besançon, Université de Bourgogne Franche-Comté, Besançon, France
| | - Camelia Goga
- Laboratoire de Mathématiques de Besançon, Université de Bourgogne Franche-Comté, Besançon, France
| | - David Haziza
- Department of Mathematics and Statistics, University of Ottawa, Ottawa, Canada
| |
|
23
|
Yao S, Wang X. Statistical and Machine Learning Methods for Discovering Prognostic Biomarkers for Survival Outcomes. Methods Mol Biol 2023; 2629:11-21. [PMID: 36929071 DOI: 10.1007/978-1-0716-2986-4_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/17/2023]
Abstract
Discovering molecular biomarkers for predicting patient survival outcomes is an essential step toward improving prognosis and therapeutic decision-making in the treatment of severe diseases such as cancer. Because omics datasets are high-dimensional, statistical methods such as the least absolute shrinkage and selection operator (Lasso) have been widely applied for cancer biomarker discovery. Owing to their scalability and demonstrated prediction performance, machine learning methods such as XGBoost and neural network models have also been gaining popularity in the community recently. However, compared to more traditional survival methods such as Kaplan-Meier and Cox regression, high-dimensional methods for survival outcomes are still less well known to biomedical researchers. In this chapter, we discuss the key analytical procedures in employing these methods for identifying biomarkers associated with survival data. We also identify important considerations that emerged from the analysis of actual omics data. Some typical instances of misapplication and misinterpretation of machine learning methods are also discussed. Using lung cancer and head and neck cancer datasets as demonstrations, we provide step-by-step instructions and sample R code for prioritizing prognostic biomarkers.
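To make the Lasso's selection behaviour concrete, here is a minimal coordinate-descent sketch. It is an assumption-laden illustration rather than the chapter's R code: it solves the Lasso for a plain linear outcome, minimizing (1/(2n))·||y − Xβ||² + λ·||β||₁, not the penalized Cox partial likelihood that survival analyses would use. The names `soft_threshold` and `lasso_cd` are hypothetical.

```python
import numpy as np

def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for the linear-model Lasso (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n   # (1/n) * x_j' x_j for each column
    resid = y.copy()                    # residual for beta = 0
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue                # skip all-zero columns
            resid += X[:, j] * beta[j]  # remove feature j's current contribution
            rho = X[:, j] @ resid / n   # correlation with the partial residual
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= X[:, j] * beta[j]  # add the updated contribution back
    return beta
```

The soft-thresholding step is what sets weak coefficients exactly to zero, which is why the Lasso doubles as a feature-selection tool in biomarker screening.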
Affiliation(s)
- Sijie Yao
- Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL, USA
| | - Xuefeng Wang
- Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL, USA.
| |
|
24
|
Chen H, Huang L, Jiang X, Wang Y, Bian Y, Ma S, Liu X. Establishment and analysis of a disease risk prediction model for the systemic lupus erythematosus with random forest. Front Immunol 2022; 13:1025688. [PMID: 36405750 PMCID: PMC9667742 DOI: 10.3389/fimmu.2022.1025688] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 08/23/2022] [Accepted: 10/17/2022] [Indexed: 09/25/2023] Open
Abstract
Systemic lupus erythematosus (SLE) is a latent, insidious autoimmune disease. Building on the development of gene sequencing in recent years, our study aims to develop a gene-based predictive model for identifying SLE at the genetic level. First, gene expression datasets of SLE whole blood samples were collected from the Gene Expression Omnibus (GEO) database. After the datasets were merged, they were divided into training and validation datasets in a ratio of 7:3: the training dataset contained 334 SLE samples and 71 healthy samples, and the validation dataset contained 143 SLE samples and 30 healthy samples. The training dataset was used to build the disease risk prediction model, and the validation dataset was used to verify the model's ability to identify SLE. We first analyzed differentially expressed genes (DEGs) and then used Lasso and random forest (RF) to screen out six key genes (OAS3, USP18, RTP4, SPATS2L, IFI27 and OAS1), which are essential for distinguishing SLE from healthy samples. With the six key genes incorporated into the RF model and five iterations of 10-fold cross-validation performed, we determined the RF model with the optimal mtry. The mean values of area under the curve (AUC) and accuracy of the models were over 0.95. The validation dataset was then used to evaluate performance, and our model had an AUC of 0.948. An external validation dataset (GSE99967) was used to assess the model's performance and yielded an AUC of 0.810, an accuracy of 0.836, and a sensitivity of 0.921. A second external validation dataset (GSE185047), consisting entirely of SLE patients, yielded an SLE sensitivity of up to 0.954. The final high-throughput RF model had a mean AUC over 0.9, again showing good results.
In conclusion, we identified key genetic biomarkers and successfully developed a novel disease risk prediction model for SLE that can be used as a new SLE disease risk prediction aid and contribute to the identification of SLE.
Affiliation(s)
- Huajian Chen
- School of Public Health and Management, Wenzhou Medical University, Wenzhou, China
| | - Li Huang
- School of Public Health and Management, Wenzhou Medical University, Wenzhou, China
| | - Xinyue Jiang
- School of Public Health and Management, Wenzhou Medical University, Wenzhou, China
| | - Yue Wang
- School of Public Health and Management, Wenzhou Medical University, Wenzhou, China
| | - Yan Bian
- School of Public Health and Management, Wenzhou Medical University, Wenzhou, China
| | - Shumei Ma
- School of Public Health and Management, Wenzhou Medical University, Wenzhou, China
| | - Xiaodong Liu
- School of Public Health and Management, Wenzhou Medical University, Wenzhou, China
- South Zhejiang Institute of Radiation Medicine and Nuclear Technology, Wenzhou Medical University, Wenzhou, China
- Key Laboratory of Watershed Science and Health of Zhejiang Province, Wenzhou Medical University, Wenzhou, China
| |
|
25
|
Breeur M, Ferrari P, Dossus L, Jenab M, Johansson M, Rinaldi S, Travis RC, His M, Key TJ, Schmidt JA, Overvad K, Tjønneland A, Kyrø C, Rothwell JA, Laouali N, Severi G, Kaaks R, Katzke V, Schulze MB, Eichelmann F, Palli D, Grioni S, Panico S, Tumino R, Sacerdote C, Bueno-de-Mesquita B, Olsen KS, Sandanger TM, Nøst TH, Quirós JR, Bonet C, Barranco MR, Chirlaque MD, Ardanaz E, Sandsveden M, Manjer J, Vidman L, Rentoft M, Muller D, Tsilidis K, Heath AK, Keun H, Adamski J, Keski-Rahkonen P, Scalbert A, Gunter MJ, Viallon V. Pan-cancer analysis of pre-diagnostic blood metabolite concentrations in the European Prospective Investigation into Cancer and Nutrition. BMC Med 2022; 20:351. [PMID: 36258205 PMCID: PMC9580145 DOI: 10.1186/s12916-022-02553-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2022] [Accepted: 09/05/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Epidemiological studies of associations between metabolites and cancer risk have typically focused on specific cancer types separately. Here, we designed a multivariate pan-cancer analysis to identify metabolites potentially associated with multiple cancer types, while also allowing the investigation of cancer type-specific associations. METHODS We analysed targeted metabolomics data available for 5828 matched case-control pairs from cancer-specific case-control studies on breast, colorectal, endometrial, gallbladder, kidney, localized and advanced prostate cancer, and hepatocellular carcinoma nested within the European Prospective Investigation into Cancer and Nutrition (EPIC) cohort. From pre-diagnostic blood levels of an initial set of 117 metabolites, 33 cluster representatives of strongly correlated metabolites and 17 single metabolites were derived by hierarchical clustering. The mutually adjusted associations of the resulting 50 metabolites with cancer risk were examined in penalized conditional logistic regression models adjusted for body mass index, using the data-shared lasso penalty. RESULTS Out of the 50 studied metabolites, (i) six were inversely associated with the risk of most cancer types: glutamine, butyrylcarnitine, lysophosphatidylcholine a C18:2, and three clusters of phosphatidylcholines (PCs); (ii) three were positively associated with most cancer types: proline, decanoylcarnitine, and one cluster of PCs; and (iii) 10 were specifically associated with particular cancer types, including histidine that was inversely associated with colorectal cancer risk and one cluster of sphingomyelins that was inversely associated with risk of hepatocellular carcinoma and positively with endometrial cancer risk. CONCLUSIONS These results could provide novel insights for the identification of pathways for cancer development, in particular those shared across different cancer types.
Affiliation(s)
- Marie Breeur
- Nutrition and Metabolism Branch, International Agency for Research on Cancer, NME Branch, 69372 CEDEX 08, Lyon, France
| | - Pietro Ferrari
- Nutrition and Metabolism Branch, International Agency for Research on Cancer, NME Branch, 69372 CEDEX 08, Lyon, France
| | - Laure Dossus
- Nutrition and Metabolism Branch, International Agency for Research on Cancer, NME Branch, 69372 CEDEX 08, Lyon, France
| | - Mazda Jenab
- Nutrition and Metabolism Branch, International Agency for Research on Cancer, NME Branch, 69372 CEDEX 08, Lyon, France
| | - Mattias Johansson
- Genetics Branch, International Agency for Research on Cancer, 69372 CEDEX 08, Lyon, France
| | - Sabina Rinaldi
- Nutrition and Metabolism Branch, International Agency for Research on Cancer, NME Branch, 69372 CEDEX 08, Lyon, France
| | - Ruth C Travis
- Cancer Epidemiology Unit, Nuffield Department of Population Health, University of Oxford, Oxford, OX3 7LF, UK
| | - Mathilde His
- Nutrition and Metabolism Branch, International Agency for Research on Cancer, NME Branch, 69372 CEDEX 08, Lyon, France
| | - Tim J Key
- Cancer Epidemiology Unit, Nuffield Department of Population Health, University of Oxford, Oxford, OX3 7LF, UK
| | - Julie A Schmidt
- Cancer Epidemiology Unit, Nuffield Department of Population Health, University of Oxford, Oxford, OX3 7LF, UK
- Department of Clinical Epidemiology, Department of Clinical Medicine, Aarhus University Hospital and Aarhus University, DK-8200, Aarhus N, Denmark
| | - Kim Overvad
- Department of Public Health, Aarhus University, DK-8000, Aarhus C, Denmark
| | - Anne Tjønneland
- Danish Cancer Society Research Center Diet, Genes and Environment Nutrition and Biomarkers, DK-2100, Copenhagen, Denmark
| | - Cecilie Kyrø
- Danish Cancer Society Research Center Diet, Genes and Environment Nutrition and Biomarkers, DK-2100, Copenhagen, Denmark
| | - Joseph A Rothwell
- Université Paris-Saclay, UVSQ, Inserm, CESP U1018, "Exposome and Heredity" team, Gustave Roussy, 94800, Villejuif, France
| | - Nasser Laouali
- Université Paris-Saclay, UVSQ, Inserm, CESP U1018, "Exposome and Heredity" team, Gustave Roussy, 94800, Villejuif, France
| | - Gianluca Severi
- Université Paris-Saclay, UVSQ, Inserm, CESP U1018, "Exposome and Heredity" team, Gustave Roussy, 94800, Villejuif, France
| | - Rudolf Kaaks
- Division of Cancer Epidemiology, German Cancer Research Center (DKFZ), 69120, Heidelberg, Germany
| | - Verena Katzke
- Division of Cancer Epidemiology, German Cancer Research Center (DKFZ), 69120, Heidelberg, Germany
| | - Matthias B Schulze
- Department of Molecular Epidemiology, German Institute of Human Nutrition, 14558, Nuthetal, Germany
| | - Fabian Eichelmann
- Department of Molecular Epidemiology, German Institute of Human Nutrition, 14558, Nuthetal, Germany
- German Center for Diabetes Research (DZD), 85764, Neuherberg, Germany
| | - Domenico Palli
- Institute of Cancer Research, Prevention and Clinical Network (ISPRO), 50139, Florence, Italy
| | - Sara Grioni
- Epidemiology and Prevention Unit, Fondazione IRCCS Istituto Nazionale dei Tumori di Milano, 20133, Milan, Italy
| | - Salvatore Panico
- Dipartimento di Medicina Clinica e Chirurgia, Federico II University, 80131, Naples, Italy
| | - Rosario Tumino
- Hyblean Association for Epidemiological Research, AIRE-ONLUS, 97100, Ragusa, Italy
| | - Carlotta Sacerdote
- Unit of Cancer Epidemiology Città della Salute e della Scienza University-Hospital, 10126, Turin, Italy
| | - Bas Bueno-de-Mesquita
- Centre for Nutrition, Prevention and Health Services, National Institute for Public Health and the Environment (RIVM), PO Box 1, 3720, BA, Bilthoven, The Netherlands
| | - Karina Standahl Olsen
- Department of Community Medicine, UiT The Arctic University of Norway, N-9037, Tromsø, Norway
| | | | - Therese Haugdahl Nøst
- Department of Community Medicine, UiT The Arctic University of Norway, N-9037, Tromsø, Norway
| | - J Ramón Quirós
- Public Health Directorate, 33006, Oviedo, Asturias, Spain
| | - Catalina Bonet
- Unit of Nutrition and Cancer, Cancer Epidemiology Research Program, Catalan Institute of Oncology (ICO), Bellvitge Biomedical Research Institute (IDIBELL), L'Hospitalet de Llobregat, 08908, Barcelona, Spain
| | - Miguel Rodríguez Barranco
- Escuela Andaluza de Salud Pública (EASP), 18011, Granada, Spain
- Instituto de Investigación Biosanitaria ibs. GRANADA, 18012, Granada, Spain
- Centro de Investigación Biomédica en Red de Epidemiología y Salud Pública (CIBERESP), 28029, Madrid, Spain
| | - María-Dolores Chirlaque
- Centro de Investigación Biomédica en Red de Epidemiología y Salud Pública (CIBERESP), 28029, Madrid, Spain
- Department of Epidemiology, Regional Health Council, IMIB-Arrixaca, Murcia University, 30003, Murcia, Spain
| | - Eva Ardanaz
- Centro de Investigación Biomédica en Red de Epidemiología y Salud Pública (CIBERESP), 28029, Madrid, Spain
- Navarra Public Health Institute, 31003, Pamplona, Spain
- IdiSNA, Navarra Institute for Health Research, 31008, Pamplona, Spain
| | - Malte Sandsveden
- Department of Clinical Sciences Malmö Lund University, SE-214 28, Malmö, Sweden
| | - Jonas Manjer
- Department of Surgery, Skåne University Hospital Malmö, Lund University, SE-214 28, Malmö, Sweden
| | - Linda Vidman
- Department of Radiation Sciences, Oncology Umeå University, SE-901 87, Umeå, Sweden
| | - Matilda Rentoft
- Department of Radiation Sciences, Oncology Umeå University, SE-901 87, Umeå, Sweden
| | - David Muller
- Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London, London, W2 1PG, UK
| | - Kostas Tsilidis
- Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London, London, W2 1PG, UK
| | - Alicia K Heath
- Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London, London, W2 1PG, UK
| | - Hector Keun
- Department of Surgery and Cancer, Cancer Metabolism and Systems Toxicology Group, Division of Cancer, Imperial College London, London, SW7 2AZ, UK
| | - Jerzy Adamski
- Institute of Experimental Genetics, Helmholtz Zentrum München, German Research Center for Environmental Health, 85764, Neuherberg, Germany
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, 117597, Singapore
- Institute of Biochemistry, Faculty of Medicine, University of Ljubljana, 1000, Ljubljana, Slovenia
| | - Pekka Keski-Rahkonen
- Nutrition and Metabolism Branch, International Agency for Research on Cancer, NME Branch, 69372 CEDEX 08, Lyon, France
| | - Augustin Scalbert
- Nutrition and Metabolism Branch, International Agency for Research on Cancer, NME Branch, 69372 CEDEX 08, Lyon, France
| | - Marc J Gunter
- Nutrition and Metabolism Branch, International Agency for Research on Cancer, NME Branch, 69372 CEDEX 08, Lyon, France
| | - Vivian Viallon
- Nutrition and Metabolism Branch, International Agency for Research on Cancer, NME Branch, 69372 CEDEX 08, Lyon, France.
| |
|
26
|
Jardillier R, Koca D, Chatelain F, Guyon L. Prognosis of lasso-like penalized Cox models with tumor profiling improves prediction over clinical data alone and benefits from bi-dimensional pre-screening. BMC Cancer 2022; 22:1045. [PMID: 36199072 PMCID: PMC9533541 DOI: 10.1186/s12885-022-10117-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2022] [Accepted: 09/14/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Prediction of patient survival from tumor molecular '-omics' data is a key step toward personalized medicine. Cox models fitted to RNA profiling datasets are popular for clinical outcome predictions. However, these models are applied in a "high-dimensional" context, as the number p of covariates (gene expressions) greatly exceeds the number n of patients and e of events. Thus, pre-screening together with penalization methods is widely used for dimension reduction. METHODS In the present paper, (i) we benchmark the performance of the lasso penalization and three variants (i.e., ridge, elastic net, adaptive elastic net) on 16 cancers from TCGA after pre-screening, (ii) we propose a bi-dimensional pre-screening procedure based on both gene variability and p-values from single-variable Cox models to predict survival, and (iii) we compare our results with iterative sure independence screening (ISIS). RESULTS First, we show that integration of mRNA-seq data with clinical data improves predictions over clinical data alone. Second, our bi-dimensional pre-screening procedure improves the C-index and/or the integrated Brier score only moderately, while excluding genes that are irrelevant for prediction. We demonstrate that the different penalization methods reach comparable prediction performance, with slight differences among datasets. Finally, we provide advice for the case of multi-omics data integration. CONCLUSIONS Tumor profiles convey more prognostic information than clinical variables such as stage for many cancer subtypes. Lasso and Ridge penalizations perform similarly to Elastic Net penalizations for Cox models in high dimension. Pre-screening the top 200 genes in terms of single-variable Cox model p-values is a practical way to reduce dimension, which may be particularly useful when integrating multi-omics data.
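The C-index used as a performance measure in this abstract can be illustrated with a short sketch. The function below is a simple O(n²) version of Harrell's concordance index written for illustration only; the name `concordance_index` is hypothetical, tied event times are ignored, and the study may have used a different implementation or variant. A pair is treated as comparable when the subject with the shorter time had an observed event.

```python
import numpy as np

def concordance_index(time, event, risk):
    """Harrell's C-index for right-censored data (illustrative O(n^2) sketch).

    time  : follow-up times
    event : 1/True if the event was observed, 0/False if censored
    risk  : predicted risk scores (higher means earlier expected event)
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    risk = np.asarray(risk, dtype=float)

    concordant = 0.0
    comparable = 0
    for i in range(len(time)):
        if not event[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(len(time)):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0   # earlier event has higher risk
                elif risk[i] == risk[j]:
                    concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to uninformative predictions and 1.0 to perfect ranking, which is why small gains from pre-screening show up as modest changes in this score.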
Affiliation(s)
- Rémy Jardillier
- IRIG, Biosanté U1292, Univ. Grenoble Alpes, Inserm, CEA, Grenoble, France; GIPSA-lab, Institute of Engineering University Grenoble Alpes, Univ. Grenoble Alpes, CNRS, Grenoble INP, Grenoble, France
| | - Dzenis Koca
- IRIG, Biosanté U1292, Univ. Grenoble Alpes, Inserm, CEA, Grenoble, France
| | - Florent Chatelain
- GIPSA-lab, Institute of Engineering University Grenoble Alpes, Univ. Grenoble Alpes, CNRS, Grenoble INP, Grenoble, France
| | - Laurent Guyon
- IRIG, Biosanté U1292, Univ. Grenoble Alpes, Inserm, CEA, Grenoble, France.
| |
|
27
|
Chu B, Qureshi S. Comparing Out-of-Sample Performance of Machine Learning Methods to Forecast U.S. GDP Growth. Comput Econ 2022; 62:1-43. [PMID: 36157276 PMCID: PMC9483293 DOI: 10.1007/s10614-022-10312-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 08/04/2022] [Indexed: 06/16/2023]
Abstract
We run a 'horse race' among popular forecasting methods, including machine learning (ML) and deep learning (DL) methods, that are employed to forecast U.S. GDP growth. Given the unstable nature of GDP growth data, we implement a recursive forecasting strategy to calculate the out-of-sample performance metrics of forecasts for multiple subperiods. We use three sets of predictors: a large set of 224 predictors [of U.S. GDP growth] taken from a large quarterly macroeconomic database (namely, FRED-QD), a small set of nine strong predictors selected from the large set, and another small set including these nine strong predictors together with a high-frequency business condition index. We then obtain the following three main findings: (1) when forecasting with a large number of predictors with mixed predictive power, density-based ML methods (such as bagging, boosting, or neural networks) can somewhat outperform sparsity-based methods (such as Lasso) for short-horizon forecasts, but it is not easy to distinguish the performance of these two types of methods for long-horizon forecasts; (2) density-based ML methods tend to perform better with a large set of predictors than with a small subset of strong predictors, especially for shorter-horizon forecasts; and (3) parsimonious models using a strong high-frequency predictor can outperform other sophisticated ML and DL models using a large number of low-frequency predictors, at least for long-horizon forecasts, highlighting the important role of predictors in economic forecasting. We also find that ensemble ML methods (which are special cases of density-based ML methods) can outperform popular DL methods.
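A recursive (expanding-window) forecasting evaluation of the kind mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's procedure: ordinary least squares stands in for the ML and DL forecasters, row `t` of `X` is assumed to contain only information available before `y[t]` is observed, and the name `recursive_oos_rmse` is hypothetical.

```python
import numpy as np

def recursive_oos_rmse(X, y, first_train_size):
    """Out-of-sample RMSE from an expanding-window forecast (illustrative sketch).

    At each step t the model is refitted on observations 0..t-1 only,
    then used to forecast y[t]; the window grows by one observation per step.
    """
    y = np.asarray(y, dtype=float)
    # Add an intercept column; X may be 1-D or 2-D.
    Z = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])

    errors = []
    for t in range(first_train_size, len(y)):
        beta, *_ = np.linalg.lstsq(Z[:t], y[:t], rcond=None)  # fit on the past only
        errors.append(y[t] - Z[t] @ beta)                      # one-step forecast error
    return float(np.sqrt(np.mean(np.square(errors))))
```

Refitting at every step avoids look-ahead bias, and computing the error separately over different ranges of `t` gives the subperiod metrics that such comparisons rely on.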
Affiliation(s)
- Ba Chu
- Department of Economics, Carleton University, 1125 Colonel By Dr., Ottawa, Ontario, Canada
| | - Shafiullah Qureshi
- Department of Economics, Carleton University, 1125 Colonel By Dr., Ottawa, Ontario, Canada
- Department of Economics, NUML, Islamabad, Pakistan
| |
|
28
|
Karmokar J, Islam MA, Uddin M, Hassan MR, Yousuf MSI. An assessment of meteorological parameters effects on COVID-19 pandemic in Bangladesh using machine learning models. Environ Sci Pollut Res Int 2022; 29:67103-67114. [PMID: 35522407 PMCID: PMC9073515 DOI: 10.1007/s11356-022-20196-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2021] [Accepted: 04/07/2022] [Indexed: 06/14/2023]
Abstract
Coronavirus disease (COVID-19), caused by a highly contagious virus (SARS-CoV-2), has been a global pandemic since January 2020. Scientists around the world are doing extensive research to control this disease and are working tirelessly to find its origin and causes. Several studies and experiments have reported that some meteorological parameters are highly correlated with COVID-19 transmission. In this work, we studied the effects of 11 meteorological parameters on the transmission of COVID-19 in Bangladesh. We first applied statistical analysis and observed no significant effect of these parameters. Therefore, we proposed a novel technique to analyze the underlying effects of these parameters using a combination of Random Forest, CART, and Lasso feature selection techniques. We observed that 4 parameters are highly influential for COVID-19: [Formula: see text] and Cloud have a positive association, whereas WS and AQ have a negative impact. Among them, Cloud has the highest positive impact, which is 0.063, and WS has the highest negative association, which is [Formula: see text]. Moreover, we validated our results using the DLNM technique. The results of this investigation can be used to develop an alert system that helps policymakers understand the characteristics of COVID-19 in relation to meteorological parameters, so that they can impose different policies based on weather conditions.
Affiliation(s)
- Jaionto Karmokar
- Department of Computer Science and Mathematics, Bangladesh Agricultural University, Mymensingh, 2202 Bangladesh
| | - Mohammad Aminul Islam
- Department of Computer Science and Mathematics, Bangladesh Agricultural University, Mymensingh, 2202 Bangladesh
| | - Machbah Uddin
- Department of Computer Science and Mathematics, Bangladesh Agricultural University, Mymensingh, 2202 Bangladesh
| | - Md. Rakib Hassan
- Department of Computer Science and Mathematics, Bangladesh Agricultural University, Mymensingh, 2202 Bangladesh
| | - Md. Sayeed Iftekhar Yousuf
- Department of Computer Science and Mathematics, Bangladesh Agricultural University, Mymensingh, 2202 Bangladesh
| |
|
29
|
Hou Y, Zhang A, Lv R, Zhao S, Ma J, Zhang H, Li Z. A study on water quality parameters estimation for urban rivers based on ground hyperspectral remote sensing technology. Environ Sci Pollut Res Int 2022; 29:63640-63654. [PMID: 35460477 DOI: 10.1007/s11356-022-20293-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/24/2021] [Accepted: 04/12/2022] [Indexed: 06/14/2023]
Abstract
The purpose of this research is to seek a better inversion algorithm and, on this basis, to explore the feasibility of using hyperspectral monitoring technology in place of laboratory physical and chemical index tests and to evaluate how well the inversion model predicts changes in water quality, so as to provide more convenient, more economical, and more extensive methods for monitoring the water quality of urban rivers. This paper takes water samples collected from the Fuyang River in downtown Handan as the research object and obtains the original spectral data of the samples with an ASD FieldSpec 4 field hyperspectral spectrometer. After smoothing-filter pretreatment by the Savitzky-Golay (SG) method and specified mathematical transformations, the modeling spectral indicators for the various water quality parameters are selected by calculating the maximum mean of the absolute values of the correlation coefficients between the spectral indicators and the measured values in the wavelength range from 400 to 950 nm. Using partial least squares (PLS), random forest (RF), and Lasso (least absolute shrinkage and selection operator), fitting models were constructed for six water quality parameters: turbidity (Turb), suspended substance (SS), chemical oxygen demand (COD), NH4-N, total nitrogen (TN), and total phosphorus (TP); the models were also tested and evaluated with hyperspectral data. The results show that different spectral transformation methods differ in their information inversion effects. The first derivative of the reciprocal logarithm of the SG-smoothed spectral data has a good modeling effect on four water quality parameters (Turb, COD, NH4-N, and TP), and the first derivative of the smoothed spectral data has a good modeling effect on the two water quality parameters SS and TN.
Among the three models, the PLS model has a good prediction effect, with the [Formula: see text] for COD, TN, and TP ranging from 0.74 to 0.80, whereas its prediction effect is relatively poorer for Turb and SS and worse still for NH4-N. The two machine learning algorithms, RF and Lasso, each obtained the best prediction models for different water quality parameters. The Lasso model has a [Formula: see text] value above 0.8 for the water body organic pollutants COD, TN, and TP, and the decrease value for [Formula: see text] and [Formula: see text] is below 0.1, which indicates that the model has high prediction accuracy and strong generalization ability, but the results for SS and NH4-N do not meet the expected accuracy. In the RF inversion model for COD, [Formula: see text] is higher than [Formula: see text], which shows excellent performance, and the model has a certain prediction ability for SS and NH4-N. The RF and Lasso models complement each other effectively in applicability and prediction accuracy. Compared with the traditional regression model PLS, machine learning has obvious overall advantages, making it more suitable for classified inversion prediction of urban river water quality parameters.
Affiliation(s)
- Yikai Hou
- School of Water Resources and Electric Power, Hebei University of Engineering, Handan, China
- Hebei Water Ecological Civilization and Social Governance Research Center, Handan, China
| | | | - Rulan Lv
- Hebei Branch of Construction and Administration Bureau of South-to-North Water Diversion Middle Route Project, Handan, China
| | - Song Zhao
- School of Water Resources and Electric Power, Hebei University of Engineering, Handan, China
- Hebei Branch of Construction and Administration Bureau of South-to-North Water Diversion Middle Route Project, Handan, China
| | - Jie Ma
- School of Water Resources and Electric Power, Hebei University of Engineering, Handan, China
| | - Hai Zhang
- Department of Agriculture Water Conservancy and Hydropower, Handan Bureau of Water Conservancy, Handan, China
| | - Ziang Li
- School of Landscape and Ecological Engineering, Hebei University of Engineering, Handan, China
|
30
|
Jiménez S, Angeles-Valdez D, Rodríguez-Delgado A, Fresán A, Miranda E, Alcalá-Lozano R, Duque-Alarcón X, Arango de Montis I, Garza-Villarreal EA. Machine learning detects predictors of symptom severity and impulsivity after dialectical behavior therapy skills training group in borderline personality disorder. J Psychiatr Res 2022; 151:42-49. [PMID: 35447506 DOI: 10.1016/j.jpsychires.2022.03.063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2021] [Revised: 12/08/2021] [Accepted: 03/31/2022] [Indexed: 10/18/2022]
Abstract
Only 50% of patients with Borderline Personality Disorder (BPD) respond to psychotherapies such as Dialectical Behavioral Therapy (DBT); this proportion might be increased by identifying baseline predictors of clinical change. We use machine learning to detect clinical features that could predict improvement or worsening in severity and impulsivity of BPD after a DBT skills training group. To predict illness severity, we analyzed data from 125 patients with BPD divided into 17 DBT psychotherapy groups, and for impulsiveness we analyzed 89 patients distributed into 12 DBT groups. All patients were evaluated at baseline using widely used self-report tests; ∼70% of the sample was randomly selected, and two machine learning models (lasso and Random forest [Rf]) were trained using 10-fold cross-validation and compared on predicting the post-treatment response. The models' generalization was assessed in the remaining ∼30% of the sample. Variables relevant to DBT (i.e. the mindfulness ability "non-judging", or "non-planning" impulsiveness) measured at baseline were robust predictors of clinical change after six months of weekly DBT sessions. Using 10-fold cross-validation, the Rf model had significantly lower prediction error than lasso for the BPD severity variable, Mean Absolute Error (MAE) lasso - Rf = 1.55 (95% CI, 0.63-2.48), as well as for impulsivity, MAE lasso - Rf = 1.97 (95% CI, 0.57-3.35). According to Rf and the permutations method, 34/613 significant predictors for severity and 17/613 for impulsivity were identified. Using machine learning to identify the most important variables before starting DBT could be fundamental for personalized treatment and disease prognosis.
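For readers unfamiliar with the lasso model compared here, the following is a minimal sketch of lasso regression fitted by cyclic coordinate descent. It is a generic textbook formulation without an intercept, not the authors' implementation; the function names and the fixed iteration count are assumptions made for illustration.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values within [-gamma, gamma] become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta while beta is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the residual that excludes feature j.
            rho = sum(
                X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)
            ) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta
```

The soft-thresholding step is what sets weak coefficients exactly to zero, which is why lasso doubles as a variable selection method. In practice the penalty `lam` would be chosen by cross-validation, as the abstract describes.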
Affiliation(s)
- Said Jiménez
- Facultad de Psicología, Universidad Nacional Autónoma de México, Mexico City, Mexico.
| | - Diego Angeles-Valdez
- Instituto de Neurobiología, Universidad Nacional Autónoma de México Campus Juriquilla, Querétaro, Mexico
| | - Andrés Rodríguez-Delgado
- Clínica de Trastorno Límite de la Personalidad, Instituto Nacional de Psiquiatría "Ramón de la Fuente Muñiz", Mexico City, Mexico
| | - Ana Fresán
- Subdirección de Investigaciones Clínicas, Instituto Nacional de Psiquiatría Ramón de la Fuente Muñiz, Mexico City, Mexico
| | - Edgar Miranda
- Clínica de Trastorno Límite de la Personalidad, Instituto Nacional de Psiquiatría "Ramón de la Fuente Muñiz", Mexico City, Mexico
| | - Ruth Alcalá-Lozano
- Subdirección de Investigaciones Clínicas, Instituto Nacional de Psiquiatría Ramón de la Fuente Muñiz, Mexico City, Mexico
| | - Xóchitl Duque-Alarcón
- Clínica de Especialidades en Neuropsiquiatría, Instituto de Seguridad y Servicios Sociales de los Trabajadores del Estado (ISSSTE), Mexico City, Mexico
| | - Iván Arango de Montis
- Clínica de Trastorno Límite de la Personalidad, Instituto Nacional de Psiquiatría "Ramón de la Fuente Muñiz", Mexico City, Mexico
| | - Eduardo A Garza-Villarreal
- Instituto de Neurobiología, Universidad Nacional Autónoma de México Campus Juriquilla, Querétaro, Mexico.
|
31
|
Abstract
In this work, we study the transfer learning problem under high-dimensional generalized linear models (GLMs), which aims to improve the fit on target data by borrowing information from useful source data. Given which sources to transfer, we propose a transfer learning algorithm on GLM and derive its ℓ1/ℓ2-estimation error bounds as well as a bound for a prediction error measure. The theoretical analysis shows that when the target and source are sufficiently close to each other, these bounds can be improved over those of the classical penalized estimator using only target data under mild conditions. When we don't know which sources to transfer, an algorithm-free transferable source detection approach is introduced to detect informative sources. The detection consistency is proved under the high-dimensional GLM transfer learning setting. We also propose an algorithm to construct confidence intervals for each coefficient component, and the corresponding theories are provided. Extensive simulations and a real-data experiment verify the effectiveness of our algorithms. We implement the proposed GLM transfer learning algorithms in a new R package, glmtrans, which is available on CRAN.
Affiliation(s)
- Ye Tian
- Department of Statistics, Columbia University
| | - Yang Feng
- Department of Biostatistics, School of Global Public Health, New York University
|
32
|
Kamenetsky ME, Trentham-Dietz A, Newcomb P, Zhu J, Gangnon RE. A Flexible Method for Identifying Spatial Clusters of Breast Cancer Using Individual-Level Data. Ann Epidemiol 2022; 73:9-16. [PMID: 35772615 DOI: 10.1016/j.annepidem.2022.06.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2022] [Revised: 05/16/2022] [Accepted: 06/10/2022] [Indexed: 11/22/2022]
Abstract
Prior research has shown that cancer risk varies by geography, but scan statistics methods for identifying cancer clusters in case-control studies have been limited in their ability to identify multiple clusters and adjust for participant-level risk factors. We develop a method to identify geographic patterns of breast cancer odds using the Wisconsin Women's Health Study, a series of 5 population-based case-control studies of female Wisconsin residents aged 20-79 enrolled in 1988-2004 (cases=16,076, controls=16,795). We create sets of potential clusters by overlaying a 1 km grid over each county-neighborhood and enumerating a series of overlapping circles. Using a two-step approach, we fit a penalized binomial regression model to the number of cases and trials in each grid cell, penalizing all potential clusters by the least absolute shrinkage and selection operator (Lasso). We use BIC to select the number of clusters, which are included in a participant-level logistic regression model. We identify 15 geographic clusters, resulting in 23 areas of unique geographic odds ratios. After adjustment for known risk factors, confidence intervals narrowed but breast cancer odds ratios did not meaningfully change; one additional hotspot was identified. By considering multiple overlapping spatial clusters simultaneously, we discern gradients of spatial odds across Wisconsin.
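The BIC-based selection step mentioned in this abstract can be illustrated with a short sketch using the standard definition BIC = k ln(n) - 2 ln(L). The candidate labels, log-likelihoods, and parameter counts in the usage note are invented for illustration, and the helper names are hypothetical; the paper's actual fitting procedure is not reproduced here.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k * ln(n) - 2 * ln(L)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

def select_by_bic(candidates, n_obs):
    """Return the (label, log_likelihood, n_params) tuple with the lowest BIC."""
    return min(candidates, key=lambda c: bic(c[1], c[2], n_obs))
```

For example, given hypothetical fits `[("0 clusters", -120.0, 1), ("2 clusters", -100.0, 3), ("5 clusters", -98.0, 6)]` on 100 observations, the middle candidate has the lowest BIC: the small likelihood gain of the largest model does not offset its extra parameters.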
|
33
|
Ivansic D, Palm J, Pantev C, Brüggemann P, Mazurek B, Guntinas-Lichius O, Dobel C. Prediction of treatment outcome in patients suffering from chronic tinnitus - from individual characteristics to early and long-term change. J Psychosom Res 2022; 157:110794. [PMID: 35339906 DOI: 10.1016/j.jpsychores.2022.110794] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/31/2021] [Revised: 01/07/2022] [Accepted: 03/19/2022] [Indexed: 11/23/2022]
Abstract
BACKGROUND AND OBJECTIVE Despite the availability of successful treatment approaches for chronic tinnitus, it has proven difficult to predict who profits from treatment, and whether such prediction is possible at all remains an open question. We tried to overcome methodological shortcomings and to predict treatment outcome as indicated by questionnaires measuring tinnitus distress. METHODS This is an observational, prospective cohort study. Lasso and post-selection inference methods were used to predict treatment outcome in patients suffering from chronic tinnitus (N = 747). Patients were treated for five consecutive days in an interdisciplinary setting according to guidelines. RESULTS Early change, i.e. a positive response after the screening day, as well as change due to treatment, was predicted by several psychopathological variables and also by tinnitus-related factors. Female gender, for example, was a predictor of change due to treatment. In general, therapy success, both for early change and for change due to treatment, could not be predicted satisfactorily, as indicated by a high mean cross-validation error (early change: 9.83; change due to treatment: 14.40). Analyzing sub-groups separated by tinnitus severity to reduce heterogeneity did not improve the situation, and for patients with high tinnitus severity no predictors at all could be reported (cross-validated error: 11.62 for the low quartile, 13.38 for the low-medium quartile, and 15.61 for the medium-high quartile). CONCLUSION Several psychopathological and tinnitus-related variables predicted early and long-term change. Nevertheless, even after overcoming methodological shortcomings, predicting treatment success did not yield satisfactory results; rather, the findings emphasize the high heterogeneity of chronic tinnitus.
|
34
|
Wang JH, Wang KH, Chen YH. Overlapping group screening for detection of gene-environment interactions with application to TCGA high-dimensional survival genomic data. BMC Bioinformatics 2022; 23:202. [PMID: 35637439 PMCID: PMC9150322 DOI: 10.1186/s12859-022-04750-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2022] [Accepted: 05/25/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND In biomedical and epidemiological research, gene-environment (G-E) interaction is of great significance to the etiology and progression of many complex diseases. For high-dimensional genetic data, two general classes of models, marginal and joint, have been proposed to identify important interaction factors. Most existing approaches for identifying G-E interactions are limited by a lack of robustness to outliers or contamination in the response and predictor data. In particular, right-censored survival outcomes make the associated feature screening even more challenging. In this article, we utilize the overlapping group screening (OGS) approach to select important G-E interactions related to clinical survival outcomes by incorporating gene pathway information under a joint modeling framework. RESULTS Simulation studies under various scenarios are carried out to compare the performance of our proposed method with that of some commonly used methods. In the real data applications, we use our proposed method to identify G-E interactions related to the clinical survival outcomes of patients with head and neck squamous cell carcinoma and esophageal carcinoma in The Cancer Genome Atlas clinical survival genetic data, and further establish corresponding survival prediction models. Both simulation and real data studies show that our method performs well and outperforms existing methods in G-E interaction selection, effect estimation, and survival prediction accuracy. CONCLUSIONS The OGS approach is useful for selecting important environmental factors, genes, and G-E interactions in the ultra-high dimensional feature space. The prediction ability of OGS with the Lasso penalty is better than that of existing methods. The same idea can be applied to other outcome models, such as the proportional odds survival time model, the logistic regression model for binary outcomes, and the multinomial logistic regression model for multi-class outcomes.
Affiliation(s)
- Jie-Huei Wang
- Department of Statistics, Feng Chia University, Seatwen, Taichung, 40724, Taiwan.
| | - Kang-Hsin Wang
- Department of Statistics, Feng Chia University, Seatwen, Taichung, 40724, Taiwan
| | - Yi-Hau Chen
- Institute of Statistical Science, Academia Sinica, Nankang, Taipei, 11529, Taiwan
|
35
|
Li J, Yu G, Li Q, Liu Y. Sample-wise Combined Missing Effect Model with Penalization. J Comput Graph Stat 2022; 32:263-274. [PMID: 37274355 PMCID: PMC10237115 DOI: 10.1080/10618600.2022.2070172] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2020] [Accepted: 04/11/2022] [Indexed: 10/18/2022]
Abstract
Modern high-dimensional statistical inference often faces the problem of missing data. In recent decades, many studies have focused on this topic and provided strategies including complete-sample analysis and imputation procedures. However, complete-sample analysis discards information of incomplete samples, while imputation procedures have accumulative errors from each single imputation. In this paper, we propose a new method, Sample-wise COmbined missing effect Model with penalization (SCOM), to deal with missing data occurring in predictors. Instead of imputing the predictors, SCOM estimates the combined effect caused by all missing data for each incomplete sample. SCOM makes full use of all available data. It is robust with respect to various missing mechanisms. Theoretical studies show the oracle inequality for the proposed estimator, and the consistency of variable selection and combined missing effect selection. Simulation studies and an application to the Residential Building Data also illustrate the effectiveness of the proposed SCOM.
Affiliation(s)
- Jialu Li
- School of Mathematics and Statistics, Beijing Institute of Technology
| | - Guan Yu
- Department of Biostatistics, State University of New York at Buffalo
| | - Qizhai Li
- LSC, NCMIS, Academy of Mathematics and Systems Science, University of Chinese Academy of Sciences
| | - Yufeng Liu
- Department of Statistics and Operations Research, Carolina Center for Genome Science, Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill
- Department of Genetics, Carolina Center for Genome Science, Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill
- Department of Biostatistics, Carolina Center for Genome Science, Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill
|
36
|
Xie T, Zhang N, Mao Y, Zhu B. How to predict the electronic health literacy of Chinese primary and secondary school students?: establishment of a model and web nomograms. BMC Public Health 2022; 22:1048. [PMID: 35614408 DOI: 10.1186/s12889-022-13421-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/08/2021] [Accepted: 05/12/2022] [Indexed: 12/23/2022] Open
Abstract
Background The internet has become an important resource for the public to obtain health information. Therefore, the ability to obtain and use such resources has become important for health literacy. This study aimed to establish a prediction model of Chinese students’ electronic health literacy (EHL) to guide government policymaking and parental interventions, identify the predictors of EHL in Chinese students using random forests, and establish a corresponding prediction model to help policymakers and parents determine whether primary and secondary school students have high EHL. Methods This is a cross-sectional study. From June to August 2021, a cluster sample survey was conducted with 1,300 students from seven primary and secondary schools in Shaanxi Province, China. We evaluated 1,235 primary and secondary school students using the e-health literacy scale. The data were divided into training and testing datasets in a 70:30 ratio for further analysis using random forest. The predictive accuracy of the score was measured using the area under the receiver operating characteristic curve. We also used decision curve analysis to determine the usefulness of the prediction model by quantifying the net benefits at different threshold probabilities in the validation dataset. Results We found that 33.6% of students had high EHL. The univariate analysis showed that age (P < 0.001), grade (P < 0.001), employment status (P < 0.001), household location (P < 0.001), parental phubbing behavior (P < 0.001), and general self-efficacy (P < 0.001) were significantly associated with EHL. A random forest classification model was developed with the training dataset (872 students), and seven variables were confirmed as important: age, grade, employment status, father education level, game time, parental phubbing behavior, and general self-efficacy. 
The validation of the model showed good discrimination, with an area under the curve of 0.975 in the training dataset and 0.738 in the testing dataset. The model was translated into an online risk calculator, which is freely available (https://xietao.shinyapps.io/DynNomapp/). Conclusions In this study, an intuitive tool to predict the EHL of Chinese primary and secondary school students was developed and validated. Supplementary Information The online version contains supplementary material available at 10.1186/s12889-022-13421-4.
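The area under the ROC curve reported here can be computed directly from its rank interpretation: it is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below is a generic pairwise implementation with a hypothetical function name, not the authors' code.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))
```

A large gap between training and testing AUC, such as the 0.975 versus 0.738 reported above, is commonly read as a sign that the model fits the training data more closely than it generalizes.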
|
37
|
Sajal IH, Chowdhury M, Wang T, Euhus D, Choudhary PK, Biswas S. CBCRisk-Black: a personalized contralateral breast cancer risk prediction model for black women. Breast Cancer Res Treat 2022. [PMID: 35562619 DOI: 10.1007/s10549-022-06612-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/02/2022] [Accepted: 04/18/2022] [Indexed: 11/02/2022]
Abstract
PURPOSE Black breast cancer (BC) survivors have a higher risk of developing contralateral breast cancer (CBC) than Whites. Existing CBC risk prediction tools were developed based mostly on White women. To address this racial disparity, it is crucial to develop tools tailored for Black women to help inform them about their actual risk of CBC. METHODS We propose an absolute risk prediction model, CBCRisk-Black, specifically for Black BC patients. It uses data on Black women from two sources: Breast Cancer Surveillance Consortium (BCSC) and Surveillance, Epidemiology, and End Results (SEER). First, a matched lasso logistic regression model for estimating relative risks (RR) is developed. Then, it is combined with relevant hazard rates and attributable risks to obtain absolute risks. Six-fold cross-validation is used to internally validate CBCRisk-Black. We also compare CBCRisk-Black with CBCRisk, an existing CBC risk prediction model. RESULTS The RR model uses data from BCSC on 744 Black women (186 cases). CBCRisk-Black has four risk factors (RR compared to baseline): breast density (2.13 for heterogeneous/extremely dense), family history of BC (2.28 for yes), first BC tumor size (2.14 for T3/T4, 1.56 for TIS), and age at first diagnosis of BC (1.41 for < 40). The areas under the receiver operating characteristic curve (AUC) for 3- and 5-year predictions are 0.72 and 0.65 for CBCRisk-Black, compared with 0.65 and 0.60 for CBCRisk. CONCLUSION CBCRisk-Black may serve as a useful tool for clinicians in counseling Black BC patients by providing a more accurate and personalized CBC risk estimate.
|
38
|
Liang B, Wei R, Zhang J, Li Y, Yang T, Xu S, Zhang K, Xia W, Guo B, Liu B, Zhou F, Wu Q, Dai J. Applying pytorch toolkit to plan optimization for circular cone based robotic radiotherapy. Radiat Oncol 2022; 17:82. [PMID: 35443714 PMCID: PMC9022303 DOI: 10.1186/s13014-022-02045-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2021] [Accepted: 03/31/2022] [Indexed: 11/25/2022] Open
Abstract
Background Robotic linac is ideally suited to deliver hypo-fractionated radiotherapy due to its compact head and flexible positioning. The non-coplanar treatment space improves delivery versatility, but its complexity also prolongs optimization and treatment time. Methods In this study, we attempted to use a deep learning framework (PyTorch) for the plan optimization of circular cone based robotic radiotherapy. The optimization problem was topologized into a simple feedforward neural network, so that treatment plan optimization was transformed into network training. With this transformation, the PyTorch toolkit, with its high-efficiency automatic differentiation (AD) for gradient calculation, was used as the optimization solver. To improve treatment efficiency, plans with fewer nodes and beams were sought. The least absolute shrinkage and selection operator (lasso) and the group lasso were employed to address the “sparsity” issue. Results The AD-S (AD sparse) approach was validated on 6 brain and 6 liver cancer cases, and the results were compared with the commercial MultiPlan (MLP) system. The AD-S plans achieved rapid dose fall-off and satisfactory sparing of organs at risk (OARs). Treatment efficiency improved through reductions in the number of nodes (28%), the number of beams (18%), and monitor units (MU, 24%). The computational time was shortened to 47.3 s on average. Conclusions In summary, this first attempt at applying a deep learning framework to robotic radiotherapy plan optimization is promising and has the potential to be used clinically. Supplementary Information The online version contains supplementary material available at 10.1186/s13014-022-02045-y.
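To show how a group lasso penalty can remove whole groups of variables (for instance, all beam weights belonging to one node), here is a minimal sketch of the group soft-thresholding operator, the proximal operator of the group lasso penalty. This is a generic illustration only: the paper optimized its plans with PyTorch automatic differentiation, and the function name and the grouping of beams by node are assumptions made here.

```python
import math

def group_soft_threshold(group, threshold):
    """Shrink a weight group toward zero; zero it entirely if its norm is small.

    This is the proximal operator of threshold * ||group||_2.
    """
    norm = math.sqrt(sum(w * w for w in group))
    if norm <= threshold:
        return [0.0] * len(group)  # the whole group is dropped at once
    scale = 1.0 - threshold / norm
    return [scale * w for w in group]
```

Unlike the plain lasso, which zeroes weights one at a time, this operator either keeps a group (shrunk uniformly) or removes all of its members together, which matches the goal of eliminating entire nodes rather than isolated beams.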
Affiliation(s)
- Bin Liang
- Department of Radiation Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Chaoyang Dist, 17 Panjianyuannanli Rd., Beijing, 100021, China
| | - Ran Wei
- Department of Radiation Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Chaoyang Dist, 17 Panjianyuannanli Rd., Beijing, 100021, China
| | - Jianghu Zhang
- Department of Radiation Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Chaoyang Dist, 17 Panjianyuannanli Rd., Beijing, 100021, China
| | - Yongbao Li
- Sun Yat-Sen University Cancer Center, State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Guangzhou, 510060, Guangdong, China
| | - Tao Yang
- Department of Radiation Oncology, PLA General Hospital, Beijing, 100853, China
| | - Shouping Xu
- Department of Radiation Oncology, PLA General Hospital, Beijing, 100853, China
| | - Ke Zhang
- Department of Radiation Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Chaoyang Dist, 17 Panjianyuannanli Rd., Beijing, 100021, China
| | - Wenlong Xia
- Department of Radiation Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Chaoyang Dist, 17 Panjianyuannanli Rd., Beijing, 100021, China
| | - Bin Guo
- Image Processing Center, Beihang University, Beijing, 100191, China
| | - Bo Liu
- Image Processing Center, Beihang University, Beijing, 100191, China; Beijing Advanced Innovation Center for Biomedical Engineering, Beihang University, Beijing, 100083, China
| | - Fugen Zhou
- Image Processing Center, Beihang University, Beijing, 100191, China; Beijing Advanced Innovation Center for Biomedical Engineering, Beihang University, Beijing, 100083, China
| | - Qiuwen Wu
- Division of Radiation Physics, Department of Radiation Oncology, Duke University Medical Center, Box 3295, Durham, NC, 27710, USA.
| | - Jianrong Dai
- Department of Radiation Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Chaoyang Dist, 17 Panjianyuannanli Rd., Beijing, 100021, China.
|
39
|
Buchaillot ML, Soba D, Shu T, Liu J, Aranjuelo I, Araus JL, Runion GB, Prior SA, Kefauver SC, Sanz-Saez A. Estimating peanut and soybean photosynthetic traits using leaf spectral reflectance and advance regression models. Planta 2022; 255:93. [PMID: 35325309 PMCID: PMC8948130 DOI: 10.1007/s00425-022-03867-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2021] [Accepted: 03/03/2022] [Indexed: 06/14/2023]
Abstract
MAIN CONCLUSION By combining hyperspectral signatures of peanut and soybean, we predicted Vcmax and Jmax with 70 and 50% accuracy. The PLS was the model that better predicted these photosynthetic parameters. One proposed key strategy for increasing potential crop stability and yield centers on exploitation of genotypic variability in photosynthetic capacity through precise high-throughput phenotyping techniques. Photosynthetic parameters, such as the maximum rate of Rubisco catalyzed carboxylation (Vc,max) and maximum electron transport rate supporting RuBP regeneration (Jmax), have been identified as key targets for improvement. The primary techniques for measuring these physiological parameters are very time-consuming. However, these parameters could be estimated using rapid and non-destructive leaf spectroscopy techniques. This study compared four different advanced regression models (PLS, BR, ARDR, and LASSO) to estimate Vc,max and Jmax based on leaf reflectance spectra measured with an ASD FieldSpec4. Two leguminous species were tested under different controlled environmental conditions: (1) peanut under different water regimes at normal atmospheric conditions and (2) soybean under high [CO2] and high night temperature. Model sensitivities were assessed for each crop and treatment separately and in combination to identify strengths and weaknesses of each modeling approach. Regardless of regression model, robust predictions were achieved for Vc,max (R2 = 0.70) and Jmax (R2 = 0.50). Field spectroscopy shows promising results for estimating spatial and temporal variations in photosynthetic capacity based on leaf and canopy spectral properties.
Affiliation(s)
- Ma Luisa Buchaillot
- Integrative Crop Ecophysiology Group, Plant Physiology Section, Faculty of Biology, University of Barcelona, 08028, Barcelona, Spain
- AGROTECNIO (Center for Research in Agrotechnology), Av. Rovira Roure 191, 25198, Lleida, Spain
| | - David Soba
- Instituto de Agrobiotecnología (IdAB), Consejo Superior de Investigaciones Científicas (CSIC)-Gobierno de Navarra, Av. Pamplona 123, 31192, Mutilva, Spain
| | - Tianchu Shu
- Department of Crop, Soil, and Environmental Sciences, Auburn University, Alabama, USA
| | - Juan Liu
- Industrial Crops Research Institute, Henan Academy of Agricultural Sciences, Henan, China
| | - Iker Aranjuelo
- Instituto de Agrobiotecnología (IdAB), Consejo Superior de Investigaciones Científicas (CSIC)-Gobierno de Navarra, Av. Pamplona 123, 31192, Mutilva, Spain
| | - José Luis Araus
- Integrative Crop Ecophysiology Group, Plant Physiology Section, Faculty of Biology, University of Barcelona, 08028, Barcelona, Spain
- AGROTECNIO (Center for Research in Agrotechnology), Av. Rovira Roure 191, 25198, Lleida, Spain
| | - G Brett Runion
- U.S. Department of Agriculture-Agricultural Research Service, National Soil Dynamics Laboratory, Auburn, AL, 36832, USA
| | - Stephen A Prior
- U.S. Department of Agriculture-Agricultural Research Service, National Soil Dynamics Laboratory, Auburn, AL, 36832, USA
| | - Shawn C Kefauver
- Integrative Crop Ecophysiology Group, Plant Physiology Section, Faculty of Biology, University of Barcelona, 08028, Barcelona, Spain.
- AGROTECNIO (Center for Research in Agrotechnology), Av. Rovira Roure 191, 25198, Lleida, Spain.
| | - Alvaro Sanz-Saez
- Department of Crop, Soil, and Environmental Sciences, Auburn University, Alabama, USA.
|
40
|
Zhang J, Fuhrer T, Ye H, Kwan B, Montemayor D, Tumova J, Darshi M, Afshinnia F, Scialla JJ, Anderson A, Porter AC, Taliercio JJ, Rincon-Choles H, Rao P, Xie D, Feldman H, Sauer U, Sharma K, Natarajan L. High-Throughput Metabolomics and Diabetic Kidney Disease Progression: Evidence from the Chronic Renal Insufficiency (CRIC) Study. Am J Nephrol 2022; 53:215-225. [PMID: 35196658 PMCID: PMC9116599 DOI: 10.1159/000521940] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/22/2021] [Accepted: 12/30/2021] [Indexed: 01/14/2023]
Abstract
INTRODUCTION Metabolomics could offer novel prognostic biomarkers and elucidate mechanisms of diabetic kidney disease (DKD) progression. Via metabolomic analysis of urine samples from 995 CRIC participants with diabetes and state-of-the-art statistical modeling, we aimed to identify metabolites prognostic to DKD progression. METHODS Urine samples (N = 995) were assayed for relative metabolite abundance by untargeted flow-injection mass spectrometry, and stringent statistical criteria were used to eliminate noisy compounds, resulting in 698 annotated metabolite ions. Utilizing the 698 metabolites' ion abundance along with clinical data (demographics, blood pressure, HbA1c, eGFR, and albuminuria), we developed univariate and multivariate models for the eGFR slope using penalized (lasso) and random forest models. Final models were tested on time-to-ESKD (end-stage kidney disease) via cross-validated C-statistics. We also conducted pathway enrichment analysis and a targeted analysis of a subset of metabolites. RESULTS Six eGFR slope models selected 9-30 variables. In the adjusted ESKD model with highest C-statistic, valine (or betaine) and 3-(4-methyl-3-pentenyl)thiophene were associated (p < 0.05) with 44% and 65% higher hazard of ESKD per doubling of metabolite abundance, respectively. Also, 13 (of 15) prognostic amino acids, including valine and betaine, were confirmed in the targeted analysis. Enrichment analysis revealed pathways implicated in kidney and cardiometabolic disease. CONCLUSIONS Using the diverse CRIC sample, a high-throughput untargeted assay, followed by targeted analysis, and rigorous statistical analysis to reduce false discovery, we identified several novel metabolites implicated in DKD progression. If replicated in independent cohorts, our findings could inform risk stratification and treatment strategies for patients with DKD.
Affiliation(s)
- Jing Zhang
  - Moores Cancer Center, University of California, San Diego, California, USA
- Tobias Fuhrer
  - Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Hongping Ye
  - Department of Medicine, Center for Renal Precision Medicine, University of Texas Health Science Center at San Antonio, San Antonio, Texas, USA
- Brian Kwan
  - Moores Cancer Center, University of California, San Diego, California, USA
  - Division of Biostatistics and Bioinformatics, Herbert Wertheim School of Public Health and Human Longevity Science, University of California, San Diego, California, USA
- Daniel Montemayor
  - Department of Medicine, Center for Renal Precision Medicine, University of Texas Health Science Center at San Antonio, San Antonio, Texas, USA
- Jana Tumova
  - Department of Medicine, Center for Renal Precision Medicine, University of Texas Health Science Center at San Antonio, San Antonio, Texas, USA
- Manjula Darshi
  - Department of Medicine, Center for Renal Precision Medicine, University of Texas Health Science Center at San Antonio, San Antonio, Texas, USA
- Farsad Afshinnia
  - Division of Nephrology, Department of Internal Medicine, University of Michigan, Medical School, Ann Arbor, Michigan, USA
- Julia J. Scialla
  - Departments of Medicine and Public Health Sciences, University of Virginia School of Medicine, Charlottesville, Virginia, USA
- Amanda Anderson
  - Department of Epidemiology, Tulane University School of Public Health and Tropical Medicine, New Orleans, Louisiana, USA
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Anna C. Porter
  - Jesse Brown VA Medical Center, University of Illinois at Chicago, Chicago, Illinois, USA
- Jonathan J. Taliercio
  - Cleveland Clinic Foundation, Glickman Urological & Kidney Institute, Department of Nephrology, Cleveland, Ohio, USA
- Hernan Rincon-Choles
  - Cleveland Clinic Foundation, Glickman Urological & Kidney Institute, Department of Nephrology, Cleveland, Ohio, USA
- Panduranga Rao
  - Division of Nephrology, Department of Internal Medicine, University of Michigan, Medical School, Ann Arbor, Michigan, USA
- Dawei Xie
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
  - Center for Clinical Epidemiology and Biostatistics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Harold Feldman
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
  - Center for Clinical Epidemiology and Biostatistics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Uwe Sauer
  - Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Kumar Sharma
  - Department of Medicine, Center for Renal Precision Medicine, University of Texas Health Science Center at San Antonio, San Antonio, Texas, USA
- Loki Natarajan
  - Moores Cancer Center, University of California, San Diego, California, USA
  - Division of Biostatistics and Bioinformatics, Herbert Wertheim School of Public Health and Human Longevity Science, University of California, San Diego, California, USA
41
Abstract
Background The coronavirus disease 2019 (COVID-19) pandemic has had a significant influence on public mental health. Current efforts focus on alleviating the impacts of the disease on public health and the economy, while the psychological effects of COVID-19 have been relatively ignored. In this research, we quantitatively characterize the impact of the pandemic on public mental health by studying an online survey dataset from the United States. Methods The analyses are based on a large-scale online mental health-related survey study in the United States, conducted over 12 consecutive weeks from April 23, 2020 to July 21, 2020. We examine the risk factors that have a significant impact on mental health as well as their estimated effects over time. We employ the multiple imputation by chained equations (MICE) method to deal with missing values and use logistic regression with the least absolute shrinkage and selection operator (Lasso) to identify risk factors for mental health. Results Our analysis shows that risk predictors for an individual experiencing mental health issues include the pandemic situation of the State where the individual resides, age, gender, race, marital status, health conditions, the number of household members, employment status, the level of confidence in future food affordability, availability of health insurance, mortgage status, and information on children's school enrollment. The effects of most of the predictors seem to change over time, though the degree varies across risk factors. The effects of risk factors such as State and gender show noticeable change over time, whereas the effect of age appears unchanged over time. Conclusions The analysis results provide evidence-based findings that identify the groups who are psychologically vulnerable to the COVID-19 pandemic.
This study provides evidence to help healthcare providers and policymakers take steps to mitigate the pandemic's effects on public mental health, especially by boosting public health care, improving public confidence in future food conditions, and creating more job opportunities. Trial registration This article does not report the results of a health care intervention on human participants. Supplementary Information The online version contains supplementary material available at 10.1186/s12874-021-01411-w.
Affiliation(s)
- Jingyu Cui
  - Department of Statistical and Actuarial Sciences, Western University, London, Ontario, N6A 5B7, Canada
- Jingwei Lu
  - Department of Statistical and Actuarial Sciences, Western University, London, Ontario, N6A 5B7, Canada
- Yijia Weng
  - Department of Statistical and Actuarial Sciences, Western University, London, Ontario, N6A 5B7, Canada
- Grace Y Yi
  - Department of Statistical and Actuarial Sciences, Western University, London, Ontario, N6A 5B7, Canada
  - Department of Computer Science, Western University, London, Ontario, N6A 5B7, Canada
- Wenqing He
  - Department of Statistical and Actuarial Sciences, Western University, London, Ontario, N6A 5B7, Canada
42
Hamidi F, Gilani N, Belaghi RA, Sarbakhsh P, Edgünlü T, Santaguida P. Exploration of Potential miRNA Biomarkers and Prediction for Ovarian Cancer Using Artificial Intelligence. Front Genet 2021; 12:724785. [PMID: 34899827 PMCID: PMC8656459 DOI: 10.3389/fgene.2021.724785] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 06/14/2021] [Accepted: 10/07/2021] [Indexed: 12/20/2022] Open
Abstract
Ovarian cancer is the second most dangerous gynecologic cancer, with a high mortality rate. Classifying high-dimensional, small-sample gene expression data is a challenging task. The discovery of miRNAs, small non-coding RNAs 18–25 nucleotides in length that regulate gene expression, has revealed a new layer of gene regulation, and miRNAs have been reported to play an important role in cancer. Using LASSO and Elastic Net as embedded feature selection techniques, the present study identified 10 miRNAs that were differentially regulated in serum samples from ovarian cancer patients compared with non-cancer samples in the publicly available dataset GSE106817: hsa-miR-5100, hsa-miR-6800-5p, hsa-miR-1233-5p, hsa-miR-4532, hsa-miR-4783-3p, hsa-miR-4787-3p, hsa-miR-1228-5p, hsa-miR-1290, hsa-miR-3184-5p, and hsa-miR-320b. Further, we implemented state-of-the-art machine learning classifiers, such as logistic regression, random forest, artificial neural network, XGBoost, and decision trees, to build clinical prediction models. Next, the diagnostic performance of these models with the identified miRNAs was evaluated in the internal (GSE106817) and external (GSE113486) validation datasets by ROC analysis. The results showed that the first four prediction models consistently yielded an AUC of 100%. Our findings provide significant evidence that the serum miRNA profile represents a promising diagnostic biomarker for ovarian cancer.
Affiliation(s)
- Farzaneh Hamidi
  - Department of Statistics and Epidemiology, Faculty of Health, Tabriz University of Medical Sciences, Tabriz, Iran
- Neda Gilani
  - Department of Statistics and Epidemiology, Faculty of Health, Tabriz University of Medical Sciences, Tabriz, Iran
- Reza Arabi Belaghi
  - Department of Statistics, Faculty of Mathematical Science, University of Tabriz, Tabriz, Iran
  - Department of Mathematics, Applied Mathematics and Statistics, Uppsala University, Uppsala, Sweden
- Parvin Sarbakhsh
  - Department of Statistics and Epidemiology, Faculty of Health, Tabriz University of Medical Sciences, Tabriz, Iran
- Tuba Edgünlü
  - Department of Medical Biology, Faculty of Medicine, Muğla Sıtkı Koçman University, Muğla, Turkey
- Pasqualina Santaguida
  - Department of Health Research and Methods, McMaster University, Hamilton, ON, Canada
43
Zheng E, Zhang J, Wang Q, Qiao H. Continuous Multi-DoF Wrist Kinematics Estimation Based on a Human-Machine Interface With Electrical-Impedance-Tomography. Front Neurorobot 2021; 15:734525. [PMID: 34658831 PMCID: PMC8515921 DOI: 10.3389/fnbot.2021.734525] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2021] [Accepted: 08/16/2021] [Indexed: 11/21/2022] Open
Abstract
This study proposed a multiple degree-of-freedom (DoF) continuous wrist angle estimation approach based on an electrical impedance tomography (EIT) interface. The interface can inspect the spatial information of deep muscles with a soft elastic fabric sensing band, extending the measurement scope of existing muscle-signal-based sensors. The designed estimation algorithm first extracted the mutual correlation of the EIT regions with a kernel function and then used a regularization procedure to select the optimal coefficients. We evaluated the method with different features and regression models on 12 healthy subjects while they performed six basic wrist joint motions. The average root-mean-square error of the 3-DoF estimation task was 7.62°, and the average $R^2$ was 0.92. The results are comparable to state-of-the-art results obtained with sEMG signals in multi-DoF tasks. Future work will pursue this new direction to obtain further improvements.
Affiliation(s)
- Enhao Zheng
  - The State Key Laboratory for Management and Control of Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
- Jingzhi Zhang
  - The State Key Laboratory for Management and Control of Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
  - School of General Engineering, Beihang University, Beijing, China
- Qining Wang
  - Department of Advanced Manufacturing and Robotics, College of Engineering, Peking University, Beijing, China
- Hong Qiao
  - The State Key Laboratory for Management and Control of Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
44
Ballout N, Garcia C, Viallon V. Sparse estimation for case-control studies with multiple disease subtypes. Biostatistics 2021; 22:738-755. [PMID: 31977036 DOI: 10.1093/biostatistics/kxz063] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/06/2019] [Revised: 12/13/2019] [Accepted: 12/16/2019] [Indexed: 11/15/2022] Open
Abstract
The analysis of case-control studies with several disease subtypes is increasingly common, e.g. in cancer epidemiology. For matched designs, a natural strategy is based on a stratified conditional logistic regression model. Then, to account for the potential homogeneity among disease subtypes, we adapt the ideas of data shared lasso, which has been recently proposed for the estimation of stratified regression models. For unmatched designs, we compare two standard methods based on $L_1$-norm penalized multinomial logistic regression. We describe formal connections between these two approaches, from which practical guidance can be derived. We show that one of these approaches, which is based on a symmetric formulation of the multinomial logistic regression model, actually reduces to a data shared lasso version of the other. Consequently, the relative performance of the two approaches critically depends on the level of homogeneity that exists among disease subtypes: more precisely, when homogeneity is moderate to high, the non-symmetric formulation with controls as the reference is not recommended. Empirical results obtained from synthetic data are presented, which confirm the benefit of properly accounting for potential homogeneity under both matched and unmatched designs, in terms of estimation and prediction accuracy, variable selection and identification of heterogeneities. We also present preliminary results from the analysis of a case-control study nested within the EPIC (European Prospective Investigation into Cancer and nutrition) cohort, where the objective is to identify metabolites associated with the occurrence of subtypes of breast cancer.
Affiliation(s)
- Nadim Ballout
  - IFSTTAR, TS2, UMRESTTE, Université Claude Bernard Lyon 1, 25, avenue François Mitterrand, Case24, Cité des mobilités, 69675 Bron Cedex, France
- Cedric Garcia
  - IFSTTAR, AME, DEST, 14-20 Boulevard Newton, Cité Descartes, Champs sur Marne, 77447 Marne la Vallée Cedex 2, France
- Vivian Viallon
  - Nutritional Methodology and Biostatistics Group, International Agency for Research on Cancer, World Health Organization, 150, Cours Albert Thomas, 69372 Lyon Cedex 08, France
45
Satheeshkumar PS, El-Dallal M, Mohan MP. Feature selection and predicting chemotherapy-induced ulcerative mucositis using machine learning methods. Int J Med Inform 2021; 154:104563. [PMID: 34479094 DOI: 10.1016/j.ijmedinf.2021.104563] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/26/2021] [Revised: 08/17/2021] [Accepted: 08/24/2021] [Indexed: 11/28/2022]
Abstract
OBJECTIVE Ulcerative mucositis (UM) is a devastating complication of most cancer therapies whose risk factors are less well recognized. Because risk prediction is vital for adverse events, we used machine learning (ML) approaches to predict chemotherapy-induced UM. METHODS We used the 2017 National Inpatient Sample database to identify discharges with antineoplastic chemotherapy-induced UM among patients who received chemotherapy as part of their cancer treatment. We used forward selection and backward elimination for feature selection; lasso and the Gradient Boosting Method (GBM) were used to build our linear and non-linear models, respectively. RESULTS In 2017, there were 253 (unweighted) chemotherapy-induced UM patient discharges among 21,626 (unweighted) adult patients who received antineoplastic chemotherapy as part of their cancer treatment. Our linear model, lasso, achieved an AUC (C-statistic) of 0.75 in both the test and training datasets; the GBM model achieved an AUC of 0.76 in the training and 0.79 in the test datasets. Feature selection by stepwise forward selection and backward elimination identified the following important variables: antineoplastic chemotherapy-induced pancytopenia, agranulocytosis due to cancer chemotherapy, fluid and electrolyte imbalance, age, anemia due to chemotherapy, median household income, and depression. The most important variables derived from GBM, in order of importance, were antineoplastic chemotherapy-induced pancytopenia > co-morbidity score > agranulocytosis due to cancer chemotherapy > age > fluid and electrolyte imbalance. Further, when the analysis was restricted to females only, the ML models performed better than the unstratified model. CONCLUSION Our study showed that ML methods performed well in predicting chemotherapy-induced UM. Predictors identified through the ML approach matched clinically meaningful and previously discussed predictors of chemotherapy-induced UM.
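The AUC (C-statistic) values reported in this abstract can be read as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case. A minimal sketch of that definition follows; the function name `auc` and the toy labels and scores are illustrative assumptions, not the authors' code or data.

```python
def auc(labels, scores):
    """C-statistic: share of (case, non-case) pairs in which the case has the
    higher score; tied pairs count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                concordant += 1.0
            elif p == q:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


# Toy data (hypothetical): 2 cases, 3 non-cases, so 6 pairs in total.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.1]
print(auc(labels, scores))  # 5 of the 6 pairs are concordant, i.e. 5/6
```

The double loop is quadratic in the sample size and is written for clarity; a rank-based formula gives the same value more cheaply on large datasets.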
Affiliation(s)
- Poolakkad S Satheeshkumar
  - Harvard Medical School, Boston, MA, USA
  - Department of Oral Oncology, Roswell Park Comprehensive Cancer Center, Buffalo, NY, USA
- Mohammed El-Dallal
  - Division of Hospital Medicine, Cambridge Health Alliance and Harvard Medical School, Cambridge, MA, USA
  - Division of Gastroenterology, Beth Israel Deaconess Medical Center and Harvard Medical School, Boston, MA, USA
- Minu P Mohan
  - University of Massachusetts, Lowell, MA 01854, USA
46
Escribe C, Lu T, Keller-Baruch J, Forgetta V, Xiao B, Richards JB, Bhatnagar S, Oualkacha K, Greenwood CMT. Block coordinate descent algorithm improves variable selection and estimation in error-in-variables regression. Genet Epidemiol 2021; 45:874-890. [PMID: 34468045 PMCID: PMC9292988 DOI: 10.1002/gepi.22430] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/22/2021] [Revised: 07/19/2021] [Accepted: 08/12/2021] [Indexed: 11/13/2022]
Abstract
Medical research increasingly includes high‐dimensional regression modeling with a need for error‐in‐variables methods. The Convex Conditioned Lasso (CoCoLasso) utilizes a reformulated Lasso objective function and an error‐corrected cross‐validation to enable error‐in‐variables regression, but requires heavy computations. Here, we develop a Block coordinate Descent Convex Conditioned Lasso (BDCoCoLasso) algorithm for modeling high‐dimensional data that are only partially corrupted by measurement error. This algorithm separately optimizes the estimation of the uncorrupted and corrupted features in an iterative manner to reduce computational cost, with a specially calibrated formulation of cross‐validation error. Through simulations, we show that the BDCoCoLasso algorithm successfully copes with much larger feature sets than CoCoLasso, and as expected, outperforms the naïve Lasso with enhanced estimation accuracy and consistency, as the intensity and complexity of measurement errors increase. Also, a new smoothly clipped absolute deviation penalization option is added that may be appropriate for some data sets. We apply the BDCoCoLasso algorithm to data selected from the UK Biobank. We develop and showcase the utility of covariate‐adjusted genetic risk scores for body mass index, bone mineral density, and lifespan. We demonstrate that by leveraging more information than the naïve Lasso in partially corrupted data, the BDCoCoLasso may achieve higher prediction accuracy. These innovations, together with an R package, BDCoCoLasso, make error‐in‐variables adjustments more accessible for high‐dimensional data sets. We posit the BDCoCoLasso algorithm has the potential to be widely applied in various fields, including genomics‐facilitated personalized medicine research.
Affiliation(s)
- Célia Escribe
  - Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Québec, Canada
  - Operations Research Center, Massachusetts Institute of Technology, Cambridge, MA, United States
- Tianyuan Lu
  - Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Québec, Canada
  - Quantitative Life Sciences Program, McGill University, Montreal, Québec, Canada
- Julyan Keller-Baruch
  - Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Québec, Canada
  - Department of Human Genetics, McGill University, Montreal, Québec, Canada
- Vincenzo Forgetta
  - Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Québec, Canada
- Bowei Xiao
  - Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Québec, Canada
  - Quantitative Life Sciences Program, McGill University, Montreal, Québec, Canada
- J Brent Richards
  - Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Québec, Canada
  - Department of Human Genetics, McGill University, Montreal, Québec, Canada
  - Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Québec, Canada
  - Department of Twin Research and Genetic Epidemiology, King's College London, London, United Kingdom
- Sahir Bhatnagar
  - Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Québec, Canada
  - Department of Diagnostic Radiology, McGill University, Montreal, Québec, Canada
- Karim Oualkacha
  - Département de Mathématiques, Université du Québec à Montréal, Montreal, Québec, Canada
- Celia M T Greenwood
  - Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Québec, Canada
  - Department of Human Genetics, McGill University, Montreal, Québec, Canada
  - Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Québec, Canada
  - Gerald Bronfman Department of Oncology, McGill University, Montreal, Québec, Canada
47
Mohr H, Ruge H. Fast Estimation of L1-Regularized Linear Models in the Mass-Univariate Setting. Neuroinformatics 2021; 19:385-92. [PMID: 32935193 DOI: 10.1007/s12021-020-09489-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
Abstract
In certain modeling approaches, activation analyses of task-based fMRI data can involve a relatively large number of predictors. For example, in the encoding model approach, complex stimuli are represented in a high-dimensional feature space, resulting in design matrices with many predictors. Similarly, single-trial models and finite impulse response models may also encompass a large number of predictors. In settings where only few of those predictors are expected to be informative, a sparse model fit can be obtained via L1-regularization. However, estimating L1-regularized models requires an iterative fitting procedure, which considerably increases computation time compared to estimating unregularized or L2-regularized models, and complicates the application of L1-regularization on whole-brain data and large sample sizes. Here we provide several functions for estimating L1-regularized models that are optimized for the mass-univariate analysis approach. The package includes a parallel implementation of the coordinate descent algorithm for CPU-only systems and two implementations of the alternating direction method of multipliers algorithm requiring a GPU device. While the core algorithms are implemented in C++/CUDA, data input/output and parameter settings can be conveniently handled via Matlab. The CPU-based implementation is highly memory-efficient and provides considerable speed-up compared to the standard implementation not optimized for the mass-univariate approach. Further acceleration can be achieved on systems equipped with a CUDA-enabled GPU. Using the fastest GPU-based implementation, computation time for whole-brain estimates can be reduced from 9 h to 5 min in an exemplary data setting. Overall, the provided package facilitates the use of L1-regularization for fMRI activation analyses and enables an efficient employment of L1-regularization on whole-brain data and large sample sizes.
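The iterative fitting procedure mentioned in this abstract can be illustrated with cyclic coordinate descent and soft-thresholding. The following is a minimal pure-Python sketch, not the package's C++/CUDA implementation; the function names, the objective scaling $(1/2n)\lVert y - Xb\rVert^2 + \lambda\lVert b\rVert_1$, and the fixed number of sweeps are assumptions made for illustration. In the mass-univariate setting the same design matrix is reused for every response (for example, every voxel).

```python
def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=200):
    """Minimise (1/2n)||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is a list of n rows (each a list of p floats); y is a list of n floats.
    """
    n, p = len(X), len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    col_sq = [sum(v * v for v in col) / n for col in cols]
    b = [0.0] * p
    resid = list(y)  # residual y - Xb at the starting point b = 0
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # all-zero predictor: its coefficient stays at 0
            # Covariance of predictor j with the partial residual (b_j removed).
            rho = sum(c * r for c, r in zip(cols[j], resid)) / n + col_sq[j] * b[j]
            b_new = soft_threshold(rho, lam) / col_sq[j]
            delta = b_new - b[j]
            if delta != 0.0:
                # Keep the residual in sync with the updated coefficient.
                resid = [r - c * delta for c, r in zip(cols[j], resid)]
                b[j] = b_new
    return b


def lasso_mass_univariate(X, Y, lam):
    """Fit one lasso per response vector in Y; all fits share the design X."""
    return [lasso_cd(X, y, lam) for y in Y]


# Toy design (hypothetical) with two orthogonal predictors.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [2.0, 0.1, 2.0, 0.1]
print(lasso_cd(X, y, lam=0.1))  # the weak second predictor is shrunk to exactly 0
```

This sketch loops over responses serially and rebuilds the column statistics for each fit; sharing those statistics across responses and running the fits in parallel is the kind of saving a mass-univariate implementation can exploit.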
48
Stidham RW, Liu Y, Enchakalody B, Van T, Krishnamurthy V, Su GL, Zhu J, Waljee AK. The Use of Readily Available Longitudinal Data to Predict the Likelihood of Surgery in Crohn Disease. Inflamm Bowel Dis 2021; 27:1328-1334. [PMID: 33769477 PMCID: PMC8314116 DOI: 10.1093/ibd/izab035] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/02/2020] [Indexed: 12/11/2022]
Abstract
BACKGROUND Although imaging, endoscopy, and inflammatory biomarkers are associated with future Crohn disease (CD) outcomes, common laboratory studies may also provide prognostic opportunities. We evaluated machine learning models incorporating routinely collected laboratory studies to predict surgical outcomes in U.S. Veterans with CD. METHODS Adults with CD from a Veterans Health Administration, Veterans Integrated Service Networks (VISN) 10 cohort examined between 2001 and 2015 were used for analysis. Patient demographics, medication use, and longitudinal laboratory values were used to model future surgical outcomes within 1 year. Specifically, data at the time of prediction combined with historical laboratory data characteristics, described as slope, distribution statistics, fluctuation, and linear trend of laboratory values, were considered and principal component analysis transformations were performed to reduce the dimensionality. Lasso regularized logistic regression was used to select features and construct prediction models, with performance assessed by area under the receiver operating characteristic using 10-fold cross-validation. RESULTS We included 4950 observations from 2809 unique patients, among whom 256 had surgery, for modeling. Our optimized model achieved a mean area under the receiver operating characteristic of 0.78 (SD, 0.002). Anti-tumor necrosis factor use was associated with a lower probability of surgery within 1 year and was the most influential predictor in the model, and corticosteroid use was associated with a higher probability of surgery. Among the laboratory variables, high platelet counts, high mean cell hemoglobin concentrations, low albumin levels, and low blood urea nitrogen values were identified as having an elevated influence and association with future surgery. CONCLUSIONS Using machine learning methods that incorporate current and historical data can predict the future risk of CD surgery.
Affiliation(s)
- Ryan W Stidham
  - Department of Internal Medicine, Division of Gastroenterology and Hepatology, University of Michigan Medical School, Ann Arbor, Michigan, USA
  - Michigan Integrated Center for Health Analytics and Medical Prediction, Ann Arbor, Michigan, USA
  - Institute for Healthcare Policy and Innovation, University of Michigan Medical School, Ann Arbor, Michigan, USA
- Yumu Liu
  - Department of Statistics, University of Michigan, Ann Arbor, Michigan, USA
- Binu Enchakalody
  - Department of Surgery, University of Michigan Medical School, Ann Arbor, Michigan, USA
- Tony Van
  - VA Center for Clinical Management Research, VA Ann Arbor Healthcare System, Ann Arbor, Michigan, USA
- Grace L Su
  - Department of Internal Medicine, Division of Gastroenterology and Hepatology, University of Michigan Medical School, Ann Arbor, Michigan, USA
  - VA Center for Clinical Management Research, VA Ann Arbor Healthcare System, Ann Arbor, Michigan, USA
- Ji Zhu
  - Institute for Healthcare Policy and Innovation, University of Michigan Medical School, Ann Arbor, Michigan, USA
  - Department of Statistics, University of Michigan, Ann Arbor, Michigan, USA
- Akbar K Waljee
  - Department of Internal Medicine, Division of Gastroenterology and Hepatology, University of Michigan Medical School, Ann Arbor, Michigan, USA
  - Michigan Integrated Center for Health Analytics and Medical Prediction, Ann Arbor, Michigan, USA
  - Institute for Healthcare Policy and Innovation, University of Michigan Medical School, Ann Arbor, Michigan, USA
  - VA Center for Clinical Management Research, VA Ann Arbor Healthcare System, Ann Arbor, Michigan, USA
49
Du Y, Chen H, Varadhan R. Lasso estimation of hierarchical interactions for analyzing heterogeneity of treatment effect. Stat Med 2021; 40:5417-5433. [PMID: 34240443 DOI: 10.1002/sim.9132] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/27/2019] [Revised: 12/14/2020] [Accepted: 12/30/2020] [Indexed: 11/12/2022]
Abstract
Individuals differ in how they respond to a given treatment. In an effort to predict the treatment response and analyze the heterogeneity of treatment effect, we propose a general modeling framework by identifying treatment-covariate interactions honoring a hierarchical condition. We construct a single-step $\ell_1$-norm penalty procedure that maintains the hierarchical structure of interactions in the sense that a treatment-covariate interaction term is included in the model only when either the covariate or both the covariate and treatment have nonzero main effects. We developed a constrained Lasso approach with two parameterization schemes that enforce the hierarchical interaction restriction differently. We solved the resulting constrained optimization problem using a spectral projected gradient method. We compared our methods to the unstructured Lasso using simulation studies including a scenario that violates the hierarchical condition (misspecified model). The simulations showed that our methods yielded more parsimonious models and outperformed the unstructured Lasso for correctly identifying nonzero treatment-covariate interactions. The superior performance of our methods is also corroborated by an application to data from a large randomized clinical trial investigating a drug for treating congestive heart failure (N = 2569). Our methods provide a well-suited approach for doing secondary analysis in clinical trials to analyze heterogeneous treatment effects and to identify predictive biomarkers.
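The hierarchical condition described in this abstract can be made concrete with a small post hoc check on a fitted coefficient vector. This sketch only tests whether a given set of coefficients honors the condition; it is not the constrained Lasso or the spectral projected gradient solver the authors describe. The function name, argument names, and the labels "weak" and "strong" for the two readings of the condition are assumptions introduced here.

```python
def hierarchy_violations(main_cov, main_trt, interactions, strong=False, tol=0.0):
    """Return indices j whose treatment-covariate interaction is nonzero
    although the required main effects are zero.

    strong=False: the interaction needs covariate j's main effect to be nonzero.
    strong=True:  it needs both covariate j's and the treatment's main effects.
    """
    trt_on = abs(main_trt) > tol
    bad = []
    for j, (beta_j, gamma_j) in enumerate(zip(main_cov, interactions)):
        if abs(gamma_j) <= tol:
            continue  # no interaction term for covariate j, nothing to check
        cov_on = abs(beta_j) > tol
        if not (cov_on and (trt_on or not strong)):
            bad.append(j)
    return bad


# Hypothetical coefficients: covariate 1 has an interaction but no main effect,
# and the treatment main effect is zero.
main_cov = [0.5, 0.0, 0.2]
interactions = [0.1, 0.3, 0.0]
print(hierarchy_violations(main_cov, 0.0, interactions))               # [1]
print(hierarchy_violations(main_cov, 0.0, interactions, strong=True))  # [0, 1]
```

An unstructured Lasso fit can produce coefficient patterns like the one flagged here, which is the situation the hierarchical restriction is meant to rule out.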
Affiliation(s)
- Yu Du
  - Department of Biometrics, Eli Lilly and Company, Indianapolis, Indiana, USA
- Huan Chen
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA
- Ravi Varadhan
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA
  - Division of Biostatistics and Bioinformatics, Department of Oncology, Johns Hopkins School of Medicine, Baltimore, Maryland, USA
50
Rafique R, Islam SR, Kazi JU. Machine learning in the prediction of cancer therapy. Comput Struct Biotechnol J 2021; 19:4003-4017. [PMID: 34377366 PMCID: PMC8321893 DOI: 10.1016/j.csbj.2021.07.003] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Received: 03/14/2021] [Revised: 07/06/2021] [Accepted: 07/07/2021] [Indexed: 12/15/2022] Open
Abstract
Resistance to therapy remains a major cause of cancer treatment failures, resulting in many cancer-related deaths. Resistance can occur at any time during the treatment, even at the beginning. The current treatment plan is dependent mainly on cancer subtypes and the presence of genetic mutations. Evidently, the presence of a genetic mutation does not always predict the therapeutic response and can vary for different cancer subtypes. Therefore, there is an unmet need for predictive models to match a cancer patient with a specific drug or drug combination. Recent advancements in predictive models using artificial intelligence have shown great promise in preclinical settings. However, despite massive improvements in computational power, building clinically useable models remains challenging due to a lack of clinically meaningful pharmacogenomic data. In this review, we provide an overview of recent advancements in therapeutic response prediction using machine learning, which is the most widely used branch of artificial intelligence. We describe the basics of machine learning algorithms, illustrate their use, and highlight the current challenges in therapy response prediction for clinical practice.
Affiliation(s)
- S.M. Riazul Islam
  - Department of Computer Science and Engineering, Sejong University, Seoul, South Korea
- Julhash U. Kazi
  - Division of Translational Cancer Research, Department of Laboratory Medicine, Lund University, Lund, Sweden
  - Lund Stem Cell Center, Department of Laboratory Medicine, Lund University, Lund, Sweden
  - Corresponding author at: Division of Translational Cancer Research, Department of Laboratory Medicine, Lund University, Medicon village Building 404:C3, Scheelevägen 8, 22363 Lund, Sweden.